Drew Breunig divides AI use cases into Gods, Interns, and Cogs:
- Gods: Super-intelligent, artificial entities that do things autonomously.
- Interns: Supervised copilots that collaborate with experts, focusing on grunt work.
- Cogs: Functions optimized to perform a single task extremely well, usually as part of a pipeline or interface.
Each category comes with different costs:
- Gods may cost billions or trillions of dollars to develop. The feasibility of building them is unclear, as they don’t exist yet.
- Interns are free or cheap to use. The cost of building them can range from negligible (e.g., creating a custom GPT via prompting) to millions of dollars (e.g., Bloomberg spent millions building a specialised model on its finance data).
- Cogs are free or cheap to build and run, and they are reliable enough to run independently.
This categorisation is useful when considering startup and product ideas, and your use of AI:
- Don’t build AI Gods, unless you have access to ungodly amounts of capital. This may be obvious to most people, but I still see pitches for model-centric startups, and God-building model companies are the focus of much AI hype.
- Build with AI Interns, as they can significantly increase your productivity. Interns include writing assistants, meeting transcribers, and programming copilots. Ignoring the wealth of AI interns is foolish for individuals and companies alike.
- Don’t build complex AI Interns, unless you have a use case that justifies the costs and risks. For example, Bloomberg’s multi-million-dollar model was outperformed by GPT-4 a few months after its release, and it’s unclear whether it ended up powering any features.
- Build with and build AI Cogs, but ensure you can manage non-deterministic components in production. Cogs include anything from traditional machine learning to the plethora of AI tasks that have become commodified in recent years, like object recognition and text summarisation.
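To make the point about managing non-deterministic cogs more concrete, here is a minimal Python sketch of one possible approach: validate each output and retry a bounded number of times before giving up. The names `run_cog`, `make_flaky_summariser`, and `is_usable_summary` are hypothetical, and the summariser is a stand-in that simulates a flaky model rather than calling a real one.

```python
import random
from typing import Callable, Optional


def run_cog(
    cog: Callable[[str], str],
    text: str,
    is_valid: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call a non-deterministic cog until its output passes validation.

    Returns the first valid output, or None if every attempt fails, so the
    caller can fall back to a default or route the item to a human.
    """
    for _ in range(max_attempts):
        output = cog(text)
        if is_valid(output):
            return output
    return None


def make_flaky_summariser(seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a real summarisation model or API."""
    rng = random.Random(seed)

    def summarise(text: str) -> str:
        # Simulate occasional bad output by sometimes returning nothing.
        if rng.random() < 0.5:
            return ""
        # Otherwise, "summarise" by keeping only the first sentence.
        return text.split(".")[0].strip() + "."

    return summarise


def is_usable_summary(summary: str) -> bool:
    # Deliberately simple check: non-empty and within a length budget.
    return 0 < len(summary) <= 200


if __name__ == "__main__":
    summary = run_cog(
        make_flaky_summariser(seed=0),
        "Cogs do one task well. They usually sit in a pipeline.",
        is_usable_summary,
    )
    print(summary if summary is not None else "needs human review")
```

Returning `None` rather than raising keeps the failure path explicit, which makes it easier to count failures and decide what happens to items the cog couldn't handle. A production version would likely add logging and monitoring around the same loop.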
Above all, start by defining the problem and assessing the impact rather than making AI use your goal. Focusing on problems and solution impact is robust to hype cycles. I’ve considered this focus to be the hardest problem in data science since at least 2015. Amazon data scientist Dzidas Martinaitis has recently captured a similar sentiment in his flowchart for data science projects. Similarly, Douglas Gray and Evan Shellshear have found that data science and AI projects typically fail due to issues with strategy and process, rather than tech and people shortfalls.
Ignore at your own risk.
Acknowledgement: This post was produced with the help of AI interns. One version of Gemini produced the cover image, while another made helpful suggestions on earlier drafts.
Public comments are closed, but I love hearing from readers.
Here’s what I learned about doing it well. Ask Why! Finding motives, causes, and purpose in data science https://yanirseroussi.com/2016/09/19/ask-why-finding-motives-causes-and-purpose-in-data-science/Mon, 19 Sep 2016 21:28:44 +0000 https://yanirseroussi.com/2016/09/19/ask-why-finding-motives-causes-and-purpose-in-data-science/ Video and summary of a talk I gave at the Data Science Sydney meetup, about going beyond the what & how of predictive modelling. If you don’t pay attention, data can drive you off a cliff https://yanirseroussi.com/2016/08/21/seven-ways-to-be-data-driven-off-a-cliff/Sun, 21 Aug 2016 21:34:17 +0000 https://yanirseroussi.com/2016/08/21/seven-ways-to-be-data-driven-off-a-cliff/ Seven common mistakes to avoid when working with data, such as ignoring uncertainty and confusing observed and unobserved quantities. Is Data Scientist a useless job title? https://yanirseroussi.com/2016/08/04/is-data-scientist-a-useless-job-title/Thu, 04 Aug 2016 22:26:03 +0000 https://yanirseroussi.com/2016/08/04/is-data-scientist-a-useless-job-title/ It seems like anyone who touches data can call themselves a data scientist, which makes the title useless. The work they do can still be useful, though. Making Bayesian A/B testing more accessible https://yanirseroussi.com/2016/06/19/making-bayesian-ab-testing-more-accessible/Sun, 19 Jun 2016 10:32:15 +0000 https://yanirseroussi.com/2016/06/19/making-bayesian-ab-testing-more-accessible/ A web tool I built to interpret A/B test results in a Bayesian way, including prior specification, visualisations, and decision rules. 
Diving deeper into causality: Pearl, Kleinberg, Hill, and untested assumptions https://yanirseroussi.com/2016/05/15/diving-deeper-into-causality-pearl-kleinberg-hill-and-untested-assumptions/Sat, 14 May 2016 19:57:03 +0000 https://yanirseroussi.com/2016/05/15/diving-deeper-into-causality-pearl-kleinberg-hill-and-untested-assumptions/ Discussing the need for untested assumptions and temporality in causal inference. Mostly based on Samantha Kleinberg’s Causality, Probability, and Time. The rise of greedy robots https://yanirseroussi.com/2016/03/20/the-rise-of-greedy-robots/Sun, 20 Mar 2016 20:33:43 +0000 https://yanirseroussi.com/2016/03/20/the-rise-of-greedy-robots/ Is artificial/machine intelligence a future threat? I argue that it’s already here, with greedy robots already dominating our lives. Why you should stop worrying about deep learning and deepen your understanding of causality instead https://yanirseroussi.com/2016/02/14/why-you-should-stop-worrying-about-deep-learning-and-deepen-your-understanding-of-causality-instead/Sun, 14 Feb 2016 11:04:11 +0000 https://yanirseroussi.com/2016/02/14/why-you-should-stop-worrying-about-deep-learning-and-deepen-your-understanding-of-causality-instead/ Causality is often overlooked but is of much higher relevance to most data scientists than deep learning. The joys of offline data collection https://yanirseroussi.com/2016/01/24/the-joys-of-offline-data-collection/Sun, 24 Jan 2016 00:32:25 +0000 https://yanirseroussi.com/2016/01/24/the-joys-of-offline-data-collection/ Insights on data collection and machine learning from spending a month sailing, diving, and counting fish with Reef Life Survey. This holiday season, give me real insights https://yanirseroussi.com/2015/12/08/this-holiday-season-give-me-real-insights/Tue, 08 Dec 2015 06:57:25 +0000 https://yanirseroussi.com/2015/12/08/this-holiday-season-give-me-real-insights/ Some companies present raw data or information as “insights”. 
This post surveys some examples, and discusses how they can be turned into real insights. The hardest parts of data science https://yanirseroussi.com/2015/11/23/the-hardest-parts-of-data-science/Mon, 23 Nov 2015 04:14:21 +0000 https://yanirseroussi.com/2015/11/23/the-hardest-parts-of-data-science/ Defining feasible problems and coming up with reasonable ways of measuring solutions is harder than building accurate models or obtaining clean data. Migrating a simple web application from MongoDB to Elasticsearch https://yanirseroussi.com/2015/11/04/migrating-a-simple-web-application-from-mongodb-to-elasticsearch/Wed, 04 Nov 2015 03:53:18 +0000 https://yanirseroussi.com/2015/11/04/migrating-a-simple-web-application-from-mongodb-to-elasticsearch/ Migrating BCRecommender from MongoDB to Elasticsearch made it possible to offer a richer search experience to users at a similar cost, among other benefits. Miscommunicating science: Simplistic models, nutritionism, and the art of storytelling https://yanirseroussi.com/2015/10/19/nutritionism-and-the-need-for-complex-models-to-explain-complex-phenomena/Mon, 19 Oct 2015 00:02:32 +0000 https://yanirseroussi.com/2015/10/19/nutritionism-and-the-need-for-complex-models-to-explain-complex-phenomena/ Nutritionism is a special case of misinterpretation and miscommunication of scientific results – something many data scientists encounter in their work. The wonderful world of recommender systems https://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/Fri, 02 Oct 2015 05:25:57 +0000 https://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/ Giving an overview of the field and common paradigms, and debunking five common myths about recommender systems. 
You don’t need a data scientist (yet) https://yanirseroussi.com/2015/08/24/you-dont-need-a-data-scientist-yet/Mon, 24 Aug 2015 08:25:30 +0000 https://yanirseroussi.com/2015/08/24/you-dont-need-a-data-scientist-yet/ Hiring data scientists prematurely is wasteful and frustrating. Here are some questions to ask before you hire your first data scientist. Goodbye, Parse.com https://yanirseroussi.com/2015/07/31/goodbye-parse-com/Fri, 31 Jul 2015 03:29:50 +0000 https://yanirseroussi.com/2015/07/31/goodbye-parse-com/ Migrating my web apps away from Parse.com due to reliability issues. Self-hosting is a better solution. Learning about deep learning through album cover classification https://yanirseroussi.com/2015/07/06/learning-about-deep-learning-through-album-cover-classification/Mon, 06 Jul 2015 22:21:42 +0000 https://yanirseroussi.com/2015/07/06/learning-about-deep-learning-through-album-cover-classification/ Progress on my album cover classification project, highlighting lessons that would be useful to others who are getting started with deep learning. Deep learning resources https://yanirseroussi.com/deep-learning-resources/Mon, 06 Jul 2015 00:38:44 +0000 https://yanirseroussi.com/deep-learning-resources/ Useful posts and papers on the topic of deep learning (circa 2015). Hopping on the deep learning bandwagon https://yanirseroussi.com/2015/06/06/hopping-on-the-deep-learning-bandwagon/Sat, 06 Jun 2015 05:00:22 +0000 https://yanirseroussi.com/2015/06/06/hopping-on-the-deep-learning-bandwagon/ To become proficient at solving data science problems, you need to get your hands dirty. Here, I used album cover classification to learn about deep learning. 
First steps in data science: author-aware sentiment analysis https://yanirseroussi.com/2015/05/02/first-steps-in-data-science-author-aware-sentiment-analysis/Sat, 02 May 2015 08:31:10 +0000 https://yanirseroussi.com/2015/05/02/first-steps-in-data-science-author-aware-sentiment-analysis/ I became a data scientist by doing a PhD, but the same steps can be followed without a formal education program. My divestment from fossil fuels https://yanirseroussi.com/2015/04/24/my-divestment-from-fossil-fuels/Fri, 24 Apr 2015 00:19:36 +0000 https://yanirseroussi.com/2015/04/24/my-divestment-from-fossil-fuels/ Recent choices I’ve made to reduce my exposure to fossil fuels, including practical steps that can be taken by Australians and generally applicable lessons. My PhD work https://yanirseroussi.com/phd-work/Mon, 30 Mar 2015 03:23:33 +0000 https://yanirseroussi.com/phd-work/ An overview of my PhD in data science / artificial intelligence. Thesis title: Text Mining and Rating Prediction with Topical User Models. The long road to a lifestyle business https://yanirseroussi.com/2015/03/22/the-long-road-to-a-lifestyle-business/Sun, 22 Mar 2015 09:43:47 +0000 https://yanirseroussi.com/2015/03/22/the-long-road-to-a-lifestyle-business/ Progress since leaving my last full-time job and setting out on an independent path that includes data science consulting and work on my own projects. Learning to rank for personalised search (Yandex Search Personalisation – Kaggle Competition Summary – Part 2) https://yanirseroussi.com/2015/02/11/learning-to-rank-for-personalised-search-yandex-search-personalisation-kaggle-competition-summary-part-2/Wed, 11 Feb 2015 06:34:17 +0000 https://yanirseroussi.com/2015/02/11/learning-to-rank-for-personalised-search-yandex-search-personalisation-kaggle-competition-summary-part-2/ My team’s solution to the Yandex Search Personalisation competition (finished 9th out of 194 teams). Is thinking like a search engine possible? 
(Yandex search personalisation – Kaggle competition summary – part 1) https://yanirseroussi.com/2015/01/29/is-thinking-like-a-search-engine-possible-yandex-search-personalisation-kaggle-competition-summary-part-1/Thu, 29 Jan 2015 10:37:39 +0000 https://yanirseroussi.com/2015/01/29/is-thinking-like-a-search-engine-possible-yandex-search-personalisation-kaggle-competition-summary-part-1/ Insights on search personalisation and SEO from participating in a Kaggle competition (finished 9th out of 194 teams). Automating Parse.com bulk data imports https://yanirseroussi.com/2015/01/15/automating-parse-com-bulk-data-imports/Thu, 15 Jan 2015 04:41:16 +0000 https://yanirseroussi.com/2015/01/15/automating-parse-com-bulk-data-imports/ A script for importing data into the Parse backend-as-a-service. Stochastic Gradient Boosting: Choosing the Best Number of Iterations https://yanirseroussi.com/2014/12/29/stochastic-gradient-boosting-choosing-the-best-number-of-iterations/Mon, 29 Dec 2014 02:30:06 +0000 https://yanirseroussi.com/2014/12/29/stochastic-gradient-boosting-choosing-the-best-number-of-iterations/ Exploring an approach to choosing the optimal number of iterations in stochastic gradient boosting, following a bug I found in scikit-learn. SEO: Mostly about showing up? https://yanirseroussi.com/2014/12/15/seo-mostly-about-showing-up/Mon, 15 Dec 2014 04:25:25 +0000 https://yanirseroussi.com/2014/12/15/seo-mostly-about-showing-up/ Increasing SEO traffic to BCRecommender by adding content and opening up more pages for crawling. It turns out that thin content is better than no content. 
Fitting noise: Forecasting the sale price of bulldozers (Kaggle competition summary) https://yanirseroussi.com/2014/11/19/fitting-noise-forecasting-the-sale-price-of-bulldozers-kaggle-competition-summary/Wed, 19 Nov 2014 09:17:34 +0000 https://yanirseroussi.com/2014/11/19/fitting-noise-forecasting-the-sale-price-of-bulldozers-kaggle-competition-summary/ Summary of a Kaggle competition to forecast bulldozer sale price, where I finished 9th out of 476 teams. BCRecommender Traction Update https://yanirseroussi.com/2014/11/05/bcrecommender-traction-update/Wed, 05 Nov 2014 02:29:35 +0000 https://yanirseroussi.com/2014/11/05/bcrecommender-traction-update/ Update on BCRecommender traction using three channels: blogger outreach, search engine optimisation, and content marketing. What is data science? https://yanirseroussi.com/2014/10/23/what-is-data-science/Thu, 23 Oct 2014 03:22:08 +0000 https://yanirseroussi.com/2014/10/23/what-is-data-science/ Data science has been a hot term in the past few years. Still, there isn’t a single definition of the field. This post discusses my favourite definition. Greek Media Monitoring Kaggle competition: My approach https://yanirseroussi.com/2014/10/07/greek-media-monitoring-kaggle-competition-my-approach/Tue, 07 Oct 2014 03:21:35 +0000 https://yanirseroussi.com/2014/10/07/greek-media-monitoring-kaggle-competition-my-approach/ Summary of my approach to the Greek Media Monitoring Kaggle competition, where I finished 6th out of 120 teams. Applying the Traction Book’s Bullseye framework to BCRecommender https://yanirseroussi.com/2014/09/24/applying-the-traction-books-bullseye-framework-to-bcrecommender/Wed, 24 Sep 2014 04:57:39 +0000 https://yanirseroussi.com/2014/09/24/applying-the-traction-books-bullseye-framework-to-bcrecommender/ Ranking 19 channels with the goal of getting traction for BCRecommender. 
Bandcamp recommendation and discovery algorithms https://yanirseroussi.com/2014/09/19/bandcamp-recommendation-and-discovery-algorithms/Fri, 19 Sep 2014 14:26:55 +0000 https://yanirseroussi.com/2014/09/19/bandcamp-recommendation-and-discovery-algorithms/ The recommendation backend for my BCRecommender service for personalised Bandcamp music discovery. Building a recommender system on a shoestring budget (or: BCRecommender part 2 – general system layout) https://yanirseroussi.com/2014/09/07/building-a-recommender-system-on-a-shoestring-budget/Sun, 07 Sep 2014 10:48:44 +0000 https://yanirseroussi.com/2014/09/07/building-a-recommender-system-on-a-shoestring-budget/ Iterating on my BCRecommender service with the goal of keeping costs low while providing a valuable music recommendation service. Building a Bandcamp recommender system (part 1 – motivation) https://yanirseroussi.com/2014/08/30/building-a-bandcamp-recommender-system-part-1-motivation/Sat, 30 Aug 2014 08:11:38 +0000 https://yanirseroussi.com/2014/08/30/building-a-bandcamp-recommender-system-part-1-motivation/ My motivation behind building BCRecommender, a free recommendation & discovery service for Bandcamp music. How to (almost) win Kaggle competitions https://yanirseroussi.com/2014/08/24/how-to-almost-win-kaggle-competitions/Sun, 24 Aug 2014 12:40:53 +0000 https://yanirseroussi.com/2014/08/24/how-to-almost-win-kaggle-competitions/ Summary of a talk I gave at the Data Science Sydney meetup with ten tips on almost-winning Kaggle competitions. Data’s hierarchy of needs https://yanirseroussi.com/2014/08/17/datas-hierarchy-of-needs/Sun, 17 Aug 2014 13:09:30 +0000 https://yanirseroussi.com/2014/08/17/datas-hierarchy-of-needs/ Discussing the hierarchy of needs proposed by Jay Kreps. Key takeaway: Data-driven algorithms & insights can only be as good as the underlying data. 
Kaggle competition tips and summaries https://yanirseroussi.com/kaggle/Sat, 05 Apr 2014 23:46:10 +0000 https://yanirseroussi.com/kaggle/ Pointers to all my Kaggle advice posts and competition summaries. Kaggle beginner tips https://yanirseroussi.com/2014/01/19/kaggle-beginner-tips/Sun, 19 Jan 2014 10:34:28 +0000 https://yanirseroussi.com/2014/01/19/kaggle-beginner-tips/ First post! An email I sent to members of the Data Science Sydney Meetup with tips on how to get started with Kaggle competitions. About Yanir: Startup Data & AI Consultant https://yanirseroussi.com/about/Mon, 01 Jan 0001 00:00:00 +0000 https://yanirseroussi.com/about/ About Yanir Seroussi, a hands-on data tech lead with over a decade of experience. Yanir helps climate/nature tech startups ship data-intensive solutions. Book a free fifteen-minute call https://yanirseroussi.com/free-intro-call/Mon, 01 Jan 0001 00:00:00 +0000 https://yanirseroussi.com/free-intro-call/ Booking form for a quick intro call with Yanir Seroussi. Causal inference resources https://yanirseroussi.com/causal-inference-resources/Mon, 01 Jan 0001 00:00:00 +0000 https://yanirseroussi.com/causal-inference-resources/ Useful books, articles, and courses on the topic of causal inference. Free Guide: Data-to-AI Health Check for Startups https://yanirseroussi.com/data-to-ai-health-check/Mon, 01 Jan 0001 00:00:00 +0000 https://yanirseroussi.com/data-to-ai-health-check/ Download a free PDF guide that helps you assess a startup’s Data-to-AI health by probing eight key areas. Helping climate & nature tech startups ship data-intensive solutions https://yanirseroussi.com/consult/Mon, 01 Jan 0001 00:00:00 +0000 https://yanirseroussi.com/consult/ Consulting for climate & nature tech startups: Strategic advice, implementation of Data/AI/ML solutions, and hiring help by an experienced tech leader. 
Speaking engagements by Yanir: Startup Data & AI Consultant https://yanirseroussi.com/talks/Mon, 01 Jan 0001 00:00:00 +0000 https://yanirseroussi.com/talks/ Yanir Seroussi speaks on data science, artificial intelligence, machine learning, and career journey. Stay in touch https://yanirseroussi.com/contact/Mon, 01 Jan 0001 00:00:00 +0000 https://yanirseroussi.com/contact/ Contact me or subscribe to the mailing list.
\ No newline at end of file
+Yanir Seroussi | Data & AI for Startup Impact https://yanirseroussi.com/Recent content on Yanir Seroussi | Data & AI for Startup Impact Hugo -- 0.138.0 en-au Text and figures licensed under CC BY-NC-ND 4.0 by Yanir Seroussi, except where noted otherwise Mon, 18 Nov 2024 11:12:27 +1000 Don't build AI, build with AI https://yanirseroussi.com/2024/11/18/dont-build-ai-build-with-ai/Mon, 18 Nov 2024 01:00:00 +0000 https://yanirseroussi.com/2024/11/18/dont-build-ai-build-with-ai/ Building AI is hard and expensive. For most companies, the path to AI success is building with third-party AI interns and cheap AI cogs. In praise of inconsistency: Ditching weekly posts https://yanirseroussi.com/2024/09/23/in-praise-of-inconsistency-ditching-weekly-posts/Mon, 23 Sep 2024 06:00:00 +0000 https://yanirseroussi.com/2024/09/23/in-praise-of-inconsistency-ditching-weekly-posts/ On moving away from weekly blog posts in favour of deeper inconsistent articles and LinkedIn engagement. Data, AI, humans, and climate: Carving a consulting niche https://yanirseroussi.com/2024/09/09/data-ai-humans-and-climate-carving-a-consulting-niche/Mon, 09 Sep 2024 00:30:00 +0000 https://yanirseroussi.com/2024/09/09/data-ai-humans-and-climate-carving-a-consulting-niche/ Podcast chat on the reality of Data & AI and my consulting focus: Helping climate & nature tech startups ship data-intensive solutions. Juggling delivery, admin, and leads: Monthly biz recap https://yanirseroussi.com/2024/09/02/juggling-delivery-admin-and-leads-monthly-biz-recap/Mon, 02 Sep 2024 02:30:00 +0000 https://yanirseroussi.com/2024/09/02/juggling-delivery-admin-and-leads-monthly-biz-recap/ Highlights and lessons from my solo expertise biz, including value pricing, fractional cash flow, and distractions from admin & politics. 
AI hype, AI bullshit, and the real deal https://yanirseroussi.com/2024/08/26/ai-hype-ai-bullshit-and-the-real-deal/Mon, 26 Aug 2024 01:00:00 +0000 https://yanirseroussi.com/2024/08/26/ai-hype-ai-bullshit-and-the-real-deal/ My views on separating AI hype and bullshit from the real deal. The general ideas apply to past and future hype waves in tech. Giving up on the minimum viable data stack https://yanirseroussi.com/2024/08/19/giving-up-on-the-minimum-viable-data-stack/Mon, 19 Aug 2024 03:30:00 +0000 https://yanirseroussi.com/2024/08/19/giving-up-on-the-minimum-viable-data-stack/ Exploring why universal advice on startup data stacks is challenging, and the importance of context-specific decisions in data infrastructure. Keep learning: Your career is never truly done https://yanirseroussi.com/2024/08/12/keep-learning-your-career-is-never-truly-done/Mon, 12 Aug 2024 01:30:00 +0000 https://yanirseroussi.com/2024/08/12/keep-learning-your-career-is-never-truly-done/ Podcast chat on my career journey from software engineering to data science and independent consulting. First year lessons from a solo expertise biz in Data & AI https://yanirseroussi.com/2024/08/05/first-year-lessons-from-a-solo-expertise-biz-in-data-and-ai/Mon, 05 Aug 2024 08:45:00 +0000 https://yanirseroussi.com/2024/08/05/first-year-lessons-from-a-solo-expertise-biz-in-data-and-ai/ Reflections on building a solo expertise business in Data & AI, focusing on climate tech startups. Lessons learned from the first year of transition. AI/ML lifecycle models versus real-world mess https://yanirseroussi.com/2024/07/29/ai-ml-lifecycle-models-versus-real-world-mess/Mon, 29 Jul 2024 06:00:00 +0000 https://yanirseroussi.com/2024/07/29/ai-ml-lifecycle-models-versus-real-world-mess/ The real world of AI/ML doesn’t fit into a neat diagram, so I created another diagram and a maturity heatmap to model the mess. 
Your first Data-to-AI hire: Run a lovable process https://yanirseroussi.com/2024/07/22/your-first-data-to-ai-hire-run-a-lovable-process/Mon, 22 Jul 2024 01:00:00 +0000 https://yanirseroussi.com/2024/07/22/your-first-data-to-ai-hire-run-a-lovable-process/ Video and key points from the second part of a webinar on a startup’s first data hire, covering tips for defining the role and running the process. Learn about Dataland to avoid expensive hiring mistakes https://yanirseroussi.com/2024/07/15/learn-about-dataland-to-avoid-expensive-hiring-mistakes/Mon, 15 Jul 2024 05:30:00 +0000 https://yanirseroussi.com/2024/07/15/learn-about-dataland-to-avoid-expensive-hiring-mistakes/ Video and key points from the first part of a webinar on a startup’s first data hire, covering data & AI definitions and high-level recommendations. Exploring an AI product idea with the latest ChatGPT, Claude, and Gemini https://yanirseroussi.com/2024/07/08/exploring-an-ai-product-idea-with-the-latest-chatgpt-claude-and-gemini/Mon, 08 Jul 2024 02:45:00 +0000 https://yanirseroussi.com/2024/07/08/exploring-an-ai-product-idea-with-the-latest-chatgpt-claude-and-gemini/ Asking identical questions about my MagicGrantMaker idea yielded near-identical responses from the top chatbot models. Stay alert! Security is everyone's responsibility https://yanirseroussi.com/2024/07/01/stay-alert-security-is-everyones-responsibility/Mon, 01 Jul 2024 02:00:00 +0000 https://yanirseroussi.com/2024/07/01/stay-alert-security-is-everyones-responsibility/ Questions to assess the security posture of a startup, focusing on basic hygiene and handling of sensitive data. Five team-building mistakes, according to Patty McCord https://yanirseroussi.com/til/2024/06/26/five-team-building-mistakes-according-to-patty-mccord/Wed, 26 Jun 2024 00:00:00 +0000 https://yanirseroussi.com/til/2024/06/26/five-team-building-mistakes-according-to-patty-mccord/ Takeaways from an interview with Patty McCord on The Startup Podcast. 
Is your tech stack ready for data-intensive applications? https://yanirseroussi.com/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/Mon, 24 Jun 2024 02:00:00 +0000 https://yanirseroussi.com/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/ Questions to assess the quality of tech stacks and lifecycles, with a focus on artificial intelligence, machine learning, and analytics. Dealing with endless data changes https://yanirseroussi.com/til/2024/06/22/dealing-with-endless-data-changes/Sat, 22 Jun 2024 22:50:00 +0000 https://yanirseroussi.com/til/2024/06/22/dealing-with-endless-data-changes/ Quotes from Demetrios Brinkmann on the relationship between MLOps and DevOps, with MLOps allowing for managing changes that come from data. AI ain't gonna save you from bad data https://yanirseroussi.com/2024/06/17/ai-aint-gonna-save-you-from-bad-data/Mon, 17 Jun 2024 02:00:00 +0000 https://yanirseroussi.com/2024/06/17/ai-aint-gonna-save-you-from-bad-data/ Since we’re far from a utopia where data issues are fully handled by AI, this post presents six questions humans can use to assess data projects. The rules of the passion economy https://yanirseroussi.com/til/2024/06/12/the-rules-of-the-passion-economy/Wed, 12 Jun 2024 02:50:00 +0000 https://yanirseroussi.com/til/2024/06/12/the-rules-of-the-passion-economy/ Summary of the main messages from the book The Passion Economy by Adam Davidson. Startup data health starts with healthy event tracking https://yanirseroussi.com/2024/06/10/startup-data-health-starts-with-healthy-event-tracking/Mon, 10 Jun 2024 04:00:00 +0000 https://yanirseroussi.com/2024/06/10/startup-data-health-starts-with-healthy-event-tracking/ Expanding on the startup health check question of tracking Kukuyeva’s five business aspects as wide events. 
How to avoid startups with poor development processes https://yanirseroussi.com/2024/06/03/how-to-avoid-startups-with-poor-development-processes/Mon, 03 Jun 2024 02:45:00 +0000 https://yanirseroussi.com/2024/06/03/how-to-avoid-startups-with-poor-development-processes/ Questions that prospective data specialists and engineers should ask about development processes before accepting a startup role. Plumbing, Decisions, and Automation: De-hyping Data & AI https://yanirseroussi.com/2024/05/27/plumbing-decisions-and-automation-de-hyping-data-and-ai/Mon, 27 May 2024 02:00:00 +0000 https://yanirseroussi.com/2024/05/27/plumbing-decisions-and-automation-de-hyping-data-and-ai/ Three essential questions to understand where an organisation stands when it comes to Data & AI (with zero hype). Adapting to the economy of algorithms https://yanirseroussi.com/til/2024/05/25/adapting-to-the-economy-of-algorithms/Sat, 25 May 2024 00:00:00 +0000 https://yanirseroussi.com/til/2024/05/25/adapting-to-the-economy-of-algorithms/ Overview of the book The Economy of Algorithms by Marek Kowalkiewicz. Question startup culture before accepting a data-to-AI role https://yanirseroussi.com/2024/05/20/question-startup-culture-before-accepting-a-data-to-ai-role/Mon, 20 May 2024 02:25:00 +0000 https://yanirseroussi.com/2024/05/20/question-startup-culture-before-accepting-a-data-to-ai-role/ Eight questions that prospective data-to-AI employees should ask about a startup’s work and data culture. Probing the People aspects of an early-stage startup https://yanirseroussi.com/2024/05/13/probing-the-people-aspects-of-an-early-stage-startup/Mon, 13 May 2024 02:00:00 +0000 https://yanirseroussi.com/2024/05/13/probing-the-people-aspects-of-an-early-stage-startup/ Ten questions that prospective employees should ask about a startup’s team, especially for data-centric roles. 
Business questions to ask before taking a startup data role https://yanirseroussi.com/2024/05/06/business-questions-to-ask-before-taking-a-startup-data-role/Mon, 06 May 2024 04:30:00 +0000 https://yanirseroussi.com/2024/05/06/business-questions-to-ask-before-taking-a-startup-data-role/ Fourteen questions that prospective employees should ask about a startup’s business model and product, especially for data-focused roles. Mentorship and the art of actionable advice https://yanirseroussi.com/2024/04/29/mentorship-and-the-art-of-actionable-advice/Mon, 29 Apr 2024 06:30:00 +0000 https://yanirseroussi.com/2024/04/29/mentorship-and-the-art-of-actionable-advice/ Reflections on what it takes to package expertise and deliver timely, actionable advice outside the context of employee relationships. Assessing a startup's data-to-AI health https://yanirseroussi.com/2024/04/22/assessing-a-startups-data-to-ai-health/Mon, 22 Apr 2024 06:00:00 +0000 https://yanirseroussi.com/2024/04/22/assessing-a-startups-data-to-ai-health/ Reviewing the areas that should be assessed to determine a startup’s opportunities and challenges on the data/AI/ML front. AI does not obviate the need for testing and observability https://yanirseroussi.com/2024/04/15/ai-does-not-obviate-the-need-for-testing-and-observability/Mon, 15 Apr 2024 05:00:00 +0000 https://yanirseroussi.com/2024/04/15/ai-does-not-obviate-the-need-for-testing-and-observability/ It’s easy to prototype with AI, but production-grade AI apps require even more thorough testing and observability than traditional software. LinkedIn is a teachable skill https://yanirseroussi.com/til/2024/04/11/linkedin-is-a-teachable-skill/Thu, 11 Apr 2024 01:45:25 +0000 https://yanirseroussi.com/til/2024/04/11/linkedin-is-a-teachable-skill/ A high-level overview of things I learned from Justin Welsh’s LinkedIn Operating System course. 
My experience as a Data Tech Lead with Work on Climate https://yanirseroussi.com/2024/04/08/my-experience-as-a-data-tech-lead-with-work-on-climate/Mon, 08 Apr 2024 02:00:00 +0000 https://yanirseroussi.com/2024/04/08/my-experience-as-a-data-tech-lead-with-work-on-climate/ The story of how I joined Work on Climate as a volunteer and became its data tech lead, with lessons applied to consulting & fractional work. The data engineering lifecycle is not going anywhere https://yanirseroussi.com/til/2024/04/05/the-data-engineering-lifecycle-is-not-going-anywhere/Fri, 05 Apr 2024 01:00:00 +0000 https://yanirseroussi.com/til/2024/04/05/the-data-engineering-lifecycle-is-not-going-anywhere/ My key takeaways from reading Fundamentals of Data Engineering by Joe Reis and Matt Housley. Artificial intelligence, automation, and the art of counting fish https://yanirseroussi.com/2024/04/01/artificial-intelligence-automation-and-the-art-of-counting-fish/Mon, 01 Apr 2024 06:00:00 +0000 https://yanirseroussi.com/2024/04/01/artificial-intelligence-automation-and-the-art-of-counting-fish/ Discussing the use of AI to automate underwater marine surveys as an example of the uneven distribution of technological advancement. Atomic Habits is full of actionable advice https://yanirseroussi.com/til/2024/03/12/atomic-habits-is-full-of-actionable-advice/Tue, 12 Mar 2024 06:19:31 +0000 https://yanirseroussi.com/til/2024/03/12/atomic-habits-is-full-of-actionable-advice/ I put the book to use after the first listen, and will definitely revisit it in the future to form better habits. Questions to consider when using AI for PDF data extraction https://yanirseroussi.com/2024/03/11/questions-to-consider-when-using-ai-for-pdf-data-extraction/Mon, 11 Mar 2024 00:00:00 +0000 https://yanirseroussi.com/2024/03/11/questions-to-consider-when-using-ai-for-pdf-data-extraction/ Discussing considerations that arise when attempting to automate the extraction of structured data from PDFs and similar documents. 
Two types of startup data problems https://yanirseroussi.com/2024/03/04/two-types-of-startup-data-problems/Mon, 04 Mar 2024 02:00:00 +0000 https://yanirseroussi.com/2024/03/04/two-types-of-startup-data-problems/ Classifying startups as ML-centric or non-ML is a helpful exercise to uncover the data challenges they’re likely to face. Avoiding AI complexity: First, write no code https://yanirseroussi.com/2024/02/26/avoiding-ai-complexity-first-write-no-code/Mon, 26 Feb 2024 01:45:00 +0000 https://yanirseroussi.com/2024/02/26/avoiding-ai-complexity-first-write-no-code/ Two stories of getting AI functionality to production, which demonstrate the risks inherent in custom development versus starting with a no-code approach. Building your startup's minimum viable data stack https://yanirseroussi.com/2024/02/19/building-your-startups-minimum-viable-data-stack/Mon, 19 Feb 2024 00:00:00 +0000 https://yanirseroussi.com/2024/02/19/building-your-startups-minimum-viable-data-stack/ First post in a series on building a minimum viable data stack for startups, introducing key definitions, components, and considerations. The three Cs of indie consulting: Confidence, Cash, and Connections https://yanirseroussi.com/til/2024/02/17/the-three-cs-of-indie-consulting-confidence-cash-and-connections/Sat, 17 Feb 2024 02:00:00 +0000 https://yanirseroussi.com/til/2024/02/17/the-three-cs-of-indie-consulting-confidence-cash-and-connections/ Jonathan Stark makes a compelling argument why you should have the three Cs before quitting your job to go solo consulting. Nudging ChatGPT to invent books you have no time to read https://yanirseroussi.com/2024/02/12/nudging-chatgpt-to-invent-books-you-have-no-time-to-read/Mon, 12 Feb 2024 05:00:00 +0000 https://yanirseroussi.com/2024/02/12/nudging-chatgpt-to-invent-books-you-have-no-time-to-read/ Getting ChatGPT Plus to elaborate on possible book content and produce a PDF cheatsheet, with the goal of learning about its capabilities. 
Future software development may require fewer humans https://yanirseroussi.com/til/2024/02/06/future-software-development-may-require-fewer-humans/Tue, 06 Feb 2024 06:15:00 +0000 https://yanirseroussi.com/til/2024/02/06/future-software-development-may-require-fewer-humans/ Reflecting on an interview with Jason Warner, CEO of poolside. Substance over titles: Your first data hire may be a data scientist https://yanirseroussi.com/2024/02/05/substance-over-titles-your-first-data-hire-may-be-a-data-scientist/Mon, 05 Feb 2024 02:45:00 +0000 https://yanirseroussi.com/2024/02/05/substance-over-titles-your-first-data-hire-may-be-a-data-scientist/ Advice for hiring a startup’s first data person: match skills to business needs, consider contractors, and get help from data people. New decade, new tagline: Data & AI for Impact https://yanirseroussi.com/2024/01/19/new-decade-new-tagline-data-and-ai-for-impact/Fri, 19 Jan 2024 00:00:00 +0000 https://yanirseroussi.com/2024/01/19/new-decade-new-tagline-data-and-ai-for-impact/ Shifting focus to ‘Data & AI for Impact’, with more startup-related content, increased posting frequency, and deeper audience engagement. Psychographic specialisations may work for discipline generalists https://yanirseroussi.com/til/2024/01/09/psychographic-specialisations-may-work-for-discipline-generalists/Tue, 09 Jan 2024 03:00:00 +0000 https://yanirseroussi.com/til/2024/01/09/psychographic-specialisations-may-work-for-discipline-generalists/ When focusing on a market segment defined by personal beliefs, it’s often fine to position yourself as a generalist in your craft. The power of parasocial relationships https://yanirseroussi.com/til/2024/01/08/the-power-of-parasocial-relationships/Mon, 08 Jan 2024 06:00:00 +0000 https://yanirseroussi.com/til/2024/01/08/the-power-of-parasocial-relationships/ Repeated exposure to media personas creates relationships that help justify premium fees. 
Positioning is a common problem for data scientists https://yanirseroussi.com/til/2023/12/18/positioning-is-a-common-problem-for-data-scientists/Mon, 18 Dec 2023 00:30:00 +0000 https://yanirseroussi.com/til/2023/12/18/positioning-is-a-common-problem-for-data-scientists/ With the commodification of data scientists, the problem of positioning has become more common: My takeaways from Genevieve Hayes interviewing Jonathan Stark. Transfer learning applies to energy market bidding https://yanirseroussi.com/til/2023/12/14/transfer-learning-applies-to-energy-market-bidding/Thu, 14 Dec 2023 00:15:00 +0000 https://yanirseroussi.com/til/2023/12/14/transfer-learning-applies-to-energy-market-bidding/ An interesting approach to bidding of energy storage assets, showing that training on New York data is transferable to Queensland. Supporting volunteer monitoring of marine biodiversity with modern web and data tools https://yanirseroussi.com/2023/11/29/supporting-volunteer-monitoring-of-marine-biodiversity-with-modern-web-and-data-tools/Wed, 29 Nov 2023 02:00:00 +0000 https://yanirseroussi.com/2023/11/29/supporting-volunteer-monitoring-of-marine-biodiversity-with-modern-web-and-data-tools/ Summarising the work Uri Seroussi and I did to improve Reef Life Survey’s Reef Species of the World app. Our Blue Machine is changing, but we are not helpless https://yanirseroussi.com/til/2023/11/28/our-blue-machine-is-changing-but-we-are-not-helpless/Tue, 28 Nov 2023 06:40:00 +0000 https://yanirseroussi.com/til/2023/11/28/our-blue-machine-is-changing-but-we-are-not-helpless/ One of my many highlights from Helen Czerski’s Blue Machine. You don't need a proprietary API for static maps https://yanirseroussi.com/til/2023/11/21/you-dont-need-a-proprietary-api-for-static-maps/Tue, 21 Nov 2023 06:00:00 +0000 https://yanirseroussi.com/til/2023/11/21/you-dont-need-a-proprietary-api-for-static-maps/ For many use cases, libraries like cartopy are better than the likes of Mapbox and Google Maps. 
Lessons from reluctant data engineering https://yanirseroussi.com/2023/10/25/lessons-from-reluctant-data-engineering/Wed, 25 Oct 2023 04:45:00 +0000 https://yanirseroussi.com/2023/10/25/lessons-from-reluctant-data-engineering/ Video and summary of a talk I gave at DataEngBytes Brisbane on what I learned from doing data engineering as part of every data science role I had. Artificial intelligence was a marketing term all along – just call it automation https://yanirseroussi.com/til/2023/10/06/artificial-intelligence-was-a-marketing-term-all-along-just-call-it-automation/Fri, 06 Oct 2023 05:00:00 +0000 https://yanirseroussi.com/til/2023/10/06/artificial-intelligence-was-a-marketing-term-all-along-just-call-it-automation/ Replacing ‘artificial intelligence’ with ‘automation’ is a useful trick for cutting through the hype. The lines between solo consulting and product building are blurry https://yanirseroussi.com/til/2023/09/25/the-lines-between-solo-consulting-and-product-building-are-blurry/Mon, 25 Sep 2023 00:00:00 +0000 https://yanirseroussi.com/til/2023/09/25/the-lines-between-solo-consulting-and-product-building-are-blurry/ It turns out that problems like finding a niche and defining the ideal clients are key to any solo business. Google's Rules of Machine Learning still apply in the age of large language models https://yanirseroussi.com/til/2023/09/21/googles-rules-of-machine-learning-still-apply-in-the-age-of-large-language-models/Thu, 21 Sep 2023 21:30:00 +0000 https://yanirseroussi.com/til/2023/09/21/googles-rules-of-machine-learning-still-apply-in-the-age-of-large-language-models/ Despite the excitement around large language models, building with machine learning remains an engineering problem with established best practices. 
My rediscovery of quiet writing on the open web https://yanirseroussi.com/2023/08/28/my-rediscovery-of-quiet-writing-on-the-open-web/Mon, 28 Aug 2023 05:30:00 +0000 https://yanirseroussi.com/2023/08/28/my-rediscovery-of-quiet-writing-on-the-open-web/ Reflections on publishing on this website: Writing publicly to share thoughts and documentation beats chasing views and likes. The Minimalist Entrepreneur is too prescriptive for me https://yanirseroussi.com/til/2023/08/21/the-minimalist-entrepreneur-is-too-prescriptive-for-me/Mon, 21 Aug 2023 03:15:00 +0000 https://yanirseroussi.com/til/2023/08/21/the-minimalist-entrepreneur-is-too-prescriptive-for-me/ While I found the story of Gumroad interesting, The Minimalist Entrepreneur seems to over-generalise from the founder’s experience. Revisiting Start Small, Stay Small in 2023 (Chapter 2) https://yanirseroussi.com/til/2023/08/17/revisiting-start-small-stay-small-in-2023-chapter-2/Thu, 17 Aug 2023 07:45:00 +0000 https://yanirseroussi.com/til/2023/08/17/revisiting-start-small-stay-small-in-2023-chapter-2/ A summary of the second chapter of Rob Walling’s Start Small, Stay Small, along with my thoughts & reflections. Revisiting Start Small, Stay Small in 2023 (Chapter 1) https://yanirseroussi.com/til/2023/08/16/revisiting-start-small-stay-small-in-2023-chapter-1/Wed, 16 Aug 2023 05:45:00 +0000 https://yanirseroussi.com/til/2023/08/16/revisiting-start-small-stay-small-in-2023-chapter-1/ A summary of the first chapter of Rob Walling’s Start Small, Stay Small, along with my thoughts & reflections. Email notifications on public GitHub commits https://yanirseroussi.com/til/2023/08/14/email-notifications-on-public-github-commits/Mon, 14 Aug 2023 05:15:00 +0000 https://yanirseroussi.com/til/2023/08/14/email-notifications-on-public-github-commits/ GitHub publishes an Atom feed, which means you can use any RSS reader to follow commits. 
The rule of thirds can probably be ignored https://yanirseroussi.com/til/2023/08/11/the-rule-of-thirds-can-probably-be-ignored/Fri, 11 Aug 2023 03:15:00 +0000 https://yanirseroussi.com/til/2023/08/11/the-rule-of-thirds-can-probably-be-ignored/ Turns out that the rule of thirds for composing visuals may not be that important. Using YubiKey for SSH access https://yanirseroussi.com/til/2023/07/23/using-yubikey-for-ssh-access/Sun, 23 Jul 2023 00:07:15 +0000 https://yanirseroussi.com/til/2023/07/23/using-yubikey-for-ssh-access/ Some pointers for setting up SSH access with YubiKey on Ubuntu 22.04. Making a TIL section with Hugo and PaperMod https://yanirseroussi.com/til/2023/07/17/making-a-til-section-with-hugo-and-papermod/Mon, 17 Jul 2023 00:06:15 +0000 https://yanirseroussi.com/til/2023/07/17/making-a-til-section-with-hugo-and-papermod/ How I added a Today I Learned section to my Hugo site with the PaperMod theme. You can't save time https://yanirseroussi.com/til/2023/07/11/you-cant-save-time/Tue, 11 Jul 2023 00:00:00 +0000 https://yanirseroussi.com/til/2023/07/11/you-cant-save-time/ Time can be spent doing different activities, but it can’t be stored and saved for later. Was data science a failure mode of software engineering? https://yanirseroussi.com/2023/06/30/was-data-science-a-failure-mode-of-software-engineering/Fri, 30 Jun 2023 00:06:30 +0000 https://yanirseroussi.com/2023/06/30/was-data-science-a-failure-mode-of-software-engineering/ Yes, data science projects have suffered from classic software engineering mistakes, but the field is maturing with the rise of new engineering roles. How hackable are automated coding assessments? https://yanirseroussi.com/2023/05/26/how-hackable-are-automated-coding-assessments/Fri, 26 May 2023 00:03:00 +0000 https://yanirseroussi.com/2023/05/26/how-hackable-are-automated-coding-assessments/ Exploring the hackability of speed-based coding tests, using CodeSignal’s Industry Coding Framework as a case study. 
Remaining relevant as a small language model https://yanirseroussi.com/2023/04/21/remaining-relevant-as-a-small-language-model/Fri, 21 Apr 2023 00:06:30 +0000 https://yanirseroussi.com/2023/04/21/remaining-relevant-as-a-small-language-model/ Bing Chat recently quipped that humans are small language models. Here are some of my thoughts on how we small language models can remain relevant (for now). ChatGPT is transformative AI https://yanirseroussi.com/2022/12/11/chatgpt-is-transformative-ai/Sun, 11 Dec 2022 00:00:00 +0000 https://yanirseroussi.com/2022/12/11/chatgpt-is-transformative-ai/ My perspective after a week of using ChatGPT: This is a step change in finding distilled information, and it’s only the beginning. Causal Machine Learning is off to a good start, despite some issues https://yanirseroussi.com/2022/09/12/causal-machine-learning-book-draft-review/Mon, 12 Sep 2022 02:45:00 +0000 https://yanirseroussi.com/2022/09/12/causal-machine-learning-book-draft-review/ Reviewing the first three chapters of the book Causal Machine Learning by Robert Osazuwa Ness. The mission matters: Moving to climate tech as a data scientist https://yanirseroussi.com/2022/06/06/the-mission-matters-moving-to-climate-tech-as-a-data-scientist/Mon, 06 Jun 2022 00:00:00 +0000 https://yanirseroussi.com/2022/06/06/the-mission-matters-moving-to-climate-tech-as-a-data-scientist/ Discussing my recent career move into climate tech as a way of doing more to help mitigate dangerous climate change. 
Building useful machine learning tools keeps getting easier: A fish ID case study https://yanirseroussi.com/2022/03/20/building-useful-machine-learning-tools-keeps-getting-easier-a-fish-id-case-study/Sun, 20 Mar 2022 04:30:00 +0000 https://yanirseroussi.com/2022/03/20/building-useful-machine-learning-tools-keeps-getting-easier-a-fish-id-case-study/ Lessons learned building a fish ID web app with fast.ai and Streamlit, in an attempt to reduce my fear of missing out on the latest deep learning developments. Analysis strategies in online A/B experiments: Intention-to-treat, per-protocol, and other lessons from clinical trials https://yanirseroussi.com/2022/01/14/analysis-strategies-in-online-a-b-experiments/Fri, 14 Jan 2022 00:05:40 +0000 https://yanirseroussi.com/2022/01/14/analysis-strategies-in-online-a-b-experiments/ Epidemiologists analyse clinical trials to estimate the intention-to-treat and per-protocol effects. This post applies their strategies to online experiments. Use your human brain to avoid artificial intelligence disasters https://yanirseroussi.com/2021/11/22/use-your-human-brain-to-avoid-artificial-intelligence-disasters/Mon, 22 Nov 2021 03:45:00 +0000 https://yanirseroussi.com/2021/11/22/use-your-human-brain-to-avoid-artificial-intelligence-disasters/ Overview of a talk I gave at a deep learning course, focusing on AI ethics as the need for humans to think on the context and consequences of applying AI. Migrating from WordPress.com to Hugo on GitHub + Cloudflare https://yanirseroussi.com/2021/11/10/migrating-from-wordpress-com-to-hugo-on-github-cloudflare/Wed, 10 Nov 2021 06:30:00 +0000 https://yanirseroussi.com/2021/11/10/migrating-from-wordpress-com-to-hugo-on-github-cloudflare/ My reasons for switching from WordPress.com to Hugo on GitHub + Cloudflare, along with a summary of the solution components and migration process. 
My work with Automattic https://yanirseroussi.com/2021/10/07/my-work-with-automattic/Thu, 07 Oct 2021 00:00:00 +0000 https://yanirseroussi.com/2021/10/07/my-work-with-automattic/ Back-dated meta-post that gathers my posts on Automattic blogs into a summary of the work I’ve done with the company. Some highlights from 2020 https://yanirseroussi.com/2021/04/05/some-highlights-from-2020/Mon, 05 Apr 2021 06:41:48 +0000 https://yanirseroussi.com/2021/04/05/some-highlights-from-2020/ Sharing remote teamwork insights, my climate & sustainability activism, Reef Life Survey publications, and progress on Automattic’s Experimentation Platform. Many is not enough: Counting simulations to bootstrap the right way https://yanirseroussi.com/2020/08/24/many-is-not-enough-counting-simulations-to-bootstrap-the-right-way/Mon, 24 Aug 2020 01:35:17 +0000 https://yanirseroussi.com/2020/08/24/many-is-not-enough-counting-simulations-to-bootstrap-the-right-way/ Going deeper into correct testing of different methods for bootstrap estimation of confidence intervals. Software commodities are eating interesting data science work https://yanirseroussi.com/2020/01/11/software-commodities-are-eating-interesting-data-science-work/Sat, 11 Jan 2020 09:22:35 +0000 https://yanirseroussi.com/2020/01/11/software-commodities-are-eating-interesting-data-science-work/ Being a data scientist can sometimes feel like a race against software commodities that replace interesting work. What can one do to remain relevant? A day in the life of a remote data scientist https://yanirseroussi.com/2019/12/12/a-day-in-the-life-of-a-remote-data-scientist/Wed, 11 Dec 2019 22:06:19 +0000 https://yanirseroussi.com/2019/12/12/a-day-in-the-life-of-a-remote-data-scientist/ Video of a talk I gave on remote data science work at the Data Science Sydney meetup. Bootstrapping the right way? 
https://yanirseroussi.com/2019/10/06/bootstrapping-the-right-way/Sun, 06 Oct 2019 06:48:07 +0000 https://yanirseroussi.com/2019/10/06/bootstrapping-the-right-way/ Video and summary of a talk I gave at YOW! Data on bootstrap estimation of confidence intervals. Hackers beware: Bootstrap sampling may be harmful https://yanirseroussi.com/2019/01/08/hackers-beware-bootstrap-sampling-may-be-harmful/Mon, 07 Jan 2019 21:07:56 +0000 https://yanirseroussi.com/2019/01/08/hackers-beware-bootstrap-sampling-may-be-harmful/ Bootstrap sampling has been promoted as an easy way of modelling uncertainty to hackers without much statistical knowledge. But things aren’t that simple. The most practical causal inference book I’ve read (is still a draft) https://yanirseroussi.com/2018/12/24/the-most-practical-causal-inference-book-ive-read-is-still-a-draft/Mon, 24 Dec 2018 02:37:50 +0000 https://yanirseroussi.com/2018/12/24/the-most-practical-causal-inference-book-ive-read-is-still-a-draft/ Causal Inference by Miguel Hernán and Jamie Robins is a must-read for anyone interested in the area. Reflections on remote data science work https://yanirseroussi.com/2018/11/03/reflections-on-remote-data-science-work/Sat, 03 Nov 2018 06:33:13 +0000 https://yanirseroussi.com/2018/11/03/reflections-on-remote-data-science-work/ Discussing the pluses and minuses of remote work eighteen months after joining Automattic as a data scientist. Defining data science in 2018 https://yanirseroussi.com/2018/07/22/defining-data-science-in-2018/Sun, 22 Jul 2018 08:27:43 +0000 https://yanirseroussi.com/2018/07/22/defining-data-science-in-2018/ Updating my definition of data science to match changes in the field. It is now broader than before, but its ultimate goal is still to support decisions. 
Advice for aspiring data scientists and other FAQs https://yanirseroussi.com/2017/10/15/advice-for-aspiring-data-scientists-and-other-faqs/Sun, 15 Oct 2017 09:15:25 +0000 https://yanirseroussi.com/2017/10/15/advice-for-aspiring-data-scientists-and-other-faqs/ Frequently asked questions by visitors to this site, especially around entering the data science field. State of Bandcamp Recommender, Late 2017 https://yanirseroussi.com/2017/09/02/state-of-bandcamp-recommender/Sat, 02 Sep 2017 10:19:02 +0000 https://yanirseroussi.com/2017/09/02/state-of-bandcamp-recommender/ Call for BCRecommender maintainers followed by a decision to shut it down, as I don’t have enough time and Bandcamp now offers recommendations. My 10-step path to becoming a remote data scientist with Automattic https://yanirseroussi.com/2017/07/29/my-10-step-path-to-becoming-a-remote-data-scientist-with-automattic/Sat, 29 Jul 2017 05:39:26 +0000 https://yanirseroussi.com/2017/07/29/my-10-step-path-to-becoming-a-remote-data-scientist-with-automattic/ I wanted a well-paid data science-y remote job with an established company that offers a good life balance and makes products I care about. I got it eventually. Exploring and visualising Reef Life Survey data https://yanirseroussi.com/2017/06/03/exploring-and-visualising-reef-life-survey-data/Sat, 03 Jun 2017 00:49:05 +0000 https://yanirseroussi.com/2017/06/03/exploring-and-visualising-reef-life-survey-data/ Web tools I built to visualise Reef Life Survey data and assist citizen scientists in underwater visual census work. Customer lifetime value and the proliferation of misinformation on the internet https://yanirseroussi.com/2017/01/08/customer-lifetime-value-and-the-proliferation-of-misinformation-on-the-internet/Sun, 08 Jan 2017 20:02:30 +0000 https://yanirseroussi.com/2017/01/08/customer-lifetime-value-and-the-proliferation-of-misinformation-on-the-internet/ There’s a lot of misleading content on the estimation of customer lifetime value. 
Here’s what I learned about doing it well. Ask Why! Finding motives, causes, and purpose in data science https://yanirseroussi.com/2016/09/19/ask-why-finding-motives-causes-and-purpose-in-data-science/Mon, 19 Sep 2016 21:28:44 +0000 https://yanirseroussi.com/2016/09/19/ask-why-finding-motives-causes-and-purpose-in-data-science/ Video and summary of a talk I gave at the Data Science Sydney meetup, about going beyond the what & how of predictive modelling. If you don’t pay attention, data can drive you off a cliff https://yanirseroussi.com/2016/08/21/seven-ways-to-be-data-driven-off-a-cliff/Sun, 21 Aug 2016 21:34:17 +0000 https://yanirseroussi.com/2016/08/21/seven-ways-to-be-data-driven-off-a-cliff/ Seven common mistakes to avoid when working with data, such as ignoring uncertainty and confusing observed and unobserved quantities. Is Data Scientist a useless job title? https://yanirseroussi.com/2016/08/04/is-data-scientist-a-useless-job-title/Thu, 04 Aug 2016 22:26:03 +0000 https://yanirseroussi.com/2016/08/04/is-data-scientist-a-useless-job-title/ It seems like anyone who touches data can call themselves a data scientist, which makes the title useless. The work they do can still be useful, though. Making Bayesian A/B testing more accessible https://yanirseroussi.com/2016/06/19/making-bayesian-ab-testing-more-accessible/Sun, 19 Jun 2016 10:32:15 +0000 https://yanirseroussi.com/2016/06/19/making-bayesian-ab-testing-more-accessible/ A web tool I built to interpret A/B test results in a Bayesian way, including prior specification, visualisations, and decision rules. 
Diving deeper into causality: Pearl, Kleinberg, Hill, and untested assumptions https://yanirseroussi.com/2016/05/15/diving-deeper-into-causality-pearl-kleinberg-hill-and-untested-assumptions/Sat, 14 May 2016 19:57:03 +0000 https://yanirseroussi.com/2016/05/15/diving-deeper-into-causality-pearl-kleinberg-hill-and-untested-assumptions/ Discussing the need for untested assumptions and temporality in causal inference. Mostly based on Samantha Kleinberg’s Causality, Probability, and Time. The rise of greedy robots https://yanirseroussi.com/2016/03/20/the-rise-of-greedy-robots/Sun, 20 Mar 2016 20:33:43 +0000 https://yanirseroussi.com/2016/03/20/the-rise-of-greedy-robots/ Is artificial/machine intelligence a future threat? I argue that it’s already here, with greedy robots already dominating our lives. Why you should stop worrying about deep learning and deepen your understanding of causality instead https://yanirseroussi.com/2016/02/14/why-you-should-stop-worrying-about-deep-learning-and-deepen-your-understanding-of-causality-instead/Sun, 14 Feb 2016 11:04:11 +0000 https://yanirseroussi.com/2016/02/14/why-you-should-stop-worrying-about-deep-learning-and-deepen-your-understanding-of-causality-instead/ Causality is often overlooked but is of much higher relevance to most data scientists than deep learning. The joys of offline data collection https://yanirseroussi.com/2016/01/24/the-joys-of-offline-data-collection/Sun, 24 Jan 2016 00:32:25 +0000 https://yanirseroussi.com/2016/01/24/the-joys-of-offline-data-collection/ Insights on data collection and machine learning from spending a month sailing, diving, and counting fish with Reef Life Survey. This holiday season, give me real insights https://yanirseroussi.com/2015/12/08/this-holiday-season-give-me-real-insights/Tue, 08 Dec 2015 06:57:25 +0000 https://yanirseroussi.com/2015/12/08/this-holiday-season-give-me-real-insights/ Some companies present raw data or information as “insights”. 
This post surveys some examples, and discusses how they can be turned into real insights. The hardest parts of data science https://yanirseroussi.com/2015/11/23/the-hardest-parts-of-data-science/Mon, 23 Nov 2015 04:14:21 +0000 https://yanirseroussi.com/2015/11/23/the-hardest-parts-of-data-science/ Defining feasible problems and coming up with reasonable ways of measuring solutions is harder than building accurate models or obtaining clean data. Migrating a simple web application from MongoDB to Elasticsearch https://yanirseroussi.com/2015/11/04/migrating-a-simple-web-application-from-mongodb-to-elasticsearch/Wed, 04 Nov 2015 03:53:18 +0000 https://yanirseroussi.com/2015/11/04/migrating-a-simple-web-application-from-mongodb-to-elasticsearch/ Migrating BCRecommender from MongoDB to Elasticsearch made it possible to offer a richer search experience to users at a similar cost, among other benefits. Miscommunicating science: Simplistic models, nutritionism, and the art of storytelling https://yanirseroussi.com/2015/10/19/nutritionism-and-the-need-for-complex-models-to-explain-complex-phenomena/Mon, 19 Oct 2015 00:02:32 +0000 https://yanirseroussi.com/2015/10/19/nutritionism-and-the-need-for-complex-models-to-explain-complex-phenomena/ Nutritionism is a special case of misinterpretation and miscommunication of scientific results – something many data scientists encounter in their work. The wonderful world of recommender systems https://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/Fri, 02 Oct 2015 05:25:57 +0000 https://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/ Giving an overview of the field and common paradigms, and debunking five common myths about recommender systems. 
You don’t need a data scientist (yet) https://yanirseroussi.com/2015/08/24/you-dont-need-a-data-scientist-yet/Mon, 24 Aug 2015 08:25:30 +0000 https://yanirseroussi.com/2015/08/24/you-dont-need-a-data-scientist-yet/ Hiring data scientists prematurely is wasteful and frustrating. Here are some questions to ask before you hire your first data scientist. Goodbye, Parse.com https://yanirseroussi.com/2015/07/31/goodbye-parse-com/Fri, 31 Jul 2015 03:29:50 +0000 https://yanirseroussi.com/2015/07/31/goodbye-parse-com/ Migrating my web apps away from Parse.com due to reliability issues. Self-hosting is a better solution. Learning about deep learning through album cover classification https://yanirseroussi.com/2015/07/06/learning-about-deep-learning-through-album-cover-classification/Mon, 06 Jul 2015 22:21:42 +0000 https://yanirseroussi.com/2015/07/06/learning-about-deep-learning-through-album-cover-classification/ Progress on my album cover classification project, highlighting lessons that would be useful to others who are getting started with deep learning. Deep learning resources https://yanirseroussi.com/deep-learning-resources/Mon, 06 Jul 2015 00:38:44 +0000 https://yanirseroussi.com/deep-learning-resources/ Useful posts and papers on the topic of deep learning (circa 2015). Hopping on the deep learning bandwagon https://yanirseroussi.com/2015/06/06/hopping-on-the-deep-learning-bandwagon/Sat, 06 Jun 2015 05:00:22 +0000 https://yanirseroussi.com/2015/06/06/hopping-on-the-deep-learning-bandwagon/ To become proficient at solving data science problems, you need to get your hands dirty. Here, I used album cover classification to learn about deep learning. 
First steps in data science: author-aware sentiment analysis https://yanirseroussi.com/2015/05/02/first-steps-in-data-science-author-aware-sentiment-analysis/Sat, 02 May 2015 08:31:10 +0000 https://yanirseroussi.com/2015/05/02/first-steps-in-data-science-author-aware-sentiment-analysis/ I became a data scientist by doing a PhD, but the same steps can be followed without a formal education program. My divestment from fossil fuels https://yanirseroussi.com/2015/04/24/my-divestment-from-fossil-fuels/Fri, 24 Apr 2015 00:19:36 +0000 https://yanirseroussi.com/2015/04/24/my-divestment-from-fossil-fuels/ Recent choices I’ve made to reduce my exposure to fossil fuels, including practical steps that can be taken by Australians and generally applicable lessons. My PhD work https://yanirseroussi.com/phd-work/Mon, 30 Mar 2015 03:23:33 +0000 https://yanirseroussi.com/phd-work/ An overview of my PhD in data science / artificial intelligence. Thesis title: Text Mining and Rating Prediction with Topical User Models. The long road to a lifestyle business https://yanirseroussi.com/2015/03/22/the-long-road-to-a-lifestyle-business/Sun, 22 Mar 2015 09:43:47 +0000 https://yanirseroussi.com/2015/03/22/the-long-road-to-a-lifestyle-business/ Progress since leaving my last full-time job and setting on an independent path that includes data science consulting and work on my own projects. Learning to rank for personalised search (Yandex Search Personalisation – Kaggle Competition Summary – Part 2) https://yanirseroussi.com/2015/02/11/learning-to-rank-for-personalised-search-yandex-search-personalisation-kaggle-competition-summary-part-2/Wed, 11 Feb 2015 06:34:17 +0000 https://yanirseroussi.com/2015/02/11/learning-to-rank-for-personalised-search-yandex-search-personalisation-kaggle-competition-summary-part-2/ My team’s solution to the Yandex Search Personalisation competition (finished 9th out of 194 teams). Is thinking like a search engine possible? 
(Yandex search personalisation – Kaggle competition summary – part 1)
https://yanirseroussi.com/2015/01/29/is-thinking-like-a-search-engine-possible-yandex-search-personalisation-kaggle-competition-summary-part-1/ Thu, 29 Jan 2015 10:37:39 +0000
Insights on search personalisation and SEO from participating in a Kaggle competition (finished 9th out of 194 teams).
Automating Parse.com bulk data imports
https://yanirseroussi.com/2015/01/15/automating-parse-com-bulk-data-imports/ Thu, 15 Jan 2015 04:41:16 +0000
A script for importing data into the Parse backend-as-a-service.
Stochastic Gradient Boosting: Choosing the Best Number of Iterations
https://yanirseroussi.com/2014/12/29/stochastic-gradient-boosting-choosing-the-best-number-of-iterations/ Mon, 29 Dec 2014 02:30:06 +0000
Exploring an approach to choosing the optimal number of iterations in stochastic gradient boosting, following a bug I found in scikit-learn.
SEO: Mostly about showing up?
https://yanirseroussi.com/2014/12/15/seo-mostly-about-showing-up/ Mon, 15 Dec 2014 04:25:25 +0000
Increasing SEO traffic to BCRecommender by adding content and opening up more pages for crawling. It turns out that thin content is better than no content.
Fitting noise: Forecasting the sale price of bulldozers (Kaggle competition summary)
https://yanirseroussi.com/2014/11/19/fitting-noise-forecasting-the-sale-price-of-bulldozers-kaggle-competition-summary/ Wed, 19 Nov 2014 09:17:34 +0000
Summary of a Kaggle competition to forecast bulldozer sale price, where I finished 9th out of 476 teams.
BCRecommender Traction Update
https://yanirseroussi.com/2014/11/05/bcrecommender-traction-update/ Wed, 05 Nov 2014 02:29:35 +0000
Update on BCRecommender traction using three channels: blogger outreach, search engine optimisation, and content marketing.
What is data science?
https://yanirseroussi.com/2014/10/23/what-is-data-science/ Thu, 23 Oct 2014 03:22:08 +0000
Data science has been a hot term in the past few years. Still, there isn’t a single definition of the field. This post discusses my favourite definition.
Greek Media Monitoring Kaggle competition: My approach
https://yanirseroussi.com/2014/10/07/greek-media-monitoring-kaggle-competition-my-approach/ Tue, 07 Oct 2014 03:21:35 +0000
Summary of my approach to the Greek Media Monitoring Kaggle competition, where I finished 6th out of 120 teams.
Applying the Traction Book’s Bullseye framework to BCRecommender
https://yanirseroussi.com/2014/09/24/applying-the-traction-books-bullseye-framework-to-bcrecommender/ Wed, 24 Sep 2014 04:57:39 +0000
Ranking 19 channels with the goal of getting traction for BCRecommender.
Bandcamp recommendation and discovery algorithms
https://yanirseroussi.com/2014/09/19/bandcamp-recommendation-and-discovery-algorithms/ Fri, 19 Sep 2014 14:26:55 +0000
The recommendation backend for my BCRecommender service for personalised Bandcamp music discovery.
Building a recommender system on a shoestring budget (or: BCRecommender part 2 – general system layout)
https://yanirseroussi.com/2014/09/07/building-a-recommender-system-on-a-shoestring-budget/ Sun, 07 Sep 2014 10:48:44 +0000
Iterating on my BCRecommender service with the goal of keeping costs low while providing a valuable music recommendation service.
Building a Bandcamp recommender system (part 1 – motivation)
https://yanirseroussi.com/2014/08/30/building-a-bandcamp-recommender-system-part-1-motivation/ Sat, 30 Aug 2014 08:11:38 +0000
My motivation behind building BCRecommender, a free recommendation & discovery service for Bandcamp music.
How to (almost) win Kaggle competitions
https://yanirseroussi.com/2014/08/24/how-to-almost-win-kaggle-competitions/ Sun, 24 Aug 2014 12:40:53 +0000
Summary of a talk I gave at the Data Science Sydney meetup with ten tips on almost-winning Kaggle competitions.
Data’s hierarchy of needs
https://yanirseroussi.com/2014/08/17/datas-hierarchy-of-needs/ Sun, 17 Aug 2014 13:09:30 +0000
Discussing the hierarchy of needs proposed by Jay Kreps. Key takeaway: Data-driven algorithms & insights can only be as good as the underlying data.
Kaggle competition tips and summaries
https://yanirseroussi.com/kaggle/ Sat, 05 Apr 2014 23:46:10 +0000
Pointers to all my Kaggle advice posts and competition summaries.
Kaggle beginner tips
https://yanirseroussi.com/2014/01/19/kaggle-beginner-tips/ Sun, 19 Jan 2014 10:34:28 +0000
First post! An email I sent to members of the Data Science Sydney Meetup with tips on how to get started with Kaggle competitions.
About Yanir: Startup Data & AI Consultant
https://yanirseroussi.com/about/ Mon, 01 Jan 0001 00:00:00 +0000
About Yanir Seroussi, a hands-on data tech lead with over a decade of experience. Yanir helps climate/nature tech startups ship data-intensive solutions.
Book a free fifteen-minute call
https://yanirseroussi.com/free-intro-call/ Mon, 01 Jan 0001 00:00:00 +0000
Booking form for a quick intro call with Yanir Seroussi.
Causal inference resources
https://yanirseroussi.com/causal-inference-resources/ Mon, 01 Jan 0001 00:00:00 +0000
Useful books, articles, and courses on the topic of causal inference.
Free Guide: Data-to-AI Health Check for Startups
https://yanirseroussi.com/data-to-ai-health-check/ Mon, 01 Jan 0001 00:00:00 +0000
Download a free PDF guide that helps you assess a startup’s Data-to-AI health by probing eight key areas.
Helping climate & nature tech startups ship data-intensive solutions
https://yanirseroussi.com/consult/ Mon, 01 Jan 0001 00:00:00 +0000
Consulting for climate & nature tech startups: Strategic advice, implementation of Data/AI/ML solutions, and hiring help by an experienced tech leader.
Speaking engagements by Yanir: Startup Data & AI Consultant
https://yanirseroussi.com/talks/ Mon, 01 Jan 0001 00:00:00 +0000
Yanir Seroussi speaks on data science, artificial intelligence, machine learning, and career journey.
Stay in touch
https://yanirseroussi.com/contact/ Mon, 01 Jan 0001 00:00:00 +0000
Contact me or subscribe to the mailing list.
\ No newline at end of file
diff --git a/posts/index.html b/posts/index.html
index 9edbf4865..5f6b488a9 100644
--- a/posts/index.html
+++ b/posts/index.html
@@ -8,7 +8,7 @@
">
Browse Posts
Don't build AI, build with AI
Building AI is hard and expensive. For most companies, the path to AI success is building with third-party AI interns and cheap AI cogs.
In praise of inconsistency: Ditching weekly posts
On moving away from weekly blog posts in favour of deeper inconsistent articles and LinkedIn engagement.
Data, AI, humans, and climate: Carving a consulting niche
Podcast chat on the reality of Data & AI and my consulting focus: Helping climate & nature tech startups ship data-intensive solutions.
Juggling delivery, admin, and leads: Monthly biz recap
Highlights and lessons from my solo expertise biz, including value pricing, fractional cash flow, and distractions from admin & politics.
AI hype, AI bullshit, and the real deal
My views on separating AI hype and bullshit from the real deal. The general ideas apply to past and future hype waves in tech.
Giving up on the minimum viable data stack
Exploring why universal advice on startup data stacks is challenging, and the importance of context-specific decisions in data infrastructure.
Keep learning: Your career is never truly done
Podcast chat on my career journey from software engineering to data science and independent consulting.
First year lessons from a solo expertise biz in Data & AI
Reflections on building a solo expertise business in Data & AI, focusing on climate tech startups. Lessons learned from the first year of transition.
AI/ML lifecycle models versus real-world mess
The real world of AI/ML doesn’t fit into a neat diagram, so I created another diagram and a maturity heatmap to model the mess.
Your first Data-to-AI hire: Run a lovable process
Video and key points from the second part of a webinar on a startup’s first data hire, covering tips for defining the role and running the process.
Learn about Dataland to avoid expensive hiring mistakes
Video and key points from the first part of a webinar on a startup’s first data hire, covering data & AI definitions and high-level recommendations.
Exploring an AI product idea with the latest ChatGPT, Claude, and Gemini
Asking identical questions about my MagicGrantMaker idea yielded near-identical responses from the top chatbot models.
Stay alert! Security is everyone's responsibility
Questions to assess the security posture of a startup, focusing on basic hygiene and handling of sensitive data.
Is your tech stack ready for data-intensive applications?
Questions to assess the quality of tech stacks and lifecycles, with a focus on artificial intelligence, machine learning, and analytics.
AI ain't gonna save you from bad data
Since we’re far from a utopia where data issues are fully handled by AI, this post presents six questions humans can use to assess data projects.
Startup data health starts with healthy event tracking
Expanding on the startup health check question of tracking Kukuyeva’s five business aspects as wide events.
How to avoid startups with poor development processes
Questions that prospective data specialists and engineers should ask about development processes before accepting a startup role.
Plumbing, Decisions, and Automation: De-hyping Data & AI
Three essential questions to understand where an organisation stands when it comes to Data & AI (with zero hype).
Question startup culture before accepting a data-to-AI role
Eight questions that prospective data-to-AI employees should ask about a startup’s work and data culture.
Probing the People aspects of an early-stage startup
Ten questions that prospective employees should ask about a startup’s team, especially for data-centric roles.
Business questions to ask before taking a startup data role
Fourteen questions that prospective employees should ask about a startup’s business model and product, especially for data-focused roles.
Mentorship and the art of actionable advice
Reflections on what it takes to package expertise and deliver timely, actionable advice outside the context of employee relationships.
Assessing a startup's data-to-AI health
Reviewing the areas that should be assessed to determine a startup’s opportunities and challenges on the data/AI/ML front.
AI does not obviate the need for testing and observability
It’s easy to prototype with AI, but production-grade AI apps require even more thorough testing and observability than traditional software.
My experience as a Data Tech Lead with Work on Climate
The story of how I joined Work on Climate as a volunteer and became its data tech lead, with lessons applied to consulting & fractional work.
Artificial intelligence, automation, and the art of counting fish
Discussing the use of AI to automate underwater marine surveys as an example of the uneven distribution of technological advancement.
Questions to consider when using AI for PDF data extraction
Discussing considerations that arise when attempting to automate the extraction of structured data from PDFs and similar documents.
Two types of startup data problems
Classifying startups as ML-centric or non-ML is a helpful exercise to uncover the data challenges they’re likely to face.
Avoiding AI complexity: First, write no code
Two stories of getting AI functionality to production, which demonstrate the risks inherent in custom development versus starting with a no-code approach.
Building your startup's minimum viable data stack
First post in a series on building a minimum viable data stack for startups, introducing key definitions, components, and considerations.
Nudging ChatGPT to invent books you have no time to read
Getting ChatGPT Plus to elaborate on possible book content and produce a PDF cheatsheet, with the goal of learning about its capabilities.
Substance over titles: Your first data hire may be a data scientist
Advice for hiring a startup’s first data person: match skills to business needs, consider contractors, and get help from data people.
New decade, new tagline: Data & AI for Impact
Shifting focus to ‘Data & AI for Impact’, with more startup-related content, increased posting frequency, and deeper audience engagement.
Supporting volunteer monitoring of marine biodiversity with modern web and data tools
Summarising the work Uri Seroussi and I did to improve Reef Life Survey’s Reef Species of the World app.
Lessons from reluctant data engineering
Video and summary of a talk I gave at DataEngBytes Brisbane on what I learned from doing data engineering as part of every data science role I had.
My rediscovery of quiet writing on the open web
Reflections on publishing on this website: Writing publicly to share thoughts and documentation beats chasing views and likes.
Was data science a failure mode of software engineering?
Yes, data science projects have suffered from classic software engineering mistakes, but the field is maturing with the rise of new engineering roles.
How hackable are automated coding assessments?
Exploring the hackability of speed-based coding tests, using CodeSignal’s Industry Coding Framework as a case study.
Remaining relevant as a small language model
Bing Chat recently quipped that humans are small language models. Here are some of my thoughts on how we small language models can remain relevant (for now).
ChatGPT is transformative AI
My perspective after a week of using ChatGPT: This is a step change in finding distilled information, and it’s only the beginning.
Causal Machine Learning is off to a good start, despite some issues
Reviewing the first three chapters of the book Causal Machine Learning by Robert Osazuwa Ness.
The mission matters: Moving to climate tech as a data scientist
Discussing my recent career move into climate tech as a way of doing more to help mitigate dangerous climate change.
Building useful machine learning tools keeps getting easier: A fish ID case study
Lessons learned building a fish ID web app with fast.ai and Streamlit, in an attempt to reduce my fear of missing out on the latest deep learning developments.
Analysis strategies in online A/B experiments: Intention-to-treat, per-protocol, and other lessons from clinical trials
Epidemiologists analyse clinical trials to estimate the intention-to-treat and per-protocol effects. This post applies their strategies to online experiments.
Use your human brain to avoid artificial intelligence disasters
Overview of a talk I gave at a deep learning course, focusing on AI ethics as the need for humans to think about the context and consequences of applying AI.
Migrating from WordPress.com to Hugo on GitHub + Cloudflare
My reasons for switching from WordPress.com to Hugo on GitHub + Cloudflare, along with a summary of the solution components and migration process.
My work with Automattic
Back-dated meta-post that gathers my posts on Automattic blogs into a summary of the work I’ve done with the company.
Some highlights from 2020
Sharing remote teamwork insights, my climate & sustainability activism, Reef Life Survey publications, and progress on Automattic’s Experimentation Platform.
Many is not enough: Counting simulations to bootstrap the right way
Going deeper into correct testing of different methods for bootstrap estimation of confidence intervals.
Software commodities are eating interesting data science work
Being a data scientist can sometimes feel like a race against software commodities that replace interesting work. What can one do to remain relevant?
A day in the life of a remote data scientist
Video of a talk I gave on remote data science work at the Data Science Sydney meetup.
Bootstrapping the right way?
Video and summary of a talk I gave at YOW! Data on bootstrap estimation of confidence intervals.
Hackers beware: Bootstrap sampling may be harmful
Bootstrap sampling has been promoted as an easy way of modelling uncertainty to hackers without much statistical knowledge. But things aren’t that simple.
The most practical causal inference book I’ve read (is still a draft)
Causal Inference by Miguel Hernán and Jamie Robins is a must-read for anyone interested in the area.
Reflections on remote data science work
Discussing the pluses and minuses of remote work eighteen months after joining Automattic as a data scientist.
Defining data science in 2018
Updating my definition of data science to match changes in the field. It is now broader than before, but its ultimate goal is still to support decisions.
Advice for aspiring data scientists and other FAQs
Frequently asked questions by visitors to this site, especially around entering the data science field.
State of Bandcamp Recommender, Late 2017
Call for BCRecommender maintainers followed by a decision to shut it down, as I don’t have enough time and Bandcamp now offers recommendations.
My 10-step path to becoming a remote data scientist with Automattic
I wanted a well-paid data science-y remote job with an established company that offers a good life balance and makes products I care about. I got it eventually.
Exploring and visualising Reef Life Survey data
Web tools I built to visualise Reef Life Survey data and assist citizen scientists in underwater visual census work.
Customer lifetime value and the proliferation of misinformation on the internet
There’s a lot of misleading content on the estimation of customer lifetime value. Here’s what I learned about doing it well.
Ask Why! Finding motives, causes, and purpose in data science
Video and summary of a talk I gave at the Data Science Sydney meetup, about going beyond the what & how of predictive modelling.
If you don’t pay attention, data can drive you off a cliff
Seven common mistakes to avoid when working with data, such as ignoring uncertainty and confusing observed and unobserved quantities.
Is Data Scientist a useless job title?
It seems like anyone who touches data can call themselves a data scientist, which makes the title useless. The work they do can still be useful, though.
Making Bayesian A/B testing more accessible
A web tool I built to interpret A/B test results in a Bayesian way, including prior specification, visualisations, and decision rules.
Diving deeper into causality: Pearl, Kleinberg, Hill, and untested assumptions
Discussing the need for untested assumptions and temporality in causal inference. Mostly based on Samantha Kleinberg’s Causality, Probability, and Time.
The rise of greedy robots
Is artificial/machine intelligence a future threat? I argue that it’s already here, with greedy robots already dominating our lives.
Why you should stop worrying about deep learning and deepen your understanding of causality instead
Causality is often overlooked but is of much higher relevance to most data scientists than deep learning.
The joys of offline data collection
Insights on data collection and machine learning from spending a month sailing, diving, and counting fish with Reef Life Survey.
This holiday season, give me real insights
Some companies present raw data or information as “insights”. This post surveys some examples, and discusses how they can be turned into real insights.
The hardest parts of data science
Defining feasible problems and coming up with reasonable ways of measuring solutions is harder than building accurate models or obtaining clean data.
Migrating a simple web application from MongoDB to Elasticsearch
Migrating BCRecommender from MongoDB to Elasticsearch made it possible to offer a richer search experience to users at a similar cost, among other benefits.
Miscommunicating science: Simplistic models, nutritionism, and the art of storytelling
Nutritionism is a special case of misinterpretation and miscommunication of scientific results – something many data scientists encounter in their work.
The wonderful world of recommender systems
Giving an overview of the field and common paradigms, and debunking five common myths about recommender systems.
You don’t need a data scientist (yet)
Hiring data scientists prematurely is wasteful and frustrating. Here are some questions to ask before you hire your first data scientist.
Goodbye, Parse.com
Migrating my web apps away from Parse.com due to reliability issues. Self-hosting is a better solution.
Learning about deep learning through album cover classification
Progress on my album cover classification project, highlighting lessons that would be useful to others who are getting started with deep learning.
Hopping on the deep learning bandwagon
To become proficient at solving data science problems, you need to get your hands dirty. Here, I used album cover classification to learn about deep learning.
First steps in data science: author-aware sentiment analysis
I became a data scientist by doing a PhD, but the same steps can be followed without a formal education program.
My divestment from fossil fuels
Recent choices I’ve made to reduce my exposure to fossil fuels, including practical steps that can be taken by Australians and generally applicable lessons.
My PhD work
An overview of my PhD in data science / artificial intelligence. Thesis title: Text Mining and Rating Prediction with Topical User Models.
The long road to a lifestyle business
Progress since leaving my last full-time job and setting out on an independent path that includes data science consulting and work on my own projects.
Learning to rank for personalised search (Yandex Search Personalisation – Kaggle Competition Summary – Part 2)
My team’s solution to the Yandex Search Personalisation competition (finished 9th out of 194 teams).
Is thinking like a search engine possible? (Yandex search personalisation – Kaggle competition summary – part 1)
Insights on search personalisation and SEO from participating in a Kaggle competition (finished 9th out of 194 teams).
Automating Parse.com bulk data imports
A script for importing data into the Parse backend-as-a-service.
Stochastic Gradient Boosting: Choosing the Best Number of Iterations
Exploring an approach to choosing the optimal number of iterations in stochastic gradient boosting, following a bug I found in scikit-learn.
SEO: Mostly about showing up?
Increasing SEO traffic to BCRecommender by adding content and opening up more pages for crawling. It turns out that thin content is better than no content.
Fitting noise: Forecasting the sale price of bulldozers (Kaggle competition summary)
Summary of a Kaggle competition to forecast bulldozer sale price, where I finished 9th out of 476 teams.
BCRecommender Traction Update
Update on BCRecommender traction using three channels: blogger outreach, search engine optimisation, and content marketing.
What is data science?
Data science has been a hot term in the past few years. Still, there isn’t a single definition of the field. This post discusses my favourite definition.
Greek Media Monitoring Kaggle competition: My approach
Summary of my approach to the Greek Media Monitoring Kaggle competition, where I finished 6th out of 120 teams.
Applying the Traction Book’s Bullseye framework to BCRecommender
Ranking 19 channels with the goal of getting traction for BCRecommender.
Bandcamp recommendation and discovery algorithms
The recommendation backend for my BCRecommender service for personalised Bandcamp music discovery.
Building a recommender system on a shoestring budget (or: BCRecommender part 2 – general system layout)
Iterating on my BCRecommender service with the goal of keeping costs low while providing a valuable music recommendation service.
Building a Bandcamp recommender system (part 1 – motivation)
My motivation behind building BCRecommender, a free recommendation & discovery service for Bandcamp music.
How to (almost) win Kaggle competitions
Summary of a talk I gave at the Data Science Sydney meetup with ten tips on almost-winning Kaggle competitions.
Data’s hierarchy of needs
Discussing the hierarchy of needs proposed by Jay Kreps. Key takeaway: Data-driven algorithms & insights can only be as good as the underlying data.
Kaggle competition tips and summaries
Pointers to all my Kaggle advice posts and competition summaries.
Kaggle beginner tips
First post! An email I sent to members of the Data Science Sydney Meetup with tips on how to get started with Kaggle competitions.
Don't build AI, build with AI
Building AI is hard and expensive. For most companies, the path to AI success is building with third-party AI interns and cheap AI cogs.
In praise of inconsistency: Ditching weekly posts
On moving away from weekly blog posts in favour of deeper inconsistent articles and LinkedIn engagement.
Data, AI, humans, and climate: Carving a consulting niche
Podcast chat on the reality of Data & AI and my consulting focus: Helping climate & nature tech startups ship data-intensive solutions.
Juggling delivery, admin, and leads: Monthly biz recap
Highlights and lessons from my solo expertise biz, including value pricing, fractional cash flow, and distractions from admin & politics.
AI hype, AI bullshit, and the real deal
My views on separating AI hype and bullshit from the real deal. The general ideas apply to past and future hype waves in tech.
Giving up on the minimum viable data stack
Exploring why universal advice on startup data stacks is challenging, and the importance of context-specific decisions in data infrastructure.
Keep learning: Your career is never truly done
Podcast chat on my career journey from software engineering to data science and independent consulting.
First year lessons from a solo expertise biz in Data & AI
Reflections on building a solo expertise business in Data & AI, focusing on climate tech startups. Lessons learned from the first year of transition.
AI/ML lifecycle models versus real-world mess
The real world of AI/ML doesn’t fit into a neat diagram, so I created another diagram and a maturity heatmap to model the mess.
Your first Data-to-AI hire: Run a lovable process
Video and key points from the second part of a webinar on a startup’s first data hire, covering tips for defining the role and running the process.
Learn about Dataland to avoid expensive hiring mistakes
Video and key points from the first part of a webinar on a startup’s first data hire, covering data & AI definitions and high-level recommendations.
Exploring an AI product idea with the latest ChatGPT, Claude, and Gemini
Asking identical questions about my MagicGrantMaker idea yielded near-identical responses from the top chatbot models.
Stay alert! Security is everyone's responsibility
Questions to assess the security posture of a startup, focusing on basic hygiene and handling of sensitive data.
Is your tech stack ready for data-intensive applications?
Questions to assess the quality of tech stacks and lifecycles, with a focus on artificial intelligence, machine learning, and analytics.
AI ain't gonna save you from bad data
Since we’re far from a utopia where data issues are fully handled by AI, this post presents six questions humans can use to assess data projects.
Startup data health starts with healthy event tracking
Expanding on the startup health check question of tracking Kukuyeva’s five business aspects as wide events.
How to avoid startups with poor development processes
Questions that prospective data specialists and engineers should ask about development processes before accepting a startup role.
Plumbing, Decisions, and Automation: De-hyping Data & AI
Three essential questions to understand where an organisation stands when it comes to Data & AI (with zero hype).
Question startup culture before accepting a data-to-AI role
Eight questions that prospective data-to-AI employees should ask about a startup’s work and data culture.
Probing the People aspects of an early-stage startup
Ten questions that prospective employees should ask about a startup’s team, especially for data-centric roles.
Business questions to ask before taking a startup data role
Fourteen questions that prospective employees should ask about a startup’s business model and product, especially for data-focused roles.
Mentorship and the art of actionable advice
Reflections on what it takes to package expertise and deliver timely, actionable advice outside the context of employee relationships.
Assessing a startup's data-to-AI health
Reviewing the areas that should be assessed to determine a startup’s opportunities and challenges on the data/AI/ML front.
AI does not obviate the need for testing and observability
It’s easy to prototype with AI, but production-grade AI apps require even more thorough testing and observability than traditional software.
My experience as a Data Tech Lead with Work on Climate
The story of how I joined Work on Climate as a volunteer and became its data tech lead, with lessons applied to consulting & fractional work.
Artificial intelligence, automation, and the art of counting fish
Discussing the use of AI to automate underwater marine surveys as an example of the uneven distribution of technological advancement.
Questions to consider when using AI for PDF data extraction
Discussing considerations that arise when attempting to automate the extraction of structured data from PDFs and similar documents.
Two types of startup data problems
Classifying startups as ML-centric or non-ML is a helpful exercise to uncover the data challenges they’re likely to face.
Avoiding AI complexity: First, write no code
Two stories of getting AI functionality to production, which demonstrate the risks inherent in custom development versus starting with a no-code approach.
Building your startup's minimum viable data stack
First post in a series on building a minimum viable data stack for startups, introducing key definitions, components, and considerations.
Nudging ChatGPT to invent books you have no time to read
Getting ChatGPT Plus to elaborate on possible book content and produce a PDF cheatsheet, with the goal of learning about its capabilities.
Substance over titles: Your first data hire may be a data scientist
Advice for hiring a startup’s first data person: match skills to business needs, consider contractors, and get help from data people.
New decade, new tagline: Data & AI for Impact
Shifting focus to ‘Data & AI for Impact’, with more startup-related content, increased posting frequency, and deeper audience engagement.
Supporting volunteer monitoring of marine biodiversity with modern web and data tools
Summarising the work Uri Seroussi and I did to improve Reef Life Survey’s Reef Species of the World app.
Lessons from reluctant data engineering
Video and summary of a talk I gave at DataEngBytes Brisbane on what I learned from doing data engineering as part of every data science role I had.
My rediscovery of quiet writing on the open web
Reflections on publishing on this website: Writing publicly to share thoughts and documentation beats chasing views and likes.
Was data science a failure mode of software engineering?
Yes, data science projects have suffered from classic software engineering mistakes, but the field is maturing with the rise of new engineering roles.
How hackable are automated coding assessments?
Exploring the hackability of speed-based coding tests, using CodeSignal’s Industry Coding Framework as a case study.
Remaining relevant as a small language model
Bing Chat recently quipped that humans are small language models. Here are some of my thoughts on how we small language models can remain relevant (for now).
ChatGPT is transformative AI
My perspective after a week of using ChatGPT: This is a step change in finding distilled information, and it’s only the beginning.
Causal Machine Learning is off to a good start, despite some issues
Reviewing the first three chapters of the book Causal Machine Learning by Robert Osazuwa Ness.
The mission matters: Moving to climate tech as a data scientist
Discussing my recent career move into climate tech as a way of doing more to help mitigate dangerous climate change.
Building useful machine learning tools keeps getting easier: A fish ID case study
Lessons learned building a fish ID web app with fast.ai and Streamlit, in an attempt to reduce my fear of missing out on the latest deep learning developments.
Analysis strategies in online A/B experiments: Intention-to-treat, per-protocol, and other lessons from clinical trials
Epidemiologists analyse clinical trials to estimate the intention-to-treat and per-protocol effects. This post applies their strategies to online experiments.
Use your human brain to avoid artificial intelligence disasters
Overview of a talk I gave at a deep learning course, focusing on AI ethics as the need for humans to think on the context and consequences of applying AI.
Migrating from WordPress.com to Hugo on GitHub + Cloudflare
My reasons for switching from WordPress.com to Hugo on GitHub + Cloudflare, along with a summary of the solution components and migration process.
My work with Automattic
Back-dated meta-post that gathers my posts on Automattic blogs into a summary of the work I’ve done with the company.
Some highlights from 2020
Sharing remote teamwork insights, my climate & sustainability activism, Reef Life Survey publications, and progress on Automattic’s Experimentation Platform.
Many is not enough: Counting simulations to bootstrap the right way
Going deeper into correct testing of different methods for bootstrap estimation of confidence intervals.
Software commodities are eating interesting data science work
Being a data scientist can sometimes feel like a race against software commodities that replace interesting work. What can one do to remain relevant?
A day in the life of a remote data scientist
Video of a talk I gave on remote data science work at the Data Science Sydney meetup.
Bootstrapping the right way?
Video and summary of a talk I gave at YOW! Data on bootstrap estimation of confidence intervals.
Hackers beware: Bootstrap sampling may be harmful
Bootstrap sampling has been promoted to hackers without much statistical knowledge as an easy way of modelling uncertainty. But things aren’t that simple.
The most practical causal inference book I’ve read (is still a draft)
Causal Inference by Miguel Hernán and Jamie Robins is a must-read for anyone interested in the area.
Reflections on remote data science work
Discussing the pluses and minuses of remote work eighteen months after joining Automattic as a data scientist.
Defining data science in 2018
Updating my definition of data science to match changes in the field. It is now broader than before, but its ultimate goal is still to support decisions.
Advice for aspiring data scientists and other FAQs
Frequently asked questions by visitors to this site, especially around entering the data science field.
State of Bandcamp Recommender, Late 2017
Call for BCRecommender maintainers followed by a decision to shut it down, as I don’t have enough time and Bandcamp now offers recommendations.
My 10-step path to becoming a remote data scientist with Automattic
I wanted a well-paid data science-y remote job with an established company that offers a good life balance and makes products I care about. I got it eventually.
Exploring and visualising Reef Life Survey data
Web tools I built to visualise Reef Life Survey data and assist citizen scientists in underwater visual census work.
Customer lifetime value and the proliferation of misinformation on the internet
There’s a lot of misleading content on the estimation of customer lifetime value. Here’s what I learned about doing it well.
Ask Why! Finding motives, causes, and purpose in data science
Video and summary of a talk I gave at the Data Science Sydney meetup, about going beyond the what & how of predictive modelling.
If you don’t pay attention, data can drive you off a cliff
Seven common mistakes to avoid when working with data, such as ignoring uncertainty and confusing observed and unobserved quantities.
Is Data Scientist a useless job title?
It seems like anyone who touches data can call themselves a data scientist, which makes the title useless. The work they do can still be useful, though.
Making Bayesian A/B testing more accessible
A web tool I built to interpret A/B test results in a Bayesian way, including prior specification, visualisations, and decision rules.
Diving deeper into causality: Pearl, Kleinberg, Hill, and untested assumptions
Discussing the need for untested assumptions and temporality in causal inference. Mostly based on Samantha Kleinberg’s Causality, Probability, and Time.
The rise of greedy robots
Is artificial/machine intelligence a future threat? I argue that it’s already here, with greedy robots dominating our lives.
Why you should stop worrying about deep learning and deepen your understanding of causality instead
Causality is often overlooked, but it is far more relevant to most data scientists than deep learning.
The joys of offline data collection
Insights on data collection and machine learning from spending a month sailing, diving, and counting fish with Reef Life Survey.
This holiday season, give me real insights
Some companies present raw data or information as “insights”. This post surveys some examples, and discusses how they can be turned into real insights.
The hardest parts of data science
Defining feasible problems and coming up with reasonable ways of measuring solutions is harder than building accurate models or obtaining clean data.
Migrating a simple web application from MongoDB to Elasticsearch
Migrating BCRecommender from MongoDB to Elasticsearch made it possible to offer a richer search experience to users at a similar cost, among other benefits.
Miscommunicating science: Simplistic models, nutritionism, and the art of storytelling
Nutritionism is a special case of misinterpretation and miscommunication of scientific results – something many data scientists encounter in their work.
The wonderful world of recommender systems
Giving an overview of the field and common paradigms, and debunking five common myths about recommender systems.
You don’t need a data scientist (yet)
Hiring data scientists prematurely is wasteful and frustrating. Here are some questions to ask before you hire your first data scientist.
Goodbye, Parse.com
Migrating my web apps away from Parse.com due to reliability issues. Self-hosting is a better solution.
Learning about deep learning through album cover classification
Progress on my album cover classification project, highlighting lessons that would be useful to others who are getting started with deep learning.
Hopping on the deep learning bandwagon
To become proficient at solving data science problems, you need to get your hands dirty. Here, I used album cover classification to learn about deep learning.
First steps in data science: author-aware sentiment analysis
I became a data scientist by doing a PhD, but the same steps can be followed without a formal education program.
My divestment from fossil fuels
Recent choices I’ve made to reduce my exposure to fossil fuels, including practical steps that can be taken by Australians and generally applicable lessons.
My PhD work
An overview of my PhD in data science / artificial intelligence. Thesis title: Text Mining and Rating Prediction with Topical User Models.
The long road to a lifestyle business
Progress since leaving my last full-time job and setting out on an independent path that includes data science consulting and work on my own projects.
Learning to rank for personalised search (Yandex Search Personalisation – Kaggle Competition Summary – Part 2)
My team’s solution to the Yandex Search Personalisation competition (finished 9th out of 194 teams).
Is thinking like a search engine possible? (Yandex search personalisation – Kaggle competition summary – part 1)
Insights on search personalisation and SEO from participating in a Kaggle competition (finished 9th out of 194 teams).
Automating Parse.com bulk data imports
A script for importing data into the Parse backend-as-a-service.
Stochastic Gradient Boosting: Choosing the Best Number of Iterations
Exploring an approach to choosing the optimal number of iterations in stochastic gradient boosting, following a bug I found in scikit-learn.
SEO: Mostly about showing up?
Increasing SEO traffic to BCRecommender by adding content and opening up more pages for crawling. It turns out that thin content is better than no content.
Fitting noise: Forecasting the sale price of bulldozers (Kaggle competition summary)
Summary of a Kaggle competition to forecast bulldozer sale price, where I finished 9th out of 476 teams.
BCRecommender Traction Update
Update on BCRecommender traction using three channels: blogger outreach, search engine optimisation, and content marketing.
What is data science?
Data science has been a hot term in the past few years. Still, there isn’t a single definition of the field. This post discusses my favourite definition.
Greek Media Monitoring Kaggle competition: My approach
Summary of my approach to the Greek Media Monitoring Kaggle competition, where I finished 6th out of 120 teams.
Applying the Traction Book’s Bullseye framework to BCRecommender
Ranking 19 channels with the goal of getting traction for BCRecommender.
Bandcamp recommendation and discovery algorithms
The recommendation backend for my BCRecommender service for personalised Bandcamp music discovery.
Building a recommender system on a shoestring budget (or: BCRecommender part 2 – general system layout)
Iterating on my BCRecommender service with the goal of keeping costs low while providing a valuable music recommendation service.
Building a Bandcamp recommender system (part 1 – motivation)
My motivation behind building BCRecommender, a free recommendation & discovery service for Bandcamp music.
How to (almost) win Kaggle competitions
Summary of a talk I gave at the Data Science Sydney meetup with ten tips on almost-winning Kaggle competitions.
Data’s hierarchy of needs
Discussing the hierarchy of needs proposed by Jay Kreps. Key takeaway: Data-driven algorithms & insights can only be as good as the underlying data.
Kaggle competition tips and summaries
Pointers to all my Kaggle advice posts and competition summaries.
Kaggle beginner tips
First post! An email I sent to members of the Data Science Sydney Meetup with tips on how to get started with Kaggle competitions.