diff --git a/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/index.html b/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/index.html new file mode 100644 index 000000000..540e4a7b3 --- /dev/null +++ b/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/index.html @@ -0,0 +1,9 @@ +Is your tech stack ready for data-intensive applications? | Yanir Seroussi | Data & AI for Startup Impact +

Is your tech stack ready for data-intensive applications?

a stack of computers, wires, and hay in an office area

Data-intensive projects fail when you treat them like traditional software projects. But they also fail when you don’t apply best practices from software engineering.

Why?

Because data-intensive systems are made of data, and also made of software. Therefore:

  1. data changes can lead to failures; and
  2. software changes can lead to failures.

In traditional software systems, you fully control the changes. Your software doesn’t change unexpectedly.

In data-intensive systems, you cede control to the data. The data changes constantly, and it affects the behaviour of your system.

To succeed, you need to manage both the data and software aspects of your systems. This successful management is the essence of the questions from the Tech section of my Data-to-AI Health Check for Startups. This post presents the questions along with guidance on what constitutes healthy answers.

What do I mean by data intensity?

For the last few months, I have set my LinkedIn tagline to “helping startups ship data-intensive solutions (AI/ML for climate/nature tech)”. I landed on it after a bit of a struggle with succinctly defining exactly what it is I do.

The problem is that after over a decade of “data” roles, I don’t see the field of AI/ML (artificial intelligence and machine learning) as a sanctified sphere that’s separate from real-world data and humans. Further, while business intelligence (aka analytics) is seen by some as less “sexy” than AI/ML, I see it as a different lens of using data to drive business outcomes. Essentially, it all comes down to plumbing, decisions, and automation.

In the days of the Big Data hype, much attention was given to the three Vs of data: Volume, Velocity, and Variety – what flows through the plumbing. To me, data intensity goes beyond the three Vs. This is how I define it in the first section of my Data-to-AI Health Check:

High data intensity typically requires low-latency processing of large volumes of data with more than one database server. With high intensity, data processing issues noticeably affect key business metrics.

That is, in data-intensive settings, data issues affect decisions and automation in a way that hurts the business.

A couple of examples may help:

  • Low intensity: A dashboard that doesn’t contain any actionable metrics. If the metrics change due to bugs in the data processing, it doesn’t affect decisions.
  • High intensity: An ad-serving platform that personalises ads in real time based on numerous data points. If any model or system breaks, millions of dollars may be lost.

In short, the higher the data intensity, the more the flow of data affects the bottom line.

Understanding tech stacks and lifecycles

At 15 questions, the Tech section of my Data-to-AI Health Check for Startups is long and deep. To keep this post digestible, I won’t go into every question. Instead, I’ve grouped the questions by theme.

First up, on the tech stacks and lifecycles:

  • Q1: Provide an architecture diagram for your tech systems (product and data stacks), including first-party and third-party tools and databases. If a diagram doesn’t exist, an ad hoc drawing would work as well.
  • Q2: Zooming in on data stacks, what tools and pipelines do you use for the data engineering lifecycles (generation, storage, ingestion, transformation, and serving), and downstream uses (analytics, AI/ML, and reverse ETL)?
  • Q3: Zooming in further on the downstream uses of analytics and AI/ML, what systems, processes, and tools do you use to manage their lifecycles (discovery, data preparation, model engineering, deployment, monitoring, and maintenance)? Give specific project examples.
  • Q4: Are there any tech choices you regret? Why?
  • Q5: Are there any new tools you want to introduce to your stack? Why?

To an extent, tech stacks and lifecycles follow the Anna Karenina principle: All healthy stacks are alike; each unhealthy stack is unhealthy in its own way.

By asking for their descriptions, I’m aiming to uncover gaps and opportunities.

Often, some gaps are known to the people in charge, but they haven’t been explicitly discussed. This is especially common in startups, where competing priorities and resource constraints require compromising on scope and quality to fuel growth. In addition, it’s impossible for small startups to have all the relevant experts on the founding team, so best practices aren’t followed due to ignorance rather than due to intentional compromises made to move fast. However, a lack of awareness of best practices can often lead to the startup moving too slowly.

Two concrete examples:

  • many people outside the data world are unaware of recent advances in tooling for management of data transformations (dbt and its competitors), and
  • practitioners who’ve only built ML models in academia rarely appreciate the complexity of running ML in production (MLOps is much more than ML).

Beyond gaps that may be exposed by Q1-Q3, explicitly asking about regrettable and future tech choices (Q4 & Q5) helps surface evidence of an overreliance on unproven or exotic tech (aka wasted innovation tokens) and an underreliance on proven tech (aka reinvention of wheels). This is especially common with inexperienced operators who are too excited about playing with shiny tools. Use of unproven tech should be reserved for cases where it confers a competitive advantage (e.g., being first to market with the latest AI advances).

Basic quality assurance and delivery

The next set of questions covers what I consider to be the basics of quality assurance and continuous delivery:

  • Q6: How do you test product code and infrastructure setup? How good is the coverage (formally – percentage of statements covered, and conceptually – confidence from 1 to 5 that tests capture faults prior to deployment)?
  • Q7: Do all tests run automatically on every version of the code?
  • Q8: Are deployments done as a single automated step (e.g., push new containers to production when the main branch is updated)?
  • Q9: How faithful are development, testing, and staging environments to the production setup? Are there gaps that can be feasibly addressed? If so, what is stopping you from addressing them?

As I’m writing this in 2024, all the tooling exists to set things up with solid testing and deployment processes – and it’s constantly getting easier. The only place where such processes may be skipped is in throwaway prototypes, where testing unnecessarily slows things down.

Being a startup is also not an excuse. As Martin Fowler pointed out years ago, high internal quality of software doesn’t come at a net cost. That is, by implementing solid systems and processes for automated testing and deployment, teams move faster. Teams that cut corners on internal quality may move faster in the very short term, but typically get overtaken by their higher-internal-quality counterparts within weeks.

No startup aims to be around only for a few weeks, so investing in internal quality is key to tech health.

In Fowler’s words:

  • Neglecting internal quality leads to rapid build up of cruft
  • This cruft slows down feature development
  • Even a great team produces cruft, but by keeping internal quality high, is able to keep it under control
  • High internal quality keeps cruft to a minimum, allowing a team to add features with less effort, time, and cost

Unfortunately, some software engineers never learn this lesson. Further, data professionals who don’t have a software background are even less likely to be exposed to the importance of internal quality and how it can be enforced.

That said, it’s never too late to learn and improve. This is key to avoiding failure modes that arise in data projects when best practices from software engineering aren’t applied.

Specific data-intensive failure modes

The next set of questions probes for failure modes that are specific to data-intensive work (data engineering, analytics, AI/ML, etc.):

  • Q10: Do you apply the same standards of testing and deploying product code to data? For example, is there untested SQL code hidden in dashboarding tools or the database layer, or is SQL treated like core product code (tracked in source control with isolated testing)?
  • Q11: How are schema changes managed and tested in each data system?
  • Q12: Do you rely on notebooks for production data code? If so, how do you ensure that notebook code meets the same quality standards as core product code (especially around testing and change management)?
  • Q13: Do advanced AI/ML projects meet your performance expectations? If not, do you know how to improve performance without data changes?

Data-intensive work is essentially about building models with software:

  • Raw data is a model of real-world entities and events, expressed in database schemas (even “schemaless” databases have a schema – it’s just unbounded).
  • Dashboards present models of metrics that originate in raw data, with the goal of informing decisions.
  • AI/ML models are essentially complex data transformations, e.g., from a matrix of pixels to a probability that the image modelled by the pixels is of a cat or a dog.

Due to historical and practical reasons, much of this modelling work is done by people with no training in software engineering. While the industry is maturing, Q10-13 often expose gaps. The ideal answer to each question is that all models are fully tested and managed – just like software, but with extra care for the complexity introduced by data.
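To make the point about schemas concrete, the following sketch checks incoming records against an explicit, versionable schema, so that a schema change (Q11) shows up as a test failure rather than as a silent downstream bug. The field names and the `schema_violations` helper are hypothetical, and production systems would more likely rely on a dedicated validation library or database constraints.

```python
# Hypothetical expected schema for an event record: field name -> Python type.
EXPECTED_SCHEMA = {"user_id": int, "event": str, "amount": float}


def schema_violations(record: dict, expected: dict = EXPECTED_SCHEMA) -> list:
    """Return human-readable problems; an empty list means the record conforms."""
    problems = []
    for field, expected_type in expected.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    # Fields that the schema doesn't know about are reported too, because
    # "schemaless" sources tend to grow new fields without warning.
    for field in sorted(record.keys() - expected.keys()):
        problems.append(f"unexpected field: {field}")
    return problems
```

For example, `schema_violations({"user_id": "1", "event": "purchase"})` reports both the wrongly typed `user_id` and the missing `amount`.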

Maintaining long-term success

Finally, the last two questions cover monitoring and maintenance:

  • Q14: On a scale of 1 to 5, how confident are you in detecting and addressing issues in production (including product, infra, data, and ML observability 1.0 & 2.0)? Do you have action plans to increase your level of confidence?
  • Q15: What DevOps, DataOps, and MLOps practices do you follow that weren’t covered above? Are there known gaps and plans to address them?

Even if a data-intensive project is considered “done”, it still changes in production due to its dependence on data. The degree of likely change varies by project, but it needs to be actively managed for long-term success.
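One way to picture active management of data change is a check that compares each production batch against a baseline and raises alerts when the data drifts. The sketch below is a deliberately simple assumption of what such a check might look like: the `batch_alerts` function and its thresholds are hypothetical, and real observability setups would track many more signals.

```python
from statistics import mean


def batch_alerts(baseline, batch, max_null_rate=0.05, max_mean_shift=0.25):
    """Compare a production batch of numeric values to a baseline sample.

    `batch` may contain None for missing values. Thresholds are illustrative
    assumptions, not recommendations.
    """
    if not batch:
        return ["empty batch"]
    alerts = []
    null_rate = sum(value is None for value in batch) / len(batch)
    if null_rate > max_null_rate:
        alerts.append(f"null rate {null_rate:.0%} exceeds {max_null_rate:.0%}")
    values = [value for value in batch if value is not None]
    baseline_mean = mean(baseline)
    # The relative shift is only defined for a non-zero baseline mean.
    if values and baseline_mean != 0:
        shift = abs(mean(values) - baseline_mean) / abs(baseline_mean)
        if shift > max_mean_shift:
            alerts.append(f"mean shifted by {shift:.0%} relative to baseline")
    return alerts
```

With a baseline of `[100.0, 110.0, 90.0, 100.0]`, a batch of `[95.0, 105.0, 100.0]` produces no alerts, while `[200.0, None, 210.0, 190.0]` trips both the null-rate and the mean-shift checks.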

Data-to-AI health beyond the tech

This post is part of a series on my Data-to-AI Health Check for Startups. Previous posts:

You can download a guide containing all the questions as a PDF. Next, I’ll go into the questions from the Security & Compliance section. Feedback is always welcome!


    Public comments are closed, but I love hearing from readers. Feel free to contact me with your thoughts.

    \ No newline at end of file diff --git a/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack.webp b/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack.webp new file mode 100644 index 000000000..26ece11b4 Binary files /dev/null and b/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack.webp differ diff --git a/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack_hu581202d4a2bac0f4b81f59adcd992e11_114668_1080x0_resize_q75_h2_box_2.webp b/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack_hu581202d4a2bac0f4b81f59adcd992e11_114668_1080x0_resize_q75_h2_box_2.webp new file mode 100644 index 000000000..87ad28176 Binary files /dev/null and b/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack_hu581202d4a2bac0f4b81f59adcd992e11_114668_1080x0_resize_q75_h2_box_2.webp differ diff --git a/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack_hu581202d4a2bac0f4b81f59adcd992e11_114668_360x0_resize_q75_h2_box_2.webp b/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack_hu581202d4a2bac0f4b81f59adcd992e11_114668_360x0_resize_q75_h2_box_2.webp new file mode 100644 index 000000000..a50c23352 Binary files /dev/null and b/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack_hu581202d4a2bac0f4b81f59adcd992e11_114668_360x0_resize_q75_h2_box_2.webp differ diff --git a/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack_hu581202d4a2bac0f4b81f59adcd992e11_114668_480x0_resize_q75_h2_box_2.webp b/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack_hu581202d4a2bac0f4b81f59adcd992e11_114668_480x0_resize_q75_h2_box_2.webp new file mode 100644 index 000000000..29f9bb157 Binary files /dev/null and 
b/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack_hu581202d4a2bac0f4b81f59adcd992e11_114668_480x0_resize_q75_h2_box_2.webp differ diff --git a/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack_hu581202d4a2bac0f4b81f59adcd992e11_114668_720x0_resize_q75_h2_box_2.webp b/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack_hu581202d4a2bac0f4b81f59adcd992e11_114668_720x0_resize_q75_h2_box_2.webp new file mode 100644 index 000000000..340e6cc9b Binary files /dev/null and b/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/modern-tech-stack_hu581202d4a2bac0f4b81f59adcd992e11_114668_720x0_resize_q75_h2_box_2.webp differ diff --git a/index.xml b/index.xml index 4ddba40fd..b80a2451e 100644 --- a/index.xml +++ b/index.xml @@ -1,4 +1,4 @@ -Yanir Seroussi | Data & AI for Startup Impacthttps://yanirseroussi.com/Recent content on Yanir Seroussi | Data & AI for Startup ImpactHugo -- gohugo.ioen-auText and figures licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) by [Yanir Seroussi](https://yanirseroussi.com/about/), except where noted otherwiseSat, 22 Jun 2024 22:50:00 +0000Dealing with endless data changeshttps://yanirseroussi.com/til/2024/06/22/dealing-with-endless-data-changes/Sat, 22 Jun 2024 22:50:00 +0000https://yanirseroussi.com/til/2024/06/22/dealing-with-endless-data-changes/Quotes from Demetrios Brinkmann on the relationship between MLOps and DevOps, with MLOps allowing for managing changes that come from data.AI ain't gonna save you from bad datahttps://yanirseroussi.com/2024/06/17/ai-aint-gonna-save-you-from-bad-data/Mon, 17 Jun 2024 02:00:00 +0000https://yanirseroussi.com/2024/06/17/ai-aint-gonna-save-you-from-bad-data/Since we’re far from a utopia where data issues are fully handled by AI, this post presents six questions humans can use to assess data projects.The rules of the passion 
economyhttps://yanirseroussi.com/til/2024/06/12/the-rules-of-the-passion-economy/Wed, 12 Jun 2024 02:50:00 +0000https://yanirseroussi.com/til/2024/06/12/the-rules-of-the-passion-economy/Summary of the main messages from the book The Passion Economy by Adam Davidson.Startup data health starts with healthy event trackinghttps://yanirseroussi.com/2024/06/10/startup-data-health-starts-with-healthy-event-tracking/Mon, 10 Jun 2024 04:00:00 +0000https://yanirseroussi.com/2024/06/10/startup-data-health-starts-with-healthy-event-tracking/Expanding on the startup health check question of tracking Kukuyeva’s five business aspects as wide events.How to avoid startups with poor development processeshttps://yanirseroussi.com/2024/06/03/how-to-avoid-startups-with-poor-development-processes/Mon, 03 Jun 2024 02:45:00 +0000https://yanirseroussi.com/2024/06/03/how-to-avoid-startups-with-poor-development-processes/Questions that prospective data specialists and engineers should ask about development processes before accepting a startup role.Plumbing, Decisions, and Automation: De-hyping Data & AIhttps://yanirseroussi.com/2024/05/27/plumbing-decisions-and-automation-de-hyping-data-and-ai/Mon, 27 May 2024 02:00:00 +0000https://yanirseroussi.com/2024/05/27/plumbing-decisions-and-automation-de-hyping-data-and-ai/Three essential questions to understand where an organisation stands when it comes to Data & AI (with zero hype).Adapting to the economy of algorithmshttps://yanirseroussi.com/til/2024/05/25/adapting-to-the-economy-of-algorithms/Sat, 25 May 2024 00:00:00 +0000https://yanirseroussi.com/til/2024/05/25/adapting-to-the-economy-of-algorithms/Overview of the book The Economy of Algorithms by Marek Kowalkiewicz.Question startup culture before accepting a data-to-AI rolehttps://yanirseroussi.com/2024/05/20/question-startup-culture-before-accepting-a-data-to-ai-role/Mon, 20 May 2024 02:25:00 
+0000https://yanirseroussi.com/2024/05/20/question-startup-culture-before-accepting-a-data-to-ai-role/Eight questions that prospective data-to-AI employees should ask about a startup’s work and data culture.Probing the People aspects of an early-stage startuphttps://yanirseroussi.com/2024/05/13/probing-the-people-aspects-of-an-early-stage-startup/Mon, 13 May 2024 02:00:00 +0000https://yanirseroussi.com/2024/05/13/probing-the-people-aspects-of-an-early-stage-startup/Ten questions that prospective employees should ask about a startup’s team, especially for data-centric roles.Business questions to ask before taking a startup data rolehttps://yanirseroussi.com/2024/05/06/business-questions-to-ask-before-taking-a-startup-data-role/Mon, 06 May 2024 04:30:00 +0000https://yanirseroussi.com/2024/05/06/business-questions-to-ask-before-taking-a-startup-data-role/Fourteen questions that prospective employees should ask about a startup’s business model and product, especially for data-focused roles.Mentorship and the art of actionable advicehttps://yanirseroussi.com/2024/04/29/mentorship-and-the-art-of-actionable-advice/Mon, 29 Apr 2024 06:30:00 +0000https://yanirseroussi.com/2024/04/29/mentorship-and-the-art-of-actionable-advice/Reflections on what it takes to package expertise and deliver timely, actionable advice outside the context of employee relationships.Assessing a startup's data-to-AI healthhttps://yanirseroussi.com/2024/04/22/assessing-a-startups-data-to-ai-health/Mon, 22 Apr 2024 06:00:00 +0000https://yanirseroussi.com/2024/04/22/assessing-a-startups-data-to-ai-health/Reviewing the areas that should be assessed to determine a startup’s opportunities and challenges on the data/AI/ML front.AI does not obviate the need for testing and observabilityhttps://yanirseroussi.com/2024/04/15/ai-does-not-obviate-the-need-for-testing-and-observability/Mon, 15 Apr 2024 05:00:00 
+0000https://yanirseroussi.com/2024/04/15/ai-does-not-obviate-the-need-for-testing-and-observability/It’s easy to prototype with AI, but production-grade AI apps require even more thorough testing and observability than traditional software.LinkedIn is a teachable skillhttps://yanirseroussi.com/til/2024/04/11/linkedin-is-a-teachable-skill/Thu, 11 Apr 2024 01:45:25 +0000https://yanirseroussi.com/til/2024/04/11/linkedin-is-a-teachable-skill/An high-level overview of things I learned from Justin Welsh’s LinkedIn Operating System course.My experience as a Data Tech Lead with Work on Climatehttps://yanirseroussi.com/2024/04/08/my-experience-as-a-data-tech-lead-with-work-on-climate/Mon, 08 Apr 2024 02:00:00 +0000https://yanirseroussi.com/2024/04/08/my-experience-as-a-data-tech-lead-with-work-on-climate/The story of how I joined Work on Climate as a volunteer and became its data tech lead, with lessons applied to consulting & fractional work.The data engineering lifecycle is not going anywherehttps://yanirseroussi.com/til/2024/04/05/the-data-engineering-lifecycle-is-not-going-anywhere/Fri, 05 Apr 2024 01:00:00 +0000https://yanirseroussi.com/til/2024/04/05/the-data-engineering-lifecycle-is-not-going-anywhere/My key takeaways from reading Fundamentals of Data Engineering by Joe Reis and Matt Housley.Artificial intelligence, automation, and the art of counting fishhttps://yanirseroussi.com/2024/04/01/artificial-intelligence-automation-and-the-art-of-counting-fish/Mon, 01 Apr 2024 06:00:00 +0000https://yanirseroussi.com/2024/04/01/artificial-intelligence-automation-and-the-art-of-counting-fish/Discussing the use of AI to automate underwater marine surveys as an example of the uneven distribution of technological advancement.Atomic Habits is full of actionable advicehttps://yanirseroussi.com/til/2024/03/12/atomic-habits-is-full-of-actionable-advice/Tue, 12 Mar 2024 06:19:31 +0000https://yanirseroussi.com/til/2024/03/12/atomic-habits-is-full-of-actionable-advice/I put the book 
to use after the first listen, and will definitely revisit it in the future to form better habits.Questions to consider when using AI for PDF data extractionhttps://yanirseroussi.com/2024/03/11/questions-to-consider-when-using-ai-for-pdf-data-extraction/Mon, 11 Mar 2024 00:00:00 +0000https://yanirseroussi.com/2024/03/11/questions-to-consider-when-using-ai-for-pdf-data-extraction/Discussing considerations that arise when attempting to automate the extraction of structured data from PDFs and similar documents.Two types of startup data problemshttps://yanirseroussi.com/2024/03/04/two-types-of-startup-data-problems/Mon, 04 Mar 2024 02:00:00 +0000https://yanirseroussi.com/2024/03/04/two-types-of-startup-data-problems/Classifying startups as ML-centric or non-ML is a helpful exercise to uncover the data challenges they’re likely to face.Avoiding AI complexity: First, write no codehttps://yanirseroussi.com/2024/02/26/avoiding-ai-complexity-first-write-no-code/Mon, 26 Feb 2024 01:45:00 +0000https://yanirseroussi.com/2024/02/26/avoiding-ai-complexity-first-write-no-code/Two stories of getting AI functionality to production, which demonstrate the risks inherent in custom development versus starting with a no-code approach.Building your startup's minimum viable data stackhttps://yanirseroussi.com/2024/02/19/building-your-startups-minimum-viable-data-stack/Mon, 19 Feb 2024 00:00:00 +0000https://yanirseroussi.com/2024/02/19/building-your-startups-minimum-viable-data-stack/First post in a series on building a minimum viable data stack for startups, introducing key definitions, components, and considerations.The three Cs of indie consulting: Confidence, Cash, and Connectionshttps://yanirseroussi.com/til/2024/02/17/the-three-cs-of-indie-consulting-confidence-cash-and-connections/Sat, 17 Feb 2024 02:00:00 +0000https://yanirseroussi.com/til/2024/02/17/the-three-cs-of-indie-consulting-confidence-cash-and-connections/Jonathan Stark makes a compelling argument why you should have the 
three Cs before quitting your job to go solo consulting.Nudging ChatGPT to invent books you have no time to readhttps://yanirseroussi.com/2024/02/12/nudging-chatgpt-to-invent-books-you-have-no-time-to-read/Mon, 12 Feb 2024 05:00:00 +0000https://yanirseroussi.com/2024/02/12/nudging-chatgpt-to-invent-books-you-have-no-time-to-read/Getting ChatGPT Plus to elaborate on possible book content and produce a PDF cheatsheet, with the goal of learning about its capabilities.Future software development may require fewer humanshttps://yanirseroussi.com/til/2024/02/06/future-software-development-may-require-fewer-humans/Tue, 06 Feb 2024 06:15:00 +0000https://yanirseroussi.com/til/2024/02/06/future-software-development-may-require-fewer-humans/Reflecting on an interview with Jason Warner, CEO of poolside.Substance over titles: Your first data hire may be a data scientisthttps://yanirseroussi.com/2024/02/05/substance-over-titles-your-first-data-hire-may-be-a-data-scientist/Mon, 05 Feb 2024 02:45:00 +0000https://yanirseroussi.com/2024/02/05/substance-over-titles-your-first-data-hire-may-be-a-data-scientist/Advice for hiring a startup’s first data person: match skills to business needs, consider contractors, and get help from data people.New decade, new tagline: Data & AI for Impacthttps://yanirseroussi.com/2024/01/19/new-decade-new-tagline-data-and-ai-for-impact/Fri, 19 Jan 2024 00:00:00 +0000https://yanirseroussi.com/2024/01/19/new-decade-new-tagline-data-and-ai-for-impact/Shifting focus to ‘Data & AI for Impact’, with more startup-related content, increased posting frequency, and deeper audience engagement.Psychographic specialisations may work for discipline generalistshttps://yanirseroussi.com/til/2024/01/09/psychographic-specialisations-may-work-for-discipline-generalists/Tue, 09 Jan 2024 03:00:00 +0000https://yanirseroussi.com/til/2024/01/09/psychographic-specialisations-may-work-for-discipline-generalists/When focusing on a market segment defined by personal beliefs, it’s 
often fine to position yourself as a generalist in your craft.The power of parasocial relationshipshttps://yanirseroussi.com/til/2024/01/08/the-power-of-parasocial-relationships/Mon, 08 Jan 2024 06:00:00 +0000https://yanirseroussi.com/til/2024/01/08/the-power-of-parasocial-relationships/Repeated exposure to media personas creates relationships that help justify premium fees.Positioning is a common problem for data scientistshttps://yanirseroussi.com/til/2023/12/18/positioning-is-a-common-problem-for-data-scientists/Mon, 18 Dec 2023 00:30:00 +0000https://yanirseroussi.com/til/2023/12/18/positioning-is-a-common-problem-for-data-scientists/With the commodification of data scientists, the problem of positioning has become more common: My takeaways from Genevieve Hayes interviewing Jonathan Stark.Transfer learning applies to energy market biddinghttps://yanirseroussi.com/til/2023/12/14/transfer-learning-applies-to-energy-market-bidding/Thu, 14 Dec 2023 00:15:00 +0000https://yanirseroussi.com/til/2023/12/14/transfer-learning-applies-to-energy-market-bidding/An interesting approach to bidding of energy storage assets, showing that training on New York data is transferable to Queensland.Supporting volunteer monitoring of marine biodiversity with modern web and data toolshttps://yanirseroussi.com/2023/11/29/supporting-volunteer-monitoring-of-marine-biodiversity-with-modern-web-and-data-tools/Wed, 29 Nov 2023 02:00:00 +0000https://yanirseroussi.com/2023/11/29/supporting-volunteer-monitoring-of-marine-biodiversity-with-modern-web-and-data-tools/Summarising the work Uri Seroussi and I did to improve Reef Life Survey’s Reef Species of the World app.Our Blue Machine is changing, but we are not helplesshttps://yanirseroussi.com/til/2023/11/28/our-blue-machine-is-changing-but-we-are-not-helpless/Tue, 28 Nov 2023 06:40:00 +0000https://yanirseroussi.com/til/2023/11/28/our-blue-machine-is-changing-but-we-are-not-helpless/One of my many highlights from Helen Czerski’s Blue Machine.You 
don't need a proprietary API for static mapshttps://yanirseroussi.com/til/2023/11/21/you-dont-need-a-proprietary-api-for-static-maps/Tue, 21 Nov 2023 06:00:00 +0000https://yanirseroussi.com/til/2023/11/21/you-dont-need-a-proprietary-api-for-static-maps/For many use cases, libraries like cartopy are better than the likes of Mapbox and Google Maps.Lessons from reluctant data engineeringhttps://yanirseroussi.com/2023/10/25/lessons-from-reluctant-data-engineering/Wed, 25 Oct 2023 04:45:00 +0000https://yanirseroussi.com/2023/10/25/lessons-from-reluctant-data-engineering/Video and summary of a talk I gave at DataEngBytes Brisbane on what I learned from doing data engineering as part of every data science role I had.Artificial intelligence was a marketing term all along – just call it automationhttps://yanirseroussi.com/til/2023/10/06/artificial-intelligence-was-a-marketing-term-all-along-just-call-it-automation/Fri, 06 Oct 2023 05:00:00 +0000https://yanirseroussi.com/til/2023/10/06/artificial-intelligence-was-a-marketing-term-all-along-just-call-it-automation/Replacing ‘artificial intelligence’ with ‘automation’ is a useful trick for cutting through the hype.The lines between solo consulting and product building are blurryhttps://yanirseroussi.com/til/2023/09/25/the-lines-between-solo-consulting-and-product-building-are-blurry/Mon, 25 Sep 2023 00:00:00 +0000https://yanirseroussi.com/til/2023/09/25/the-lines-between-solo-consulting-and-product-building-are-blurry/It turns out that problems like finding a niche and defining the ideal clients are key to any solo business.Google's Rules of Machine Learning still apply in the age of large language modelshttps://yanirseroussi.com/til/2023/09/21/googles-rules-of-machine-learning-still-apply-in-the-age-of-large-language-models/Thu, 21 Sep 2023 21:30:00 +0000https://yanirseroussi.com/til/2023/09/21/googles-rules-of-machine-learning-still-apply-in-the-age-of-large-language-models/Despite the excitement around large language 
models, building with machine learning remains an engineering problem with established best practices.My rediscovery of quiet writing on the open webhttps://yanirseroussi.com/2023/08/28/my-rediscovery-of-quiet-writing-on-the-open-web/Mon, 28 Aug 2023 05:30:00 +0000https://yanirseroussi.com/2023/08/28/my-rediscovery-of-quiet-writing-on-the-open-web/Reflections on publishing on this website: Writing publicly to share thoughts and documentation beats chasing views and likes.The Minimalist Entrepreneur is too prescriptive for mehttps://yanirseroussi.com/til/2023/08/21/the-minimalist-entrepreneur-is-too-prescriptive-for-me/Mon, 21 Aug 2023 03:15:00 +0000https://yanirseroussi.com/til/2023/08/21/the-minimalist-entrepreneur-is-too-prescriptive-for-me/While I found the story of Gumroad interesting, The Minimalist Entrepreneur seems to over-generalise from the founder’s experience.Revisiting Start Small, Stay Small in 2023 (Chapter 2)https://yanirseroussi.com/til/2023/08/17/revisiting-start-small-stay-small-in-2023-chapter-2/Thu, 17 Aug 2023 07:45:00 +0000https://yanirseroussi.com/til/2023/08/17/revisiting-start-small-stay-small-in-2023-chapter-2/A summary of the second chapter of Rob Walling’s Start Small, Stay Small, along with my thoughts & reflections.Revisiting Start Small, Stay Small in 2023 (Chapter 1)https://yanirseroussi.com/til/2023/08/16/revisiting-start-small-stay-small-in-2023-chapter-1/Wed, 16 Aug 2023 05:45:00 +0000https://yanirseroussi.com/til/2023/08/16/revisiting-start-small-stay-small-in-2023-chapter-1/A summary of the first chapter of Rob Walling’s Start Small, Stay Small, along with my thoughts & reflections.Email notifications on public GitHub commitshttps://yanirseroussi.com/til/2023/08/14/email-notifications-on-public-github-commits/Mon, 14 Aug 2023 05:15:00 +0000https://yanirseroussi.com/til/2023/08/14/email-notifications-on-public-github-commits/GitHub publishes an Atom feed, which means you can use any RSS reader to follow commits.The rule of 
thirds can probably be ignoredhttps://yanirseroussi.com/til/2023/08/11/the-rule-of-thirds-can-probably-be-ignored/Fri, 11 Aug 2023 03:15:00 +0000https://yanirseroussi.com/til/2023/08/11/the-rule-of-thirds-can-probably-be-ignored/Turns out that the rule of thirds for composing visuals may not be that important.Using YubiKey for SSH accesshttps://yanirseroussi.com/til/2023/07/23/using-yubikey-for-ssh-access/Sun, 23 Jul 2023 00:07:15 +0000https://yanirseroussi.com/til/2023/07/23/using-yubikey-for-ssh-access/Some pointers for setting up SSH access with YubiKey on Ubuntu 22.04.Making a TIL section with Hugo and PaperModhttps://yanirseroussi.com/til/2023/07/17/making-a-til-section-with-hugo-and-papermod/Mon, 17 Jul 2023 00:06:15 +0000https://yanirseroussi.com/til/2023/07/17/making-a-til-section-with-hugo-and-papermod/How I added a Today I Learned section to my Hugo site with the PaperMod theme.You can't save timehttps://yanirseroussi.com/til/2023/07/11/you-cant-save-time/Tue, 11 Jul 2023 00:00:00 +0000https://yanirseroussi.com/til/2023/07/11/you-cant-save-time/Time can be spent doing different activities, but it can’t be stored and saved for later.Was data science a failure mode of software engineering?https://yanirseroussi.com/2023/06/30/was-data-science-a-failure-mode-of-software-engineering/Fri, 30 Jun 2023 00:06:30 +0000https://yanirseroussi.com/2023/06/30/was-data-science-a-failure-mode-of-software-engineering/Yes, data science projects have suffered from classic software engineering mistakes, but the field is maturing with the rise of new engineering roles.How hackable are automated coding assessments?https://yanirseroussi.com/2023/05/26/how-hackable-are-automated-coding-assessments/Fri, 26 May 2023 00:03:00 +0000https://yanirseroussi.com/2023/05/26/how-hackable-are-automated-coding-assessments/Exploring the hackability of speed-based coding tests, using CodeSignal’s Industry Coding Framework as a case study.Remaining relevant as a small language 
modelhttps://yanirseroussi.com/2023/04/21/remaining-relevant-as-a-small-language-model/Fri, 21 Apr 2023 00:06:30 +0000https://yanirseroussi.com/2023/04/21/remaining-relevant-as-a-small-language-model/Bing Chat recently quipped that humans are small language models. Here are some of my thoughts on how we small language models can remain relevant (for now).ChatGPT is transformative AIhttps://yanirseroussi.com/2022/12/11/chatgpt-is-transformative-ai/Sun, 11 Dec 2022 00:00:00 +0000https://yanirseroussi.com/2022/12/11/chatgpt-is-transformative-ai/My perspective after a week of using ChatGPT: This is a step change in finding distilled information, and it’s only the beginning.Causal Machine Learning is off to a good start, despite some issueshttps://yanirseroussi.com/2022/09/12/causal-machine-learning-book-draft-review/Mon, 12 Sep 2022 02:45:00 +0000https://yanirseroussi.com/2022/09/12/causal-machine-learning-book-draft-review/Reviewing the first three chapters of the book Causal Machine Learning by Robert Osazuwa Ness.The mission matters: Moving to climate tech as a data scientisthttps://yanirseroussi.com/2022/06/06/the-mission-matters-moving-to-climate-tech-as-a-data-scientist/Mon, 06 Jun 2022 00:00:00 +0000https://yanirseroussi.com/2022/06/06/the-mission-matters-moving-to-climate-tech-as-a-data-scientist/Discussing my recent career move into climate tech as a way of doing more to help mitigate dangerous climate change.Building useful machine learning tools keeps getting easier: A fish ID case studyhttps://yanirseroussi.com/2022/03/20/building-useful-machine-learning-tools-keeps-getting-easier-a-fish-id-case-study/Sun, 20 Mar 2022 04:30:00 +0000https://yanirseroussi.com/2022/03/20/building-useful-machine-learning-tools-keeps-getting-easier-a-fish-id-case-study/Lessons learned building a fish ID web app with fast.ai and Streamlit, in an attempt to reduce my fear of missing out on the latest deep learning developments.Analysis strategies in online A/B experiments: 
Intention-to-treat, per-protocol, and other lessons from clinical trialshttps://yanirseroussi.com/2022/01/14/analysis-strategies-in-online-a-b-experiments/Fri, 14 Jan 2022 00:05:40 +0000https://yanirseroussi.com/2022/01/14/analysis-strategies-in-online-a-b-experiments/Epidemiologists analyse clinical trials to estimate the intention-to-treat and per-protocol effects. This post applies their strategies to online experiments.Use your human brain to avoid artificial intelligence disastershttps://yanirseroussi.com/2021/11/22/use-your-human-brain-to-avoid-artificial-intelligence-disasters/Mon, 22 Nov 2021 03:45:00 +0000https://yanirseroussi.com/2021/11/22/use-your-human-brain-to-avoid-artificial-intelligence-disasters/Overview of a talk I gave at a deep learning course, focusing on AI ethics as the need for humans to think on the context and consequences of applying AI.Migrating from WordPress.com to Hugo on GitHub + Cloudflarehttps://yanirseroussi.com/2021/11/10/migrating-from-wordpress-com-to-hugo-on-github-cloudflare/Wed, 10 Nov 2021 06:30:00 +0000https://yanirseroussi.com/2021/11/10/migrating-from-wordpress-com-to-hugo-on-github-cloudflare/My reasons for switching from WordPress.com to Hugo on GitHub + Cloudflare, along with a summary of the solution components and migration process.My work with Automattichttps://yanirseroussi.com/2021/10/07/my-work-with-automattic/Thu, 07 Oct 2021 00:00:00 +0000https://yanirseroussi.com/2021/10/07/my-work-with-automattic/Back-dated meta-post that gathers my posts on Automattic blogs into a summary of the work I’ve done with the company.Some highlights from 2020https://yanirseroussi.com/2021/04/05/some-highlights-from-2020/Mon, 05 Apr 2021 06:41:48 +0000https://yanirseroussi.com/2021/04/05/some-highlights-from-2020/Sharing remote teamwork insights, my climate & sustainability activism, Reef Life Survey publications, and progress on Automattic’s Experimentation Platform.Many is not enough: Counting simulations to bootstrap the right 
wayhttps://yanirseroussi.com/2020/08/24/many-is-not-enough-counting-simulations-to-bootstrap-the-right-way/Mon, 24 Aug 2020 01:35:17 +0000https://yanirseroussi.com/2020/08/24/many-is-not-enough-counting-simulations-to-bootstrap-the-right-way/Going deeper into correct testing of different methods for bootstrap estimation of confidence intervals.Software commodities are eating interesting data science workhttps://yanirseroussi.com/2020/01/11/software-commodities-are-eating-interesting-data-science-work/Sat, 11 Jan 2020 09:22:35 +0000https://yanirseroussi.com/2020/01/11/software-commodities-are-eating-interesting-data-science-work/Being a data scientist can sometimes feel like a race against software commodities that replace interesting work. What can one do to remain relevant?A day in the life of a remote data scientisthttps://yanirseroussi.com/2019/12/12/a-day-in-the-life-of-a-remote-data-scientist/Wed, 11 Dec 2019 22:06:19 +0000https://yanirseroussi.com/2019/12/12/a-day-in-the-life-of-a-remote-data-scientist/Video of a talk I gave on remote data science work at the Data Science Sydney meetup.Bootstrapping the right way?https://yanirseroussi.com/2019/10/06/bootstrapping-the-right-way/Sun, 06 Oct 2019 06:48:07 +0000https://yanirseroussi.com/2019/10/06/bootstrapping-the-right-way/Video and summary of a talk I gave at YOW! Data on bootstrap estimation of confidence intervals.Hackers beware: Bootstrap sampling may be harmfulhttps://yanirseroussi.com/2019/01/08/hackers-beware-bootstrap-sampling-may-be-harmful/Mon, 07 Jan 2019 21:07:56 +0000https://yanirseroussi.com/2019/01/08/hackers-beware-bootstrap-sampling-may-be-harmful/Bootstrap sampling has been promoted as an easy way of modelling uncertainty to hackers without much statistical knowledge. 
But things aren’t that simple.The most practical causal inference book I’ve read (is still a draft)https://yanirseroussi.com/2018/12/24/the-most-practical-causal-inference-book-ive-read-is-still-a-draft/Mon, 24 Dec 2018 02:37:50 +0000https://yanirseroussi.com/2018/12/24/the-most-practical-causal-inference-book-ive-read-is-still-a-draft/Causal Inference by Miguel Hernán and Jamie Robins is a must-read for anyone interested in the area.Reflections on remote data science workhttps://yanirseroussi.com/2018/11/03/reflections-on-remote-data-science-work/Sat, 03 Nov 2018 06:33:13 +0000https://yanirseroussi.com/2018/11/03/reflections-on-remote-data-science-work/Discussing the pluses and minuses of remote work eighteen months after joining Automattic as a data scientist.Defining data science in 2018https://yanirseroussi.com/2018/07/22/defining-data-science-in-2018/Sun, 22 Jul 2018 08:27:43 +0000https://yanirseroussi.com/2018/07/22/defining-data-science-in-2018/Updating my definition of data science to match changes in the field. 
It is now broader than before, but its ultimate goal is still to support decisions.Advice for aspiring data scientists and other FAQshttps://yanirseroussi.com/2017/10/15/advice-for-aspiring-data-scientists-and-other-faqs/Sun, 15 Oct 2017 09:15:25 +0000https://yanirseroussi.com/2017/10/15/advice-for-aspiring-data-scientists-and-other-faqs/Frequently asked questions by visitors to this site, especially around entering the data science field.State of Bandcamp Recommender, Late 2017https://yanirseroussi.com/2017/09/02/state-of-bandcamp-recommender/Sat, 02 Sep 2017 10:19:02 +0000https://yanirseroussi.com/2017/09/02/state-of-bandcamp-recommender/Call for BCRecommender maintainers followed by a decision to shut it down, as I don’t have enough time and Bandcamp now offers recommendations.My 10-step path to becoming a remote data scientist with Automattichttps://yanirseroussi.com/2017/07/29/my-10-step-path-to-becoming-a-remote-data-scientist-with-automattic/Sat, 29 Jul 2017 05:39:26 +0000https://yanirseroussi.com/2017/07/29/my-10-step-path-to-becoming-a-remote-data-scientist-with-automattic/I wanted a well-paid data science-y remote job with an established company that offers a good life balance and makes products I care about. 
I got it eventually.Exploring and visualising Reef Life Survey datahttps://yanirseroussi.com/2017/06/03/exploring-and-visualising-reef-life-survey-data/Sat, 03 Jun 2017 00:49:05 +0000https://yanirseroussi.com/2017/06/03/exploring-and-visualising-reef-life-survey-data/Web tools I built to visualise Reef Life Survey data and assist citizen scientists in underwater visual census work.Customer lifetime value and the proliferation of misinformation on the internethttps://yanirseroussi.com/2017/01/08/customer-lifetime-value-and-the-proliferation-of-misinformation-on-the-internet/Sun, 08 Jan 2017 20:02:30 +0000https://yanirseroussi.com/2017/01/08/customer-lifetime-value-and-the-proliferation-of-misinformation-on-the-internet/There’s a lot of misleading content on the estimation of customer lifetime value. Here’s what I learned about doing it well.Ask Why! Finding motives, causes, and purpose in data sciencehttps://yanirseroussi.com/2016/09/19/ask-why-finding-motives-causes-and-purpose-in-data-science/Mon, 19 Sep 2016 21:28:44 +0000https://yanirseroussi.com/2016/09/19/ask-why-finding-motives-causes-and-purpose-in-data-science/Video and summary of a talk I gave at the Data Science Sydney meetup, about going beyond the what & how of predictive modelling.If you don’t pay attention, data can drive you off a cliffhttps://yanirseroussi.com/2016/08/21/seven-ways-to-be-data-driven-off-a-cliff/Sun, 21 Aug 2016 21:34:17 +0000https://yanirseroussi.com/2016/08/21/seven-ways-to-be-data-driven-off-a-cliff/Seven common mistakes to avoid when working with data, such as ignoring uncertainty and confusing observed and unobserved quantities.Is Data Scientist a useless job title?https://yanirseroussi.com/2016/08/04/is-data-scientist-a-useless-job-title/Thu, 04 Aug 2016 22:26:03 +0000https://yanirseroussi.com/2016/08/04/is-data-scientist-a-useless-job-title/It seems like anyone who touches data can call themselves a data scientist, which makes the title useless. 
The work they do can still be useful, though.Making Bayesian A/B testing more accessiblehttps://yanirseroussi.com/2016/06/19/making-bayesian-ab-testing-more-accessible/Sun, 19 Jun 2016 10:32:15 +0000https://yanirseroussi.com/2016/06/19/making-bayesian-ab-testing-more-accessible/A web tool I built to interpret A/B test results in a Bayesian way, including prior specification, visualisations, and decision rules.Diving deeper into causality: Pearl, Kleinberg, Hill, and untested assumptionshttps://yanirseroussi.com/2016/05/15/diving-deeper-into-causality-pearl-kleinberg-hill-and-untested-assumptions/Sat, 14 May 2016 19:57:03 +0000https://yanirseroussi.com/2016/05/15/diving-deeper-into-causality-pearl-kleinberg-hill-and-untested-assumptions/Discussing the need for untested assumptions and temporality in causal inference. Mostly based on Samantha Kleinberg’s Causality, Probability, and Time.The rise of greedy robotshttps://yanirseroussi.com/2016/03/20/the-rise-of-greedy-robots/Sun, 20 Mar 2016 20:33:43 +0000https://yanirseroussi.com/2016/03/20/the-rise-of-greedy-robots/Is artificial/machine intelligence a future threat? 
I argue that it’s already here, with greedy robots already dominating our lives.Why you should stop worrying about deep learning and deepen your understanding of causality insteadhttps://yanirseroussi.com/2016/02/14/why-you-should-stop-worrying-about-deep-learning-and-deepen-your-understanding-of-causality-instead/Sun, 14 Feb 2016 11:04:11 +0000https://yanirseroussi.com/2016/02/14/why-you-should-stop-worrying-about-deep-learning-and-deepen-your-understanding-of-causality-instead/Causality is often overlooked but is of much higher relevance to most data scientists than deep learning.The joys of offline data collectionhttps://yanirseroussi.com/2016/01/24/the-joys-of-offline-data-collection/Sun, 24 Jan 2016 00:32:25 +0000https://yanirseroussi.com/2016/01/24/the-joys-of-offline-data-collection/Insights on data collection and machine learning from spending a month sailing, diving, and counting fish with Reef Life Survey.This holiday season, give me real insightshttps://yanirseroussi.com/2015/12/08/this-holiday-season-give-me-real-insights/Tue, 08 Dec 2015 06:57:25 +0000https://yanirseroussi.com/2015/12/08/this-holiday-season-give-me-real-insights/Some companies present raw data or information as “insights”. 
This post surveys some examples, and discusses how they can be turned into real insights.The hardest parts of data sciencehttps://yanirseroussi.com/2015/11/23/the-hardest-parts-of-data-science/Mon, 23 Nov 2015 04:14:21 +0000https://yanirseroussi.com/2015/11/23/the-hardest-parts-of-data-science/Defining feasible problems and coming up with reasonable ways of measuring solutions is harder than building accurate models or obtaining clean data.Migrating a simple web application from MongoDB to Elasticsearchhttps://yanirseroussi.com/2015/11/04/migrating-a-simple-web-application-from-mongodb-to-elasticsearch/Wed, 04 Nov 2015 03:53:18 +0000https://yanirseroussi.com/2015/11/04/migrating-a-simple-web-application-from-mongodb-to-elasticsearch/Migrating BCRecommender from MongoDB to Elasticsearch made it possible to offer a richer search experience to users at a similar cost, among other benefits.Miscommunicating science: Simplistic models, nutritionism, and the art of storytellinghttps://yanirseroussi.com/2015/10/19/nutritionism-and-the-need-for-complex-models-to-explain-complex-phenomena/Mon, 19 Oct 2015 00:02:32 +0000https://yanirseroussi.com/2015/10/19/nutritionism-and-the-need-for-complex-models-to-explain-complex-phenomena/Nutritionism is a special case of misinterpretation and miscommunication of scientific results – something many data scientists encounter in their work.The wonderful world of recommender systemshttps://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/Fri, 02 Oct 2015 05:25:57 +0000https://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/Giving an overview of the field and common paradigms, and debunking five common myths about recommender systems.You don’t need a data scientist (yet)https://yanirseroussi.com/2015/08/24/you-dont-need-a-data-scientist-yet/Mon, 24 Aug 2015 08:25:30 +0000https://yanirseroussi.com/2015/08/24/you-dont-need-a-data-scientist-yet/Hiring data scientists prematurely is wasteful and 
frustrating. Here are some questions to ask before you hire your first data scientist.Goodbye, Parse.comhttps://yanirseroussi.com/2015/07/31/goodbye-parse-com/Fri, 31 Jul 2015 03:29:50 +0000https://yanirseroussi.com/2015/07/31/goodbye-parse-com/Migrating my web apps away from Parse.com due to reliability issues. Self-hosting is a better solution.Learning about deep learning through album cover classificationhttps://yanirseroussi.com/2015/07/06/learning-about-deep-learning-through-album-cover-classification/Mon, 06 Jul 2015 22:21:42 +0000https://yanirseroussi.com/2015/07/06/learning-about-deep-learning-through-album-cover-classification/Progress on my album cover classification project, highlighting lessons that would be useful to others who are getting started with deep learning.Deep learning resourceshttps://yanirseroussi.com/deep-learning-resources/Mon, 06 Jul 2015 00:38:44 +0000https://yanirseroussi.com/deep-learning-resources/This page summarises the deep learning resources I’ve consulted in my album cover classification project. 
+Yanir Seroussi | Data & AI for Startup Impacthttps://yanirseroussi.com/Recent content on Yanir Seroussi | Data & AI for Startup ImpactHugo -- gohugo.ioen-auText and figures licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) by [Yanir Seroussi](https://yanirseroussi.com/about/), except where noted otherwiseMon, 24 Jun 2024 02:00:00 +0000Is your tech stack ready for data-intensive applications?https://yanirseroussi.com/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/Mon, 24 Jun 2024 02:00:00 +0000https://yanirseroussi.com/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/Questions to assess the quality of tech stacks and lifecycles, with a focus on artificial intelligence, machine learning, and analytics.Dealing with endless data changeshttps://yanirseroussi.com/til/2024/06/22/dealing-with-endless-data-changes/Sat, 22 Jun 2024 22:50:00 +0000https://yanirseroussi.com/til/2024/06/22/dealing-with-endless-data-changes/Quotes from Demetrios Brinkmann on the relationship between MLOps and DevOps, with MLOps allowing for managing changes that come from data.AI ain't gonna save you from bad datahttps://yanirseroussi.com/2024/06/17/ai-aint-gonna-save-you-from-bad-data/Mon, 17 Jun 2024 02:00:00 +0000https://yanirseroussi.com/2024/06/17/ai-aint-gonna-save-you-from-bad-data/Since we’re far from a utopia where data issues are fully handled by AI, this post presents six questions humans can use to assess data projects.The rules of the passion economyhttps://yanirseroussi.com/til/2024/06/12/the-rules-of-the-passion-economy/Wed, 12 Jun 2024 02:50:00 +0000https://yanirseroussi.com/til/2024/06/12/the-rules-of-the-passion-economy/Summary of the main messages from the book The Passion Economy by Adam Davidson.Startup data health starts with healthy event trackinghttps://yanirseroussi.com/2024/06/10/startup-data-health-starts-with-healthy-event-tracking/Mon, 10 Jun 2024 04:00:00 
+0000https://yanirseroussi.com/2024/06/10/startup-data-health-starts-with-healthy-event-tracking/Expanding on the startup health check question of tracking Kukuyeva’s five business aspects as wide events.How to avoid startups with poor development processeshttps://yanirseroussi.com/2024/06/03/how-to-avoid-startups-with-poor-development-processes/Mon, 03 Jun 2024 02:45:00 +0000https://yanirseroussi.com/2024/06/03/how-to-avoid-startups-with-poor-development-processes/Questions that prospective data specialists and engineers should ask about development processes before accepting a startup role.Plumbing, Decisions, and Automation: De-hyping Data & AIhttps://yanirseroussi.com/2024/05/27/plumbing-decisions-and-automation-de-hyping-data-and-ai/Mon, 27 May 2024 02:00:00 +0000https://yanirseroussi.com/2024/05/27/plumbing-decisions-and-automation-de-hyping-data-and-ai/Three essential questions to understand where an organisation stands when it comes to Data & AI (with zero hype).Adapting to the economy of algorithmshttps://yanirseroussi.com/til/2024/05/25/adapting-to-the-economy-of-algorithms/Sat, 25 May 2024 00:00:00 +0000https://yanirseroussi.com/til/2024/05/25/adapting-to-the-economy-of-algorithms/Overview of the book The Economy of Algorithms by Marek Kowalkiewicz.Question startup culture before accepting a data-to-AI rolehttps://yanirseroussi.com/2024/05/20/question-startup-culture-before-accepting-a-data-to-ai-role/Mon, 20 May 2024 02:25:00 +0000https://yanirseroussi.com/2024/05/20/question-startup-culture-before-accepting-a-data-to-ai-role/Eight questions that prospective data-to-AI employees should ask about a startup’s work and data culture.Probing the People aspects of an early-stage startuphttps://yanirseroussi.com/2024/05/13/probing-the-people-aspects-of-an-early-stage-startup/Mon, 13 May 2024 02:00:00 +0000https://yanirseroussi.com/2024/05/13/probing-the-people-aspects-of-an-early-stage-startup/Ten questions that prospective employees should ask about a 
startup’s team, especially for data-centric roles.Business questions to ask before taking a startup data rolehttps://yanirseroussi.com/2024/05/06/business-questions-to-ask-before-taking-a-startup-data-role/Mon, 06 May 2024 04:30:00 +0000https://yanirseroussi.com/2024/05/06/business-questions-to-ask-before-taking-a-startup-data-role/Fourteen questions that prospective employees should ask about a startup’s business model and product, especially for data-focused roles.Mentorship and the art of actionable advicehttps://yanirseroussi.com/2024/04/29/mentorship-and-the-art-of-actionable-advice/Mon, 29 Apr 2024 06:30:00 +0000https://yanirseroussi.com/2024/04/29/mentorship-and-the-art-of-actionable-advice/Reflections on what it takes to package expertise and deliver timely, actionable advice outside the context of employee relationships.Assessing a startup's data-to-AI healthhttps://yanirseroussi.com/2024/04/22/assessing-a-startups-data-to-ai-health/Mon, 22 Apr 2024 06:00:00 +0000https://yanirseroussi.com/2024/04/22/assessing-a-startups-data-to-ai-health/Reviewing the areas that should be assessed to determine a startup’s opportunities and challenges on the data/AI/ML front.AI does not obviate the need for testing and observabilityhttps://yanirseroussi.com/2024/04/15/ai-does-not-obviate-the-need-for-testing-and-observability/Mon, 15 Apr 2024 05:00:00 +0000https://yanirseroussi.com/2024/04/15/ai-does-not-obviate-the-need-for-testing-and-observability/It’s easy to prototype with AI, but production-grade AI apps require even more thorough testing and observability than traditional software.LinkedIn is a teachable skillhttps://yanirseroussi.com/til/2024/04/11/linkedin-is-a-teachable-skill/Thu, 11 Apr 2024 01:45:25 +0000https://yanirseroussi.com/til/2024/04/11/linkedin-is-a-teachable-skill/A high-level overview of things I learned from Justin Welsh’s LinkedIn Operating System course.My experience as a Data Tech Lead with Work on 
Climatehttps://yanirseroussi.com/2024/04/08/my-experience-as-a-data-tech-lead-with-work-on-climate/Mon, 08 Apr 2024 02:00:00 +0000https://yanirseroussi.com/2024/04/08/my-experience-as-a-data-tech-lead-with-work-on-climate/The story of how I joined Work on Climate as a volunteer and became its data tech lead, with lessons applied to consulting & fractional work.The data engineering lifecycle is not going anywherehttps://yanirseroussi.com/til/2024/04/05/the-data-engineering-lifecycle-is-not-going-anywhere/Fri, 05 Apr 2024 01:00:00 +0000https://yanirseroussi.com/til/2024/04/05/the-data-engineering-lifecycle-is-not-going-anywhere/My key takeaways from reading Fundamentals of Data Engineering by Joe Reis and Matt Housley.Artificial intelligence, automation, and the art of counting fishhttps://yanirseroussi.com/2024/04/01/artificial-intelligence-automation-and-the-art-of-counting-fish/Mon, 01 Apr 2024 06:00:00 +0000https://yanirseroussi.com/2024/04/01/artificial-intelligence-automation-and-the-art-of-counting-fish/Discussing the use of AI to automate underwater marine surveys as an example of the uneven distribution of technological advancement.Atomic Habits is full of actionable advicehttps://yanirseroussi.com/til/2024/03/12/atomic-habits-is-full-of-actionable-advice/Tue, 12 Mar 2024 06:19:31 +0000https://yanirseroussi.com/til/2024/03/12/atomic-habits-is-full-of-actionable-advice/I put the book to use after the first listen, and will definitely revisit it in the future to form better habits.Questions to consider when using AI for PDF data extractionhttps://yanirseroussi.com/2024/03/11/questions-to-consider-when-using-ai-for-pdf-data-extraction/Mon, 11 Mar 2024 00:00:00 +0000https://yanirseroussi.com/2024/03/11/questions-to-consider-when-using-ai-for-pdf-data-extraction/Discussing considerations that arise when attempting to automate the extraction of structured data from PDFs and similar documents.Two types of startup data 
problemshttps://yanirseroussi.com/2024/03/04/two-types-of-startup-data-problems/Mon, 04 Mar 2024 02:00:00 +0000https://yanirseroussi.com/2024/03/04/two-types-of-startup-data-problems/Classifying startups as ML-centric or non-ML is a helpful exercise to uncover the data challenges they’re likely to face.Avoiding AI complexity: First, write no codehttps://yanirseroussi.com/2024/02/26/avoiding-ai-complexity-first-write-no-code/Mon, 26 Feb 2024 01:45:00 +0000https://yanirseroussi.com/2024/02/26/avoiding-ai-complexity-first-write-no-code/Two stories of getting AI functionality to production, which demonstrate the risks inherent in custom development versus starting with a no-code approach.Building your startup's minimum viable data stackhttps://yanirseroussi.com/2024/02/19/building-your-startups-minimum-viable-data-stack/Mon, 19 Feb 2024 00:00:00 +0000https://yanirseroussi.com/2024/02/19/building-your-startups-minimum-viable-data-stack/First post in a series on building a minimum viable data stack for startups, introducing key definitions, components, and considerations.The three Cs of indie consulting: Confidence, Cash, and Connectionshttps://yanirseroussi.com/til/2024/02/17/the-three-cs-of-indie-consulting-confidence-cash-and-connections/Sat, 17 Feb 2024 02:00:00 +0000https://yanirseroussi.com/til/2024/02/17/the-three-cs-of-indie-consulting-confidence-cash-and-connections/Jonathan Stark makes a compelling argument why you should have the three Cs before quitting your job to go solo consulting.Nudging ChatGPT to invent books you have no time to readhttps://yanirseroussi.com/2024/02/12/nudging-chatgpt-to-invent-books-you-have-no-time-to-read/Mon, 12 Feb 2024 05:00:00 +0000https://yanirseroussi.com/2024/02/12/nudging-chatgpt-to-invent-books-you-have-no-time-to-read/Getting ChatGPT Plus to elaborate on possible book content and produce a PDF cheatsheet, with the goal of learning about its capabilities.Future software development may require fewer 
humanshttps://yanirseroussi.com/til/2024/02/06/future-software-development-may-require-fewer-humans/Tue, 06 Feb 2024 06:15:00 +0000https://yanirseroussi.com/til/2024/02/06/future-software-development-may-require-fewer-humans/Reflecting on an interview with Jason Warner, CEO of poolside.Substance over titles: Your first data hire may be a data scientisthttps://yanirseroussi.com/2024/02/05/substance-over-titles-your-first-data-hire-may-be-a-data-scientist/Mon, 05 Feb 2024 02:45:00 +0000https://yanirseroussi.com/2024/02/05/substance-over-titles-your-first-data-hire-may-be-a-data-scientist/Advice for hiring a startup’s first data person: match skills to business needs, consider contractors, and get help from data people.New decade, new tagline: Data & AI for Impacthttps://yanirseroussi.com/2024/01/19/new-decade-new-tagline-data-and-ai-for-impact/Fri, 19 Jan 2024 00:00:00 +0000https://yanirseroussi.com/2024/01/19/new-decade-new-tagline-data-and-ai-for-impact/Shifting focus to ‘Data & AI for Impact’, with more startup-related content, increased posting frequency, and deeper audience engagement.Psychographic specialisations may work for discipline generalistshttps://yanirseroussi.com/til/2024/01/09/psychographic-specialisations-may-work-for-discipline-generalists/Tue, 09 Jan 2024 03:00:00 +0000https://yanirseroussi.com/til/2024/01/09/psychographic-specialisations-may-work-for-discipline-generalists/When focusing on a market segment defined by personal beliefs, it’s often fine to position yourself as a generalist in your craft.The power of parasocial relationshipshttps://yanirseroussi.com/til/2024/01/08/the-power-of-parasocial-relationships/Mon, 08 Jan 2024 06:00:00 +0000https://yanirseroussi.com/til/2024/01/08/the-power-of-parasocial-relationships/Repeated exposure to media personas creates relationships that help justify premium fees.Positioning is a common problem for data 
scientistshttps://yanirseroussi.com/til/2023/12/18/positioning-is-a-common-problem-for-data-scientists/Mon, 18 Dec 2023 00:30:00 +0000https://yanirseroussi.com/til/2023/12/18/positioning-is-a-common-problem-for-data-scientists/With the commodification of data scientists, the problem of positioning has become more common: My takeaways from Genevieve Hayes interviewing Jonathan Stark.Transfer learning applies to energy market biddinghttps://yanirseroussi.com/til/2023/12/14/transfer-learning-applies-to-energy-market-bidding/Thu, 14 Dec 2023 00:15:00 +0000https://yanirseroussi.com/til/2023/12/14/transfer-learning-applies-to-energy-market-bidding/An interesting approach to bidding of energy storage assets, showing that training on New York data is transferable to Queensland.Supporting volunteer monitoring of marine biodiversity with modern web and data toolshttps://yanirseroussi.com/2023/11/29/supporting-volunteer-monitoring-of-marine-biodiversity-with-modern-web-and-data-tools/Wed, 29 Nov 2023 02:00:00 +0000https://yanirseroussi.com/2023/11/29/supporting-volunteer-monitoring-of-marine-biodiversity-with-modern-web-and-data-tools/Summarising the work Uri Seroussi and I did to improve Reef Life Survey’s Reef Species of the World app.Our Blue Machine is changing, but we are not helplesshttps://yanirseroussi.com/til/2023/11/28/our-blue-machine-is-changing-but-we-are-not-helpless/Tue, 28 Nov 2023 06:40:00 +0000https://yanirseroussi.com/til/2023/11/28/our-blue-machine-is-changing-but-we-are-not-helpless/One of my many highlights from Helen Czerski’s Blue Machine.You don't need a proprietary API for static mapshttps://yanirseroussi.com/til/2023/11/21/you-dont-need-a-proprietary-api-for-static-maps/Tue, 21 Nov 2023 06:00:00 +0000https://yanirseroussi.com/til/2023/11/21/you-dont-need-a-proprietary-api-for-static-maps/For many use cases, libraries like cartopy are better than the likes of Mapbox and Google Maps.Lessons from reluctant data 
engineeringhttps://yanirseroussi.com/2023/10/25/lessons-from-reluctant-data-engineering/Wed, 25 Oct 2023 04:45:00 +0000https://yanirseroussi.com/2023/10/25/lessons-from-reluctant-data-engineering/Video and summary of a talk I gave at DataEngBytes Brisbane on what I learned from doing data engineering as part of every data science role I had.Artificial intelligence was a marketing term all along – just call it automationhttps://yanirseroussi.com/til/2023/10/06/artificial-intelligence-was-a-marketing-term-all-along-just-call-it-automation/Fri, 06 Oct 2023 05:00:00 +0000https://yanirseroussi.com/til/2023/10/06/artificial-intelligence-was-a-marketing-term-all-along-just-call-it-automation/Replacing ‘artificial intelligence’ with ‘automation’ is a useful trick for cutting through the hype.The lines between solo consulting and product building are blurryhttps://yanirseroussi.com/til/2023/09/25/the-lines-between-solo-consulting-and-product-building-are-blurry/Mon, 25 Sep 2023 00:00:00 +0000https://yanirseroussi.com/til/2023/09/25/the-lines-between-solo-consulting-and-product-building-are-blurry/It turns out that problems like finding a niche and defining the ideal clients are key to any solo business.Google's Rules of Machine Learning still apply in the age of large language modelshttps://yanirseroussi.com/til/2023/09/21/googles-rules-of-machine-learning-still-apply-in-the-age-of-large-language-models/Thu, 21 Sep 2023 21:30:00 +0000https://yanirseroussi.com/til/2023/09/21/googles-rules-of-machine-learning-still-apply-in-the-age-of-large-language-models/Despite the excitement around large language models, building with machine learning remains an engineering problem with established best practices.My rediscovery of quiet writing on the open webhttps://yanirseroussi.com/2023/08/28/my-rediscovery-of-quiet-writing-on-the-open-web/Mon, 28 Aug 2023 05:30:00 +0000https://yanirseroussi.com/2023/08/28/my-rediscovery-of-quiet-writing-on-the-open-web/Reflections on publishing on 
this website: Writing publicly to share thoughts and documentation beats chasing views and likes.The Minimalist Entrepreneur is too prescriptive for mehttps://yanirseroussi.com/til/2023/08/21/the-minimalist-entrepreneur-is-too-prescriptive-for-me/Mon, 21 Aug 2023 03:15:00 +0000https://yanirseroussi.com/til/2023/08/21/the-minimalist-entrepreneur-is-too-prescriptive-for-me/While I found the story of Gumroad interesting, The Minimalist Entrepreneur seems to over-generalise from the founder’s experience.Revisiting Start Small, Stay Small in 2023 (Chapter 2)https://yanirseroussi.com/til/2023/08/17/revisiting-start-small-stay-small-in-2023-chapter-2/Thu, 17 Aug 2023 07:45:00 +0000https://yanirseroussi.com/til/2023/08/17/revisiting-start-small-stay-small-in-2023-chapter-2/A summary of the second chapter of Rob Walling’s Start Small, Stay Small, along with my thoughts & reflections.Revisiting Start Small, Stay Small in 2023 (Chapter 1)https://yanirseroussi.com/til/2023/08/16/revisiting-start-small-stay-small-in-2023-chapter-1/Wed, 16 Aug 2023 05:45:00 +0000https://yanirseroussi.com/til/2023/08/16/revisiting-start-small-stay-small-in-2023-chapter-1/A summary of the first chapter of Rob Walling’s Start Small, Stay Small, along with my thoughts & reflections.Email notifications on public GitHub commitshttps://yanirseroussi.com/til/2023/08/14/email-notifications-on-public-github-commits/Mon, 14 Aug 2023 05:15:00 +0000https://yanirseroussi.com/til/2023/08/14/email-notifications-on-public-github-commits/GitHub publishes an Atom feed, which means you can use any RSS reader to follow commits.The rule of thirds can probably be ignoredhttps://yanirseroussi.com/til/2023/08/11/the-rule-of-thirds-can-probably-be-ignored/Fri, 11 Aug 2023 03:15:00 +0000https://yanirseroussi.com/til/2023/08/11/the-rule-of-thirds-can-probably-be-ignored/Turns out that the rule of thirds for composing visuals may not be that important.Using YubiKey for SSH 
accesshttps://yanirseroussi.com/til/2023/07/23/using-yubikey-for-ssh-access/Sun, 23 Jul 2023 00:07:15 +0000https://yanirseroussi.com/til/2023/07/23/using-yubikey-for-ssh-access/Some pointers for setting up SSH access with YubiKey on Ubuntu 22.04.Making a TIL section with Hugo and PaperModhttps://yanirseroussi.com/til/2023/07/17/making-a-til-section-with-hugo-and-papermod/Mon, 17 Jul 2023 00:06:15 +0000https://yanirseroussi.com/til/2023/07/17/making-a-til-section-with-hugo-and-papermod/How I added a Today I Learned section to my Hugo site with the PaperMod theme.You can't save timehttps://yanirseroussi.com/til/2023/07/11/you-cant-save-time/Tue, 11 Jul 2023 00:00:00 +0000https://yanirseroussi.com/til/2023/07/11/you-cant-save-time/Time can be spent doing different activities, but it can’t be stored and saved for later.Was data science a failure mode of software engineering?https://yanirseroussi.com/2023/06/30/was-data-science-a-failure-mode-of-software-engineering/Fri, 30 Jun 2023 00:06:30 +0000https://yanirseroussi.com/2023/06/30/was-data-science-a-failure-mode-of-software-engineering/Yes, data science projects have suffered from classic software engineering mistakes, but the field is maturing with the rise of new engineering roles.How hackable are automated coding assessments?https://yanirseroussi.com/2023/05/26/how-hackable-are-automated-coding-assessments/Fri, 26 May 2023 00:03:00 +0000https://yanirseroussi.com/2023/05/26/how-hackable-are-automated-coding-assessments/Exploring the hackability of speed-based coding tests, using CodeSignal’s Industry Coding Framework as a case study.Remaining relevant as a small language modelhttps://yanirseroussi.com/2023/04/21/remaining-relevant-as-a-small-language-model/Fri, 21 Apr 2023 00:06:30 +0000https://yanirseroussi.com/2023/04/21/remaining-relevant-as-a-small-language-model/Bing Chat recently quipped that humans are small language models. 
Here are some of my thoughts on how we small language models can remain relevant (for now).ChatGPT is transformative AIhttps://yanirseroussi.com/2022/12/11/chatgpt-is-transformative-ai/Sun, 11 Dec 2022 00:00:00 +0000https://yanirseroussi.com/2022/12/11/chatgpt-is-transformative-ai/My perspective after a week of using ChatGPT: This is a step change in finding distilled information, and it’s only the beginning.Causal Machine Learning is off to a good start, despite some issueshttps://yanirseroussi.com/2022/09/12/causal-machine-learning-book-draft-review/Mon, 12 Sep 2022 02:45:00 +0000https://yanirseroussi.com/2022/09/12/causal-machine-learning-book-draft-review/Reviewing the first three chapters of the book Causal Machine Learning by Robert Osazuwa Ness.The mission matters: Moving to climate tech as a data scientisthttps://yanirseroussi.com/2022/06/06/the-mission-matters-moving-to-climate-tech-as-a-data-scientist/Mon, 06 Jun 2022 00:00:00 +0000https://yanirseroussi.com/2022/06/06/the-mission-matters-moving-to-climate-tech-as-a-data-scientist/Discussing my recent career move into climate tech as a way of doing more to help mitigate dangerous climate change.Building useful machine learning tools keeps getting easier: A fish ID case studyhttps://yanirseroussi.com/2022/03/20/building-useful-machine-learning-tools-keeps-getting-easier-a-fish-id-case-study/Sun, 20 Mar 2022 04:30:00 +0000https://yanirseroussi.com/2022/03/20/building-useful-machine-learning-tools-keeps-getting-easier-a-fish-id-case-study/Lessons learned building a fish ID web app with fast.ai and Streamlit, in an attempt to reduce my fear of missing out on the latest deep learning developments.Analysis strategies in online A/B experiments: Intention-to-treat, per-protocol, and other lessons from clinical trialshttps://yanirseroussi.com/2022/01/14/analysis-strategies-in-online-a-b-experiments/Fri, 14 Jan 2022 00:05:40 
+0000https://yanirseroussi.com/2022/01/14/analysis-strategies-in-online-a-b-experiments/Epidemiologists analyse clinical trials to estimate the intention-to-treat and per-protocol effects. This post applies their strategies to online experiments.Use your human brain to avoid artificial intelligence disastershttps://yanirseroussi.com/2021/11/22/use-your-human-brain-to-avoid-artificial-intelligence-disasters/Mon, 22 Nov 2021 03:45:00 +0000https://yanirseroussi.com/2021/11/22/use-your-human-brain-to-avoid-artificial-intelligence-disasters/Overview of a talk I gave at a deep learning course, focusing on AI ethics as the need for humans to think on the context and consequences of applying AI.Migrating from WordPress.com to Hugo on GitHub + Cloudflarehttps://yanirseroussi.com/2021/11/10/migrating-from-wordpress-com-to-hugo-on-github-cloudflare/Wed, 10 Nov 2021 06:30:00 +0000https://yanirseroussi.com/2021/11/10/migrating-from-wordpress-com-to-hugo-on-github-cloudflare/My reasons for switching from WordPress.com to Hugo on GitHub + Cloudflare, along with a summary of the solution components and migration process.My work with Automattichttps://yanirseroussi.com/2021/10/07/my-work-with-automattic/Thu, 07 Oct 2021 00:00:00 +0000https://yanirseroussi.com/2021/10/07/my-work-with-automattic/Back-dated meta-post that gathers my posts on Automattic blogs into a summary of the work I’ve done with the company.Some highlights from 2020https://yanirseroussi.com/2021/04/05/some-highlights-from-2020/Mon, 05 Apr 2021 06:41:48 +0000https://yanirseroussi.com/2021/04/05/some-highlights-from-2020/Sharing remote teamwork insights, my climate & sustainability activism, Reef Life Survey publications, and progress on Automattic’s Experimentation Platform.Many is not enough: Counting simulations to bootstrap the right wayhttps://yanirseroussi.com/2020/08/24/many-is-not-enough-counting-simulations-to-bootstrap-the-right-way/Mon, 24 Aug 2020 01:35:17 
+0000https://yanirseroussi.com/2020/08/24/many-is-not-enough-counting-simulations-to-bootstrap-the-right-way/Going deeper into correct testing of different methods for bootstrap estimation of confidence intervals.Software commodities are eating interesting data science workhttps://yanirseroussi.com/2020/01/11/software-commodities-are-eating-interesting-data-science-work/Sat, 11 Jan 2020 09:22:35 +0000https://yanirseroussi.com/2020/01/11/software-commodities-are-eating-interesting-data-science-work/Being a data scientist can sometimes feel like a race against software commodities that replace interesting work. What can one do to remain relevant?A day in the life of a remote data scientisthttps://yanirseroussi.com/2019/12/12/a-day-in-the-life-of-a-remote-data-scientist/Wed, 11 Dec 2019 22:06:19 +0000https://yanirseroussi.com/2019/12/12/a-day-in-the-life-of-a-remote-data-scientist/Video of a talk I gave on remote data science work at the Data Science Sydney meetup.Bootstrapping the right way?https://yanirseroussi.com/2019/10/06/bootstrapping-the-right-way/Sun, 06 Oct 2019 06:48:07 +0000https://yanirseroussi.com/2019/10/06/bootstrapping-the-right-way/Video and summary of a talk I gave at YOW! Data on bootstrap estimation of confidence intervals.Hackers beware: Bootstrap sampling may be harmfulhttps://yanirseroussi.com/2019/01/08/hackers-beware-bootstrap-sampling-may-be-harmful/Mon, 07 Jan 2019 21:07:56 +0000https://yanirseroussi.com/2019/01/08/hackers-beware-bootstrap-sampling-may-be-harmful/Bootstrap sampling has been promoted as an easy way of modelling uncertainty to hackers without much statistical knowledge. 
But things aren’t that simple.The most practical causal inference book I’ve read (is still a draft)https://yanirseroussi.com/2018/12/24/the-most-practical-causal-inference-book-ive-read-is-still-a-draft/Mon, 24 Dec 2018 02:37:50 +0000https://yanirseroussi.com/2018/12/24/the-most-practical-causal-inference-book-ive-read-is-still-a-draft/Causal Inference by Miguel Hernán and Jamie Robins is a must-read for anyone interested in the area.Reflections on remote data science workhttps://yanirseroussi.com/2018/11/03/reflections-on-remote-data-science-work/Sat, 03 Nov 2018 06:33:13 +0000https://yanirseroussi.com/2018/11/03/reflections-on-remote-data-science-work/Discussing the pluses and minuses of remote work eighteen months after joining Automattic as a data scientist.Defining data science in 2018https://yanirseroussi.com/2018/07/22/defining-data-science-in-2018/Sun, 22 Jul 2018 08:27:43 +0000https://yanirseroussi.com/2018/07/22/defining-data-science-in-2018/Updating my definition of data science to match changes in the field. 
It is now broader than before, but its ultimate goal is still to support decisions.Advice for aspiring data scientists and other FAQshttps://yanirseroussi.com/2017/10/15/advice-for-aspiring-data-scientists-and-other-faqs/Sun, 15 Oct 2017 09:15:25 +0000https://yanirseroussi.com/2017/10/15/advice-for-aspiring-data-scientists-and-other-faqs/Frequently asked questions by visitors to this site, especially around entering the data science field.State of Bandcamp Recommender, Late 2017https://yanirseroussi.com/2017/09/02/state-of-bandcamp-recommender/Sat, 02 Sep 2017 10:19:02 +0000https://yanirseroussi.com/2017/09/02/state-of-bandcamp-recommender/Call for BCRecommender maintainers followed by a decision to shut it down, as I don’t have enough time and Bandcamp now offers recommendations.My 10-step path to becoming a remote data scientist with Automattichttps://yanirseroussi.com/2017/07/29/my-10-step-path-to-becoming-a-remote-data-scientist-with-automattic/Sat, 29 Jul 2017 05:39:26 +0000https://yanirseroussi.com/2017/07/29/my-10-step-path-to-becoming-a-remote-data-scientist-with-automattic/I wanted a well-paid data science-y remote job with an established company that offers a good life balance and makes products I care about. 
I got it eventually.Exploring and visualising Reef Life Survey datahttps://yanirseroussi.com/2017/06/03/exploring-and-visualising-reef-life-survey-data/Sat, 03 Jun 2017 00:49:05 +0000https://yanirseroussi.com/2017/06/03/exploring-and-visualising-reef-life-survey-data/Web tools I built to visualise Reef Life Survey data and assist citizen scientists in underwater visual census work.Customer lifetime value and the proliferation of misinformation on the internethttps://yanirseroussi.com/2017/01/08/customer-lifetime-value-and-the-proliferation-of-misinformation-on-the-internet/Sun, 08 Jan 2017 20:02:30 +0000https://yanirseroussi.com/2017/01/08/customer-lifetime-value-and-the-proliferation-of-misinformation-on-the-internet/There’s a lot of misleading content on the estimation of customer lifetime value. Here’s what I learned about doing it well.Ask Why! Finding motives, causes, and purpose in data sciencehttps://yanirseroussi.com/2016/09/19/ask-why-finding-motives-causes-and-purpose-in-data-science/Mon, 19 Sep 2016 21:28:44 +0000https://yanirseroussi.com/2016/09/19/ask-why-finding-motives-causes-and-purpose-in-data-science/Video and summary of a talk I gave at the Data Science Sydney meetup, about going beyond the what & how of predictive modelling.If you don’t pay attention, data can drive you off a cliffhttps://yanirseroussi.com/2016/08/21/seven-ways-to-be-data-driven-off-a-cliff/Sun, 21 Aug 2016 21:34:17 +0000https://yanirseroussi.com/2016/08/21/seven-ways-to-be-data-driven-off-a-cliff/Seven common mistakes to avoid when working with data, such as ignoring uncertainty and confusing observed and unobserved quantities.Is Data Scientist a useless job title?https://yanirseroussi.com/2016/08/04/is-data-scientist-a-useless-job-title/Thu, 04 Aug 2016 22:26:03 +0000https://yanirseroussi.com/2016/08/04/is-data-scientist-a-useless-job-title/It seems like anyone who touches data can call themselves a data scientist, which makes the title useless. 
The work they do can still be useful, though.Making Bayesian A/B testing more accessiblehttps://yanirseroussi.com/2016/06/19/making-bayesian-ab-testing-more-accessible/Sun, 19 Jun 2016 10:32:15 +0000https://yanirseroussi.com/2016/06/19/making-bayesian-ab-testing-more-accessible/A web tool I built to interpret A/B test results in a Bayesian way, including prior specification, visualisations, and decision rules.Diving deeper into causality: Pearl, Kleinberg, Hill, and untested assumptionshttps://yanirseroussi.com/2016/05/15/diving-deeper-into-causality-pearl-kleinberg-hill-and-untested-assumptions/Sat, 14 May 2016 19:57:03 +0000https://yanirseroussi.com/2016/05/15/diving-deeper-into-causality-pearl-kleinberg-hill-and-untested-assumptions/Discussing the need for untested assumptions and temporality in causal inference. Mostly based on Samantha Kleinberg’s Causality, Probability, and Time.The rise of greedy robotshttps://yanirseroussi.com/2016/03/20/the-rise-of-greedy-robots/Sun, 20 Mar 2016 20:33:43 +0000https://yanirseroussi.com/2016/03/20/the-rise-of-greedy-robots/Is artificial/machine intelligence a future threat? 
I argue that it’s already here, with greedy robots already dominating our lives.Why you should stop worrying about deep learning and deepen your understanding of causality insteadhttps://yanirseroussi.com/2016/02/14/why-you-should-stop-worrying-about-deep-learning-and-deepen-your-understanding-of-causality-instead/Sun, 14 Feb 2016 11:04:11 +0000https://yanirseroussi.com/2016/02/14/why-you-should-stop-worrying-about-deep-learning-and-deepen-your-understanding-of-causality-instead/Causality is often overlooked but is of much higher relevance to most data scientists than deep learning.The joys of offline data collectionhttps://yanirseroussi.com/2016/01/24/the-joys-of-offline-data-collection/Sun, 24 Jan 2016 00:32:25 +0000https://yanirseroussi.com/2016/01/24/the-joys-of-offline-data-collection/Insights on data collection and machine learning from spending a month sailing, diving, and counting fish with Reef Life Survey.This holiday season, give me real insightshttps://yanirseroussi.com/2015/12/08/this-holiday-season-give-me-real-insights/Tue, 08 Dec 2015 06:57:25 +0000https://yanirseroussi.com/2015/12/08/this-holiday-season-give-me-real-insights/Some companies present raw data or information as “insights”. 
This post surveys some examples, and discusses how they can be turned into real insights.The hardest parts of data sciencehttps://yanirseroussi.com/2015/11/23/the-hardest-parts-of-data-science/Mon, 23 Nov 2015 04:14:21 +0000https://yanirseroussi.com/2015/11/23/the-hardest-parts-of-data-science/Defining feasible problems and coming up with reasonable ways of measuring solutions is harder than building accurate models or obtaining clean data.Migrating a simple web application from MongoDB to Elasticsearchhttps://yanirseroussi.com/2015/11/04/migrating-a-simple-web-application-from-mongodb-to-elasticsearch/Wed, 04 Nov 2015 03:53:18 +0000https://yanirseroussi.com/2015/11/04/migrating-a-simple-web-application-from-mongodb-to-elasticsearch/Migrating BCRecommender from MongoDB to Elasticsearch made it possible to offer a richer search experience to users at a similar cost, among other benefits.Miscommunicating science: Simplistic models, nutritionism, and the art of storytellinghttps://yanirseroussi.com/2015/10/19/nutritionism-and-the-need-for-complex-models-to-explain-complex-phenomena/Mon, 19 Oct 2015 00:02:32 +0000https://yanirseroussi.com/2015/10/19/nutritionism-and-the-need-for-complex-models-to-explain-complex-phenomena/Nutritionism is a special case of misinterpretation and miscommunication of scientific results – something many data scientists encounter in their work.The wonderful world of recommender systemshttps://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/Fri, 02 Oct 2015 05:25:57 +0000https://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/Giving an overview of the field and common paradigms, and debunking five common myths about recommender systems.You don’t need a data scientist (yet)https://yanirseroussi.com/2015/08/24/you-dont-need-a-data-scientist-yet/Mon, 24 Aug 2015 08:25:30 +0000https://yanirseroussi.com/2015/08/24/you-dont-need-a-data-scientist-yet/Hiring data scientists prematurely is wasteful and 
frustrating. Here are some questions to ask before you hire your first data scientist.Goodbye, Parse.comhttps://yanirseroussi.com/2015/07/31/goodbye-parse-com/Fri, 31 Jul 2015 03:29:50 +0000https://yanirseroussi.com/2015/07/31/goodbye-parse-com/Migrating my web apps away from Parse.com due to reliability issues. Self-hosting is a better solution.Learning about deep learning through album cover classificationhttps://yanirseroussi.com/2015/07/06/learning-about-deep-learning-through-album-cover-classification/Mon, 06 Jul 2015 22:21:42 +0000https://yanirseroussi.com/2015/07/06/learning-about-deep-learning-through-album-cover-classification/Progress on my album cover classification project, highlighting lessons that would be useful to others who are getting started with deep learning.Deep learning resourceshttps://yanirseroussi.com/deep-learning-resources/Mon, 06 Jul 2015 00:38:44 +0000https://yanirseroussi.com/deep-learning-resources/This page summarises the deep learning resources I’ve consulted in my album cover classification project. 
Tutorials and blog posts Convolutional Neural Networks for Visual Recognition Stanford course notes: an excellent resource, very up-to-date and useful, despite still being a work in progress DeepLearning.net’s Theano-based tutorials: not as up-to-date as the Stanford course notes, but still a good introduction to some of the theory and general Theano usage Lasagne’s documentation and tutorials: still a bit lacking, but good when you know what you’re looking for lasagne4newbs: Lasagne’s convnet example with richer comments Using convolutional neural nets to detect facial keypoints tutorial: the resource that made me want to use Lasagne Classifying plankton with deep neural networks: an epic post, which I found while looking for Lasagne examples Various Wikipedia pages: a bit disappointing – the above resources are much better Papers Adam: a method for stochastic optimization (Kingma and Ba, 2015): an improvement over SGD with Nesterov momentum, AdaGrad and RMSProp, which I found to be useful in practice Algorithms for Hyper-Parameter Optimization (Bergstra et al.Hopping on the deep learning bandwagonhttps://yanirseroussi.com/2015/06/06/hopping-on-the-deep-learning-bandwagon/Sat, 06 Jun 2015 05:00:22 +0000https://yanirseroussi.com/2015/06/06/hopping-on-the-deep-learning-bandwagon/To become proficient at solving data science problems, you need to get your hands dirty. 
Here, I used album cover classification to learn about deep learning.First steps in data science: author-aware sentiment analysishttps://yanirseroussi.com/2015/05/02/first-steps-in-data-science-author-aware-sentiment-analysis/Sat, 02 May 2015 08:31:10 +0000https://yanirseroussi.com/2015/05/02/first-steps-in-data-science-author-aware-sentiment-analysis/I became a data scientist by doing a PhD, but the same steps can be followed without a formal education program.My divestment from fossil fuelshttps://yanirseroussi.com/2015/04/24/my-divestment-from-fossil-fuels/Fri, 24 Apr 2015 00:19:36 +0000https://yanirseroussi.com/2015/04/24/my-divestment-from-fossil-fuels/Recent choices I’ve made to reduce my exposure to fossil fuels, including practical steps that can be taken by Australians and generally applicable lessons.My PhD workhttps://yanirseroussi.com/phd-work/Mon, 30 Mar 2015 03:23:33 +0000https://yanirseroussi.com/phd-work/An overview of my PhD in data science / artificial intelligence. Thesis title: Text Mining and Rating Prediction with Topical User Models.The long road to a lifestyle businesshttps://yanirseroussi.com/2015/03/22/the-long-road-to-a-lifestyle-business/Sun, 22 Mar 2015 09:43:47 +0000https://yanirseroussi.com/2015/03/22/the-long-road-to-a-lifestyle-business/Progress since leaving my last full-time job and setting on an independent path that includes data science consulting and work on my own projects.Learning to rank for personalised search (Yandex Search Personalisation – Kaggle Competition Summary – Part 2)https://yanirseroussi.com/2015/02/11/learning-to-rank-for-personalised-search-yandex-search-personalisation-kaggle-competition-summary-part-2/Wed, 11 Feb 2015 06:34:17 +0000https://yanirseroussi.com/2015/02/11/learning-to-rank-for-personalised-search-yandex-search-personalisation-kaggle-competition-summary-part-2/My team’s solution to the Yandex Search Personalisation competition (finished 9th out of 194 teams).Is thinking like a search engine 
possible? (Yandex search personalisation – Kaggle competition summary – part 1)https://yanirseroussi.com/2015/01/29/is-thinking-like-a-search-engine-possible-yandex-search-personalisation-kaggle-competition-summary-part-1/Thu, 29 Jan 2015 10:37:39 +0000https://yanirseroussi.com/2015/01/29/is-thinking-like-a-search-engine-possible-yandex-search-personalisation-kaggle-competition-summary-part-1/Insights on search personalisation and SEO from participating in a Kaggle competition (finished 9th out of 194 teams).Automating Parse.com bulk data importshttps://yanirseroussi.com/2015/01/15/automating-parse-com-bulk-data-imports/Thu, 15 Jan 2015 04:41:16 +0000https://yanirseroussi.com/2015/01/15/automating-parse-com-bulk-data-imports/A script for importing data into the Parse backend-as-a-service.Stochastic Gradient Boosting: Choosing the Best Number of Iterationshttps://yanirseroussi.com/2014/12/29/stochastic-gradient-boosting-choosing-the-best-number-of-iterations/Mon, 29 Dec 2014 02:30:06 +0000https://yanirseroussi.com/2014/12/29/stochastic-gradient-boosting-choosing-the-best-number-of-iterations/Exploring an approach to choosing the optimal number of iterations in stochastic gradient boosting, following a bug I found in scikit-learn.SEO: Mostly about showing up?https://yanirseroussi.com/2014/12/15/seo-mostly-about-showing-up/Mon, 15 Dec 2014 04:25:25 +0000https://yanirseroussi.com/2014/12/15/seo-mostly-about-showing-up/Increasing SEO traffic to BCRecommender by adding content and opening up more pages for crawling. 
It turns out that thin content is better than no content.Fitting noise: Forecasting the sale price of bulldozers (Kaggle competition summary)https://yanirseroussi.com/2014/11/19/fitting-noise-forecasting-the-sale-price-of-bulldozers-kaggle-competition-summary/Wed, 19 Nov 2014 09:17:34 +0000https://yanirseroussi.com/2014/11/19/fitting-noise-forecasting-the-sale-price-of-bulldozers-kaggle-competition-summary/Summary of a Kaggle competition to forecast bulldozer sale price, where I finished 9th out of 476 teams.BCRecommender Traction Updatehttps://yanirseroussi.com/2014/11/05/bcrecommender-traction-update/Wed, 05 Nov 2014 02:29:35 +0000https://yanirseroussi.com/2014/11/05/bcrecommender-traction-update/Update on BCRecommender traction using three channels: blogger outreach, search engine optimisation, and content marketing.What is data science?https://yanirseroussi.com/2014/10/23/what-is-data-science/Thu, 23 Oct 2014 03:22:08 +0000https://yanirseroussi.com/2014/10/23/what-is-data-science/Data science has been a hot term in the past few years. Still, there isn’t a single definition of the field. 
This post discusses my favourite definition.Greek Media Monitoring Kaggle competition: My approachhttps://yanirseroussi.com/2014/10/07/greek-media-monitoring-kaggle-competition-my-approach/Tue, 07 Oct 2014 03:21:35 +0000https://yanirseroussi.com/2014/10/07/greek-media-monitoring-kaggle-competition-my-approach/Summary of my approach to the Greek Media Monitoring Kaggle competition, where I finished 6th out of 120 teams.Applying the Traction Book’s Bullseye framework to BCRecommenderhttps://yanirseroussi.com/2014/09/24/applying-the-traction-books-bullseye-framework-to-bcrecommender/Wed, 24 Sep 2014 04:57:39 +0000https://yanirseroussi.com/2014/09/24/applying-the-traction-books-bullseye-framework-to-bcrecommender/Ranking 19 channels with the goal of getting traction for BCRecommender.Bandcamp recommendation and discovery algorithmshttps://yanirseroussi.com/2014/09/19/bandcamp-recommendation-and-discovery-algorithms/Fri, 19 Sep 2014 14:26:55 +0000https://yanirseroussi.com/2014/09/19/bandcamp-recommendation-and-discovery-algorithms/The recommendation backend for my BCRecommender service for personalised Bandcamp music discovery.Building a recommender system on a shoestring budget (or: BCRecommender part 2 – general system layout)https://yanirseroussi.com/2014/09/07/building-a-recommender-system-on-a-shoestring-budget/Sun, 07 Sep 2014 10:48:44 +0000https://yanirseroussi.com/2014/09/07/building-a-recommender-system-on-a-shoestring-budget/Iterating on my BCRecommender service with the goal of keeping costs low while providing a valuable music recommendation service.Building a Bandcamp recommender system (part 1 – motivation)https://yanirseroussi.com/2014/08/30/building-a-bandcamp-recommender-system-part-1-motivation/Sat, 30 Aug 2014 08:11:38 +0000https://yanirseroussi.com/2014/08/30/building-a-bandcamp-recommender-system-part-1-motivation/My motivation behind building BCRecommender, a free recommendation & discovery service for Bandcamp music.How to (almost) win Kaggle 
competitionshttps://yanirseroussi.com/2014/08/24/how-to-almost-win-kaggle-competitions/Sun, 24 Aug 2014 12:40:53 +0000https://yanirseroussi.com/2014/08/24/how-to-almost-win-kaggle-competitions/Summary of a talk I gave at the Data Science Sydney meetup with ten tips on almost-winning Kaggle competitions.Data’s hierarchy of needshttps://yanirseroussi.com/2014/08/17/datas-hierarchy-of-needs/Sun, 17 Aug 2014 13:09:30 +0000https://yanirseroussi.com/2014/08/17/datas-hierarchy-of-needs/Discussing the hierarchy of needs proposed by Jay Kreps. Key takeaway: Data-driven algorithms & insights can only be as good as the underlying data.Kaggle competition tips and summarieshttps://yanirseroussi.com/kaggle/Sat, 05 Apr 2014 23:46:10 +0000https://yanirseroussi.com/kaggle/Pointers to all my Kaggle advice posts and competition summaries.Kaggle beginner tipshttps://yanirseroussi.com/2014/01/19/kaggle-beginner-tips/Sun, 19 Jan 2014 10:34:28 +0000https://yanirseroussi.com/2014/01/19/kaggle-beginner-tips/First post! An email I sent to members of the Data Science Sydney Meetup with tips on how to get started with Kaggle competitions.About Mehttps://yanirseroussi.com/about/Mon, 01 Jan 0001 00:00:00 +0000https://yanirseroussi.com/about/About Yanir Seroussi, a hands-on data tech lead with over a decade of experience.Book a free fifteen-minute callhttps://yanirseroussi.com/free-intro-call/Mon, 01 Jan 0001 00:00:00 +0000https://yanirseroussi.com/free-intro-call/Booking form for a quick intro call with Yanir Seroussi.Causal inference resourceshttps://yanirseroussi.com/causal-inference-resources/Mon, 01 Jan 0001 00:00:00 +0000https://yanirseroussi.com/causal-inference-resources/This is a list of some causal inference resources, which I update from time to time. You can also check out my posts on causal inference and A/B testing. Books: Causal Inference: What if by Miguel Hernán and Jamie Robins: The most practical book I’ve read. Highly recommended. 
Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing by Ron Kohavi, Diane Tang, and Ya Xu: Building on the authors’ decades of industry experience, this is pretty much the bible of online experiments, which is how causal inference is often done in practice.Data & AI Consulting for Startupshttps://yanirseroussi.com/consult/Mon, 01 Jan 0001 00:00:00 +0000https://yanirseroussi.com/consult/Yanir Seroussi’s Data & AI consulting services, mostly targeting startups and scaleups focused on positive-impact outcomes.Free Guide: Data-to-AI Health Check for Startupshttps://yanirseroussi.com/data-to-ai-health-check/Mon, 01 Jan 0001 00:00:00 +0000https://yanirseroussi.com/data-to-ai-health-check/Download a free PDF guide that helps you assess a startup’s Data-to-AI health by probing eight key areas.Stay in touchhttps://yanirseroussi.com/contact/Mon, 01 Jan 0001 00:00:00 +0000https://yanirseroussi.com/contact/Contact me or subscribe to the mailing list.Talkshttps://yanirseroussi.com/talks/Mon, 01 Jan 0001 00:00:00 +0000https://yanirseroussi.com/talks/Just a list of some talks I’ve given, saved here for future reference and for general public benefit.
diff --git a/posts/index.html b/posts/index.html
index 7c5dd929b..a23ab9042 100644
--- a/posts/index.html
+++ b/posts/index.html
@@ -11,7 +11,7 @@ ">

    AI ain't gonna save you from bad data

    Since we’re far from a utopia where data issues are fully handled by AI, this post presents six questions humans can use to assess data projects.

    June 17, 2024

    Startup data health starts with healthy event tracking

    Expanding on the startup health check question of tracking Kukuyeva’s five business aspects as wide events.

    June 10, 2024

    How to avoid startups with poor development processes

    Questions that prospective data specialists and engineers should ask about development processes before accepting a startup role.

    June 3, 2024

    Plumbing, Decisions, and Automation: De-hyping Data & AI

    Three essential questions to understand where an organisation stands when it comes to Data & AI (with zero hype).

    May 27, 2024

    Question startup culture before accepting a data-to-AI role

    Eight questions that prospective data-to-AI employees should ask about a startup’s work and data culture.

    May 20, 2024

    Probing the People aspects of an early-stage startup

    Ten questions that prospective employees should ask about a startup’s team, especially for data-centric roles.

    May 13, 2024

    Business questions to ask before taking a startup data role

    Fourteen questions that prospective employees should ask about a startup’s business model and product, especially for data-focused roles.

    May 6, 2024

    Mentorship and the art of actionable advice

    Reflections on what it takes to package expertise and deliver timely, actionable advice outside the context of employee relationships.

    April 29, 2024

    Assessing a startup's data-to-AI health

    Reviewing the areas that should be assessed to determine a startup’s opportunities and challenges on the data/AI/ML front.

    April 22, 2024

    AI does not obviate the need for testing and observability

    It’s easy to prototype with AI, but production-grade AI apps require even more thorough testing and observability than traditional software.

    April 15, 2024

    My experience as a Data Tech Lead with Work on Climate

    The story of how I joined Work on Climate as a volunteer and became its data tech lead, with lessons applied to consulting & fractional work.

    April 8, 2024

    Artificial intelligence, automation, and the art of counting fish

    Discussing the use of AI to automate underwater marine surveys as an example of the uneven distribution of technological advancement.

    April 1, 2024

    Questions to consider when using AI for PDF data extraction

    Discussing considerations that arise when attempting to automate the extraction of structured data from PDFs and similar documents.

    March 11, 2024

    Two types of startup data problems

    Classifying startups as ML-centric or non-ML is a helpful exercise to uncover the data challenges they’re likely to face.

    March 4, 2024

    Avoiding AI complexity: First, write no code

    Two stories of getting AI functionality to production, which demonstrate the risks inherent in custom development versus starting with a no-code approach.

    February 26, 2024

    Building your startup's minimum viable data stack

    First post in a series on building a minimum viable data stack for startups, introducing key definitions, components, and considerations.

    February 19, 2024

    Nudging ChatGPT to invent books you have no time to read

    Getting ChatGPT Plus to elaborate on possible book content and produce a PDF cheatsheet, with the goal of learning about its capabilities.

    February 12, 2024

    Substance over titles: Your first data hire may be a data scientist

    Advice for hiring a startup’s first data person: match skills to business needs, consider contractors, and get help from data people.

    February 5, 2024

    New decade, new tagline: Data & AI for Impact

    Shifting focus to ‘Data & AI for Impact’, with more startup-related content, increased posting frequency, and deeper audience engagement.

    January 19, 2024

    Supporting volunteer monitoring of marine biodiversity with modern web and data tools

    Summarising the work Uri Seroussi and I did to improve Reef Life Survey’s Reef Species of the World app.

    November 29, 2023

    Lessons from reluctant data engineering

    Video and summary of a talk I gave at DataEngBytes Brisbane on what I learned from doing data engineering as part of every data science role I had.

    October 25, 2023

    My rediscovery of quiet writing on the open web

    Reflections on publishing on this website: Writing publicly to share thoughts and documentation beats chasing views and likes.

    August 28, 2023

    Was data science a failure mode of software engineering?

    Yes, data science projects have suffered from classic software engineering mistakes, but the field is maturing with the rise of new engineering roles.

    June 30, 2023

    How hackable are automated coding assessments?

    Exploring the hackability of speed-based coding tests, using CodeSignal’s Industry Coding Framework as a case study.

    May 26, 2023

    Remaining relevant as a small language model

    Bing Chat recently quipped that humans are small language models. Here are some of my thoughts on how we small language models can remain relevant (for now).

    April 21, 2023

    ChatGPT is transformative AI

    My perspective after a week of using ChatGPT: This is a step change in finding distilled information, and it’s only the beginning.

    December 11, 2022

    Causal Machine Learning is off to a good start, despite some issues

    Reviewing the first three chapters of the book Causal Machine Learning by Robert Osazuwa Ness.

    September 12, 2022

    The mission matters: Moving to climate tech as a data scientist

    Discussing my recent career move into climate tech as a way of doing more to help mitigate dangerous climate change.

    June 6, 2022

    Building useful machine learning tools keeps getting easier: A fish ID case study

    Lessons learned building a fish ID web app with fast.ai and Streamlit, in an attempt to reduce my fear of missing out on the latest deep learning developments.

    March 20, 2022

    Analysis strategies in online A/B experiments: Intention-to-treat, per-protocol, and other lessons from clinical trials

    Epidemiologists analyse clinical trials to estimate the intention-to-treat and per-protocol effects. This post applies their strategies to online experiments.

    January 14, 2022

    Use your human brain to avoid artificial intelligence disasters

    Overview of a talk I gave at a deep learning course, focusing on AI ethics as the need for humans to think about the context and consequences of applying AI.

    November 22, 2021

    Migrating from WordPress.com to Hugo on GitHub + Cloudflare

    My reasons for switching from WordPress.com to Hugo on GitHub + Cloudflare, along with a summary of the solution components and migration process.

    November 10, 2021

    My work with Automattic

    Back-dated meta-post that gathers my posts on Automattic blogs into a summary of the work I’ve done with the company.

    October 7, 2021

    Some highlights from 2020

    Sharing remote teamwork insights, my climate & sustainability activism, Reef Life Survey publications, and progress on Automattic’s Experimentation Platform.

    April 5, 2021

    Many is not enough: Counting simulations to bootstrap the right way

    Going deeper into correct testing of different methods for bootstrap estimation of confidence intervals.

    August 24, 2020

    Software commodities are eating interesting data science work

    Being a data scientist can sometimes feel like a race against software commodities that replace interesting work. What can one do to remain relevant?

    January 11, 2020

    A day in the life of a remote data scientist

    Video of a talk I gave on remote data science work at the Data Science Sydney meetup.

    December 11, 2019

    Bootstrapping the right way?

    Video and summary of a talk I gave at YOW! Data on bootstrap estimation of confidence intervals.

    October 6, 2019

    Hackers beware: Bootstrap sampling may be harmful

    Bootstrap sampling has been promoted as an easy way of modelling uncertainty to hackers without much statistical knowledge. But things aren’t that simple.

    January 7, 2019

    The most practical causal inference book I’ve read (is still a draft)

    Causal Inference by Miguel Hernán and Jamie Robins is a must-read for anyone interested in the area.

    December 24, 2018

    Reflections on remote data science work

    Discussing the pluses and minuses of remote work eighteen months after joining Automattic as a data scientist.

    November 3, 2018

    Defining data science in 2018

    Updating my definition of data science to match changes in the field. It is now broader than before, but its ultimate goal is still to support decisions.

    July 22, 2018

    Advice for aspiring data scientists and other FAQs

    Frequently asked questions by visitors to this site, especially around entering the data science field.

    October 15, 2017

    State of Bandcamp Recommender, Late 2017

    Call for BCRecommender maintainers followed by a decision to shut it down, as I don’t have enough time and Bandcamp now offers recommendations.

    September 2, 2017

    My 10-step path to becoming a remote data scientist with Automattic

    I wanted a well-paid data science-y remote job with an established company that offers a good life balance and makes products I care about. I got it eventually.

    July 29, 2017

    Exploring and visualising Reef Life Survey data

    Web tools I built to visualise Reef Life Survey data and assist citizen scientists in underwater visual census work.

    June 3, 2017

    Customer lifetime value and the proliferation of misinformation on the internet

    There’s a lot of misleading content on the estimation of customer lifetime value. Here’s what I learned about doing it well.

    January 8, 2017

    Ask Why! Finding motives, causes, and purpose in data science

    Video and summary of a talk I gave at the Data Science Sydney meetup, about going beyond the what & how of predictive modelling.

    September 19, 2016

    If you don’t pay attention, data can drive you off a cliff

    Seven common mistakes to avoid when working with data, such as ignoring uncertainty and confusing observed and unobserved quantities.

    August 21, 2016

    Is Data Scientist a useless job title?

    It seems like anyone who touches data can call themselves a data scientist, which makes the title useless. The work they do can still be useful, though.

    August 4, 2016

    Making Bayesian A/B testing more accessible

    A web tool I built to interpret A/B test results in a Bayesian way, including prior specification, visualisations, and decision rules.

    June 19, 2016

    Diving deeper into causality: Pearl, Kleinberg, Hill, and untested assumptions

    Discussing the need for untested assumptions and temporality in causal inference. Mostly based on Samantha Kleinberg’s Causality, Probability, and Time.

    May 14, 2016

    The rise of greedy robots

    Is artificial/machine intelligence a future threat? I argue that it’s already here, with greedy robots already dominating our lives.

    March 20, 2016

    Why you should stop worrying about deep learning and deepen your understanding of causality instead

    Causality is often overlooked but is of much higher relevance to most data scientists than deep learning.

    February 14, 2016

    The joys of offline data collection

    Insights on data collection and machine learning from spending a month sailing, diving, and counting fish with Reef Life Survey.

    January 24, 2016

    This holiday season, give me real insights

    Some companies present raw data or information as “insights”. This post surveys some examples, and discusses how they can be turned into real insights.

    December 8, 2015

    The hardest parts of data science

    Defining feasible problems and coming up with reasonable ways of measuring solutions is harder than building accurate models or obtaining clean data.

    November 23, 2015

    Migrating a simple web application from MongoDB to Elasticsearch

    Migrating BCRecommender from MongoDB to Elasticsearch made it possible to offer a richer search experience to users at a similar cost, among other benefits.

    November 4, 2015

    Miscommunicating science: Simplistic models, nutritionism, and the art of storytelling

    Nutritionism is a special case of misinterpretation and miscommunication of scientific results – something many data scientists encounter in their work.

    October 19, 2015

    The wonderful world of recommender systems

    Giving an overview of the field and common paradigms, and debunking five common myths about recommender systems.

    October 2, 2015

    You don’t need a data scientist (yet)

    Hiring data scientists prematurely is wasteful and frustrating. Here are some questions to ask before you hire your first data scientist.

    August 24, 2015

    Goodbye, Parse.com

    Migrating my web apps away from Parse.com due to reliability issues. Self-hosting is a better solution.

    July 31, 2015

    Learning about deep learning through album cover classification

    Progress on my album cover classification project, highlighting lessons that would be useful to others who are getting started with deep learning.

    July 6, 2015

    Hopping on the deep learning bandwagon

    To become proficient at solving data science problems, you need to get your hands dirty. Here, I used album cover classification to learn about deep learning.

    June 6, 2015

    First steps in data science: author-aware sentiment analysis

    I became a data scientist by doing a PhD, but the same steps can be followed without a formal education program.

    May 2, 2015

    My divestment from fossil fuels

    Recent choices I’ve made to reduce my exposure to fossil fuels, including practical steps that can be taken by Australians and generally applicable lessons.

    April 24, 2015

    My PhD work

    An overview of my PhD in data science / artificial intelligence. Thesis title: Text Mining and Rating Prediction with Topical User Models.

    March 30, 2015

    The long road to a lifestyle business

    Progress since leaving my last full-time job and setting out on an independent path that includes data science consulting and work on my own projects.

    March 22, 2015

    Learning to rank for personalised search (Yandex Search Personalisation – Kaggle Competition Summary – Part 2)

    My team’s solution to the Yandex Search Personalisation competition (finished 9th out of 194 teams).

    February 11, 2015

    Is thinking like a search engine possible? (Yandex search personalisation – Kaggle competition summary – part 1)

    Insights on search personalisation and SEO from participating in a Kaggle competition (finished 9th out of 194 teams).

    January 29, 2015

    Automating Parse.com bulk data imports

    A script for importing data into the Parse backend-as-a-service.

    January 15, 2015

    Stochastic Gradient Boosting: Choosing the Best Number of Iterations

    Exploring an approach to choosing the optimal number of iterations in stochastic gradient boosting, following a bug I found in scikit-learn.

    December 29, 2014

    SEO: Mostly about showing up?

    Increasing SEO traffic to BCRecommender by adding content and opening up more pages for crawling. It turns out that thin content is better than no content.

    December 15, 2014

    Fitting noise: Forecasting the sale price of bulldozers (Kaggle competition summary)

    Summary of a Kaggle competition to forecast bulldozer sale price, where I finished 9th out of 476 teams.

    November 19, 2014

    BCRecommender Traction Update

    Update on BCRecommender traction using three channels: blogger outreach, search engine optimisation, and content marketing.

    November 5, 2014

    What is data science?

    Data science has been a hot term in the past few years. Still, there isn’t a single definition of the field. This post discusses my favourite definition.

    October 23, 2014

    Greek Media Monitoring Kaggle competition: My approach

    Summary of my approach to the Greek Media Monitoring Kaggle competition, where I finished 6th out of 120 teams.

    October 7, 2014

    Applying the Traction Book’s Bullseye framework to BCRecommender

    Ranking 19 channels with the goal of getting traction for BCRecommender.

    September 24, 2014

    Bandcamp recommendation and discovery algorithms

    The recommendation backend for my BCRecommender service for personalised Bandcamp music discovery.

    September 19, 2014

    Building a recommender system on a shoestring budget (or: BCRecommender part 2 – general system layout)

    Iterating on my BCRecommender service with the goal of keeping costs low while providing a valuable music recommendation service.

    September 7, 2014

    Building a Bandcamp recommender system (part 1 – motivation)

    My motivation behind building BCRecommender, a free recommendation & discovery service for Bandcamp music.

    August 30, 2014

    How to (almost) win Kaggle competitions

    Summary of a talk I gave at the Data Science Sydney meetup with ten tips on almost-winning Kaggle competitions.

    August 24, 2014

    Data’s hierarchy of needs

    Discussing the hierarchy of needs proposed by Jay Kreps. Key takeaway: Data-driven algorithms & insights can only be as good as the underlying data.

    August 17, 2014

    Kaggle competition tips and summaries

    Pointers to all my Kaggle advice posts and competition summaries.

    April 5, 2014

    Kaggle beginner tips

    First post! An email I sent to members of the Data Science Sydney Meetup with tips on how to get started with Kaggle competitions.

    January 19, 2014

    Is your tech stack ready for data-intensive applications?

    Questions to assess the quality of tech stacks and lifecycles, with a focus on artificial intelligence, machine learning, and analytics.

    June 24, 2024

    AI ain't gonna save you from bad data

    Since we’re far from a utopia where data issues are fully handled by AI, this post presents six questions humans can use to assess data projects.

    June 17, 2024

    Startup data health starts with healthy event tracking

    Expanding on the startup health check question of tracking Kukuyeva’s five business aspects as wide events.

    June 10, 2024

    How to avoid startups with poor development processes

    Questions that prospective data specialists and engineers should ask about development processes before accepting a startup role.

    June 3, 2024

    Plumbing, Decisions, and Automation: De-hyping Data & AI

    Three essential questions to understand where an organisation stands when it comes to Data & AI (with zero hype).

    May 27, 2024

    Question startup culture before accepting a data-to-AI role

    Eight questions that prospective data-to-AI employees should ask about a startup’s work and data culture.

    May 20, 2024

    Probing the People aspects of an early-stage startup

    Ten questions that prospective employees should ask about a startup’s team, especially for data-centric roles.

    May 13, 2024

    Business questions to ask before taking a startup data role

    Fourteen questions that prospective employees should ask about a startup’s business model and product, especially for data-focused roles.

    May 6, 2024

    Mentorship and the art of actionable advice

    Reflections on what it takes to package expertise and deliver timely, actionable advice outside the context of employee relationships.

    April 29, 2024

    \ No newline at end of file diff --git a/posts/index.xml b/posts/index.xml index 380969975..e8cf3148b 100644 --- a/posts/index.xml +++ b/posts/index.xml @@ -1 +1 @@ -Browse Posts on Yanir Seroussi | Data & AI for Startup Impacthttps://yanirseroussi.com/posts/Recent content in Browse Posts on Yanir Seroussi | Data & AI for Startup ImpactHugo -- gohugo.ioen-auText and figures licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) by [Yanir Seroussi](https://yanirseroussi.com/about/), except where noted otherwiseAI ain't gonna save you from bad datahttps://yanirseroussi.com/2024/06/17/ai-aint-gonna-save-you-from-bad-data/Mon, 17 Jun 2024 02:00:00 +0000https://yanirseroussi.com/2024/06/17/ai-aint-gonna-save-you-from-bad-data/Since we’re far from a utopia where data issues are fully handled by AI, this post presents six questions humans can use to assess data projects.Startup data health starts with healthy event trackinghttps://yanirseroussi.com/2024/06/10/startup-data-health-starts-with-healthy-event-tracking/Mon, 10 Jun 2024 04:00:00 +0000https://yanirseroussi.com/2024/06/10/startup-data-health-starts-with-healthy-event-tracking/Expanding on the startup health check question of tracking Kukuyeva’s five business aspects as wide events.How to avoid startups with poor development processeshttps://yanirseroussi.com/2024/06/03/how-to-avoid-startups-with-poor-development-processes/Mon, 03 Jun 2024 02:45:00 +0000https://yanirseroussi.com/2024/06/03/how-to-avoid-startups-with-poor-development-processes/Questions that prospective data specialists and engineers should ask about development processes before accepting a startup role.Plumbing, Decisions, and Automation: De-hyping Data & AIhttps://yanirseroussi.com/2024/05/27/plumbing-decisions-and-automation-de-hyping-data-and-ai/Mon, 27 May 2024 02:00:00 +0000https://yanirseroussi.com/2024/05/27/plumbing-decisions-and-automation-de-hyping-data-and-ai/Three essential questions to understand where 
an organisation stands when it comes to Data & AI (with zero hype).Question startup culture before accepting a data-to-AI rolehttps://yanirseroussi.com/2024/05/20/question-startup-culture-before-accepting-a-data-to-ai-role/Mon, 20 May 2024 02:25:00 +0000https://yanirseroussi.com/2024/05/20/question-startup-culture-before-accepting-a-data-to-ai-role/Eight questions that prospective data-to-AI employees should ask about a startup’s work and data culture.Probing the People aspects of an early-stage startuphttps://yanirseroussi.com/2024/05/13/probing-the-people-aspects-of-an-early-stage-startup/Mon, 13 May 2024 02:00:00 +0000https://yanirseroussi.com/2024/05/13/probing-the-people-aspects-of-an-early-stage-startup/Ten questions that prospective employees should ask about a startup’s team, especially for data-centric roles.Business questions to ask before taking a startup data rolehttps://yanirseroussi.com/2024/05/06/business-questions-to-ask-before-taking-a-startup-data-role/Mon, 06 May 2024 04:30:00 +0000https://yanirseroussi.com/2024/05/06/business-questions-to-ask-before-taking-a-startup-data-role/Fourteen questions that prospective employees should ask about a startup’s business model and product, especially for data-focused roles.Mentorship and the art of actionable advicehttps://yanirseroussi.com/2024/04/29/mentorship-and-the-art-of-actionable-advice/Mon, 29 Apr 2024 06:30:00 +0000https://yanirseroussi.com/2024/04/29/mentorship-and-the-art-of-actionable-advice/Reflections on what it takes to package expertise and deliver timely, actionable advice outside the context of employee relationships.Assessing a startup's data-to-AI healthhttps://yanirseroussi.com/2024/04/22/assessing-a-startups-data-to-ai-health/Mon, 22 Apr 2024 06:00:00 +0000https://yanirseroussi.com/2024/04/22/assessing-a-startups-data-to-ai-health/Reviewing the areas that should be assessed to determine a startup’s opportunities and challenges on the data/AI/ML front.AI does not obviate the need for 
testing and observabilityhttps://yanirseroussi.com/2024/04/15/ai-does-not-obviate-the-need-for-testing-and-observability/Mon, 15 Apr 2024 05:00:00 +0000https://yanirseroussi.com/2024/04/15/ai-does-not-obviate-the-need-for-testing-and-observability/It’s easy to prototype with AI, but production-grade AI apps require even more thorough testing and observability than traditional software.My experience as a Data Tech Lead with Work on Climatehttps://yanirseroussi.com/2024/04/08/my-experience-as-a-data-tech-lead-with-work-on-climate/Mon, 08 Apr 2024 02:00:00 +0000https://yanirseroussi.com/2024/04/08/my-experience-as-a-data-tech-lead-with-work-on-climate/The story of how I joined Work on Climate as a volunteer and became its data tech lead, with lessons applied to consulting & fractional work.Artificial intelligence, automation, and the art of counting fishhttps://yanirseroussi.com/2024/04/01/artificial-intelligence-automation-and-the-art-of-counting-fish/Mon, 01 Apr 2024 06:00:00 +0000https://yanirseroussi.com/2024/04/01/artificial-intelligence-automation-and-the-art-of-counting-fish/Discussing the use of AI to automate underwater marine surveys as an example of the uneven distribution of technological advancement.Questions to consider when using AI for PDF data extractionhttps://yanirseroussi.com/2024/03/11/questions-to-consider-when-using-ai-for-pdf-data-extraction/Mon, 11 Mar 2024 00:00:00 +0000https://yanirseroussi.com/2024/03/11/questions-to-consider-when-using-ai-for-pdf-data-extraction/Discussing considerations that arise when attempting to automate the extraction of structured data from PDFs and similar documents.Two types of startup data problemshttps://yanirseroussi.com/2024/03/04/two-types-of-startup-data-problems/Mon, 04 Mar 2024 02:00:00 +0000https://yanirseroussi.com/2024/03/04/two-types-of-startup-data-problems/Classifying startups as ML-centric or non-ML is a helpful exercise to uncover the data challenges they’re likely to face.Avoiding AI complexity: 
First, write no codehttps://yanirseroussi.com/2024/02/26/avoiding-ai-complexity-first-write-no-code/Mon, 26 Feb 2024 01:45:00 +0000https://yanirseroussi.com/2024/02/26/avoiding-ai-complexity-first-write-no-code/Two stories of getting AI functionality to production, which demonstrate the risks inherent in custom development versus starting with a no-code approach.Building your startup's minimum viable data stackhttps://yanirseroussi.com/2024/02/19/building-your-startups-minimum-viable-data-stack/Mon, 19 Feb 2024 00:00:00 +0000https://yanirseroussi.com/2024/02/19/building-your-startups-minimum-viable-data-stack/First post in a series on building a minimum viable data stack for startups, introducing key definitions, components, and considerations.Nudging ChatGPT to invent books you have no time to readhttps://yanirseroussi.com/2024/02/12/nudging-chatgpt-to-invent-books-you-have-no-time-to-read/Mon, 12 Feb 2024 05:00:00 +0000https://yanirseroussi.com/2024/02/12/nudging-chatgpt-to-invent-books-you-have-no-time-to-read/Getting ChatGPT Plus to elaborate on possible book content and produce a PDF cheatsheet, with the goal of learning about its capabilities.Substance over titles: Your first data hire may be a data scientisthttps://yanirseroussi.com/2024/02/05/substance-over-titles-your-first-data-hire-may-be-a-data-scientist/Mon, 05 Feb 2024 02:45:00 +0000https://yanirseroussi.com/2024/02/05/substance-over-titles-your-first-data-hire-may-be-a-data-scientist/Advice for hiring a startup’s first data person: match skills to business needs, consider contractors, and get help from data people.New decade, new tagline: Data & AI for Impacthttps://yanirseroussi.com/2024/01/19/new-decade-new-tagline-data-and-ai-for-impact/Fri, 19 Jan 2024 00:00:00 +0000https://yanirseroussi.com/2024/01/19/new-decade-new-tagline-data-and-ai-for-impact/Shifting focus to ‘Data & AI for Impact’, with more startup-related content, increased posting frequency, and deeper audience engagement.Supporting 
volunteer monitoring of marine biodiversity with modern web and data toolshttps://yanirseroussi.com/2023/11/29/supporting-volunteer-monitoring-of-marine-biodiversity-with-modern-web-and-data-tools/Wed, 29 Nov 2023 02:00:00 +0000https://yanirseroussi.com/2023/11/29/supporting-volunteer-monitoring-of-marine-biodiversity-with-modern-web-and-data-tools/Summarising the work Uri Seroussi and I did to improve Reef Life Survey’s Reef Species of the World app.Lessons from reluctant data engineeringhttps://yanirseroussi.com/2023/10/25/lessons-from-reluctant-data-engineering/Wed, 25 Oct 2023 04:45:00 +0000https://yanirseroussi.com/2023/10/25/lessons-from-reluctant-data-engineering/Video and summary of a talk I gave at DataEngBytes Brisbane on what I learned from doing data engineering as part of every data science role I had.My rediscovery of quiet writing on the open webhttps://yanirseroussi.com/2023/08/28/my-rediscovery-of-quiet-writing-on-the-open-web/Mon, 28 Aug 2023 05:30:00 +0000https://yanirseroussi.com/2023/08/28/my-rediscovery-of-quiet-writing-on-the-open-web/Reflections on publishing on this website: Writing publicly to share thoughts and documentation beats chasing views and likes.Was data science a failure mode of software engineering?https://yanirseroussi.com/2023/06/30/was-data-science-a-failure-mode-of-software-engineering/Fri, 30 Jun 2023 00:06:30 +0000https://yanirseroussi.com/2023/06/30/was-data-science-a-failure-mode-of-software-engineering/Yes, data science projects have suffered from classic software engineering mistakes, but the field is maturing with the rise of new engineering roles.How hackable are automated coding assessments?https://yanirseroussi.com/2023/05/26/how-hackable-are-automated-coding-assessments/Fri, 26 May 2023 00:03:00 +0000https://yanirseroussi.com/2023/05/26/how-hackable-are-automated-coding-assessments/Exploring the hackability of speed-based coding tests, using CodeSignal’s Industry Coding Framework as a case study.Remaining 
relevant as a small language modelhttps://yanirseroussi.com/2023/04/21/remaining-relevant-as-a-small-language-model/Fri, 21 Apr 2023 00:06:30 +0000https://yanirseroussi.com/2023/04/21/remaining-relevant-as-a-small-language-model/Bing Chat recently quipped that humans are small language models. Here are some of my thoughts on how we small language models can remain relevant (for now).ChatGPT is transformative AIhttps://yanirseroussi.com/2022/12/11/chatgpt-is-transformative-ai/Sun, 11 Dec 2022 00:00:00 +0000https://yanirseroussi.com/2022/12/11/chatgpt-is-transformative-ai/My perspective after a week of using ChatGPT: This is a step change in finding distilled information, and it’s only the beginning.Causal Machine Learning is off to a good start, despite some issueshttps://yanirseroussi.com/2022/09/12/causal-machine-learning-book-draft-review/Mon, 12 Sep 2022 02:45:00 +0000https://yanirseroussi.com/2022/09/12/causal-machine-learning-book-draft-review/Reviewing the first three chapters of the book Causal Machine Learning by Robert Osazuwa Ness.The mission matters: Moving to climate tech as a data scientisthttps://yanirseroussi.com/2022/06/06/the-mission-matters-moving-to-climate-tech-as-a-data-scientist/Mon, 06 Jun 2022 00:00:00 +0000https://yanirseroussi.com/2022/06/06/the-mission-matters-moving-to-climate-tech-as-a-data-scientist/Discussing my recent career move into climate tech as a way of doing more to help mitigate dangerous climate change.Building useful machine learning tools keeps getting easier: A fish ID case studyhttps://yanirseroussi.com/2022/03/20/building-useful-machine-learning-tools-keeps-getting-easier-a-fish-id-case-study/Sun, 20 Mar 2022 04:30:00 +0000https://yanirseroussi.com/2022/03/20/building-useful-machine-learning-tools-keeps-getting-easier-a-fish-id-case-study/Lessons learned building a fish ID web app with fast.ai and Streamlit, in an attempt to reduce my fear of missing out on the latest deep learning developments.Analysis strategies in 
online A/B experiments: Intention-to-treat, per-protocol, and other lessons from clinical trialshttps://yanirseroussi.com/2022/01/14/analysis-strategies-in-online-a-b-experiments/Fri, 14 Jan 2022 00:05:40 +0000https://yanirseroussi.com/2022/01/14/analysis-strategies-in-online-a-b-experiments/Epidemiologists analyse clinical trials to estimate the intention-to-treat and per-protocol effects. This post applies their strategies to online experiments.Use your human brain to avoid artificial intelligence disastershttps://yanirseroussi.com/2021/11/22/use-your-human-brain-to-avoid-artificial-intelligence-disasters/Mon, 22 Nov 2021 03:45:00 +0000https://yanirseroussi.com/2021/11/22/use-your-human-brain-to-avoid-artificial-intelligence-disasters/Overview of a talk I gave at a deep learning course, focusing on AI ethics as the need for humans to think on the context and consequences of applying AI.Migrating from WordPress.com to Hugo on GitHub + Cloudflarehttps://yanirseroussi.com/2021/11/10/migrating-from-wordpress-com-to-hugo-on-github-cloudflare/Wed, 10 Nov 2021 06:30:00 +0000https://yanirseroussi.com/2021/11/10/migrating-from-wordpress-com-to-hugo-on-github-cloudflare/My reasons for switching from WordPress.com to Hugo on GitHub + Cloudflare, along with a summary of the solution components and migration process.My work with Automattichttps://yanirseroussi.com/2021/10/07/my-work-with-automattic/Thu, 07 Oct 2021 00:00:00 +0000https://yanirseroussi.com/2021/10/07/my-work-with-automattic/Back-dated meta-post that gathers my posts on Automattic blogs into a summary of the work I’ve done with the company.Some highlights from 2020https://yanirseroussi.com/2021/04/05/some-highlights-from-2020/Mon, 05 Apr 2021 06:41:48 +0000https://yanirseroussi.com/2021/04/05/some-highlights-from-2020/Sharing remote teamwork insights, my climate & sustainability activism, Reef Life Survey publications, and progress on Automattic’s Experimentation Platform.Many is not enough: Counting simulations 
to bootstrap the right wayhttps://yanirseroussi.com/2020/08/24/many-is-not-enough-counting-simulations-to-bootstrap-the-right-way/Mon, 24 Aug 2020 01:35:17 +0000https://yanirseroussi.com/2020/08/24/many-is-not-enough-counting-simulations-to-bootstrap-the-right-way/Going deeper into correct testing of different methods for bootstrap estimation of confidence intervals.Software commodities are eating interesting data science workhttps://yanirseroussi.com/2020/01/11/software-commodities-are-eating-interesting-data-science-work/Sat, 11 Jan 2020 09:22:35 +0000https://yanirseroussi.com/2020/01/11/software-commodities-are-eating-interesting-data-science-work/Being a data scientist can sometimes feel like a race against software commodities that replace interesting work. What can one do to remain relevant?A day in the life of a remote data scientisthttps://yanirseroussi.com/2019/12/12/a-day-in-the-life-of-a-remote-data-scientist/Wed, 11 Dec 2019 22:06:19 +0000https://yanirseroussi.com/2019/12/12/a-day-in-the-life-of-a-remote-data-scientist/Video of a talk I gave on remote data science work at the Data Science Sydney meetup.Bootstrapping the right way?https://yanirseroussi.com/2019/10/06/bootstrapping-the-right-way/Sun, 06 Oct 2019 06:48:07 +0000https://yanirseroussi.com/2019/10/06/bootstrapping-the-right-way/Video and summary of a talk I gave at YOW! Data on bootstrap estimation of confidence intervals.Hackers beware: Bootstrap sampling may be harmfulhttps://yanirseroussi.com/2019/01/08/hackers-beware-bootstrap-sampling-may-be-harmful/Mon, 07 Jan 2019 21:07:56 +0000https://yanirseroussi.com/2019/01/08/hackers-beware-bootstrap-sampling-may-be-harmful/Bootstrap sampling has been promoted as an easy way of modelling uncertainty to hackers without much statistical knowledge. 
But things aren’t that simple.The most practical causal inference book I’ve read (is still a draft)https://yanirseroussi.com/2018/12/24/the-most-practical-causal-inference-book-ive-read-is-still-a-draft/Mon, 24 Dec 2018 02:37:50 +0000https://yanirseroussi.com/2018/12/24/the-most-practical-causal-inference-book-ive-read-is-still-a-draft/Causal Inference by Miguel Hernán and Jamie Robins is a must-read for anyone interested in the area.Reflections on remote data science workhttps://yanirseroussi.com/2018/11/03/reflections-on-remote-data-science-work/Sat, 03 Nov 2018 06:33:13 +0000https://yanirseroussi.com/2018/11/03/reflections-on-remote-data-science-work/Discussing the pluses and minuses of remote work eighteen months after joining Automattic as a data scientist.Defining data science in 2018https://yanirseroussi.com/2018/07/22/defining-data-science-in-2018/Sun, 22 Jul 2018 08:27:43 +0000https://yanirseroussi.com/2018/07/22/defining-data-science-in-2018/Updating my definition of data science to match changes in the field. 
It is now broader than before, but its ultimate goal is still to support decisions.Advice for aspiring data scientists and other FAQshttps://yanirseroussi.com/2017/10/15/advice-for-aspiring-data-scientists-and-other-faqs/Sun, 15 Oct 2017 09:15:25 +0000https://yanirseroussi.com/2017/10/15/advice-for-aspiring-data-scientists-and-other-faqs/Frequently asked questions by visitors to this site, especially around entering the data science field.State of Bandcamp Recommender, Late 2017https://yanirseroussi.com/2017/09/02/state-of-bandcamp-recommender/Sat, 02 Sep 2017 10:19:02 +0000https://yanirseroussi.com/2017/09/02/state-of-bandcamp-recommender/Call for BCRecommender maintainers followed by a decision to shut it down, as I don’t have enough time and Bandcamp now offers recommendations.My 10-step path to becoming a remote data scientist with Automattichttps://yanirseroussi.com/2017/07/29/my-10-step-path-to-becoming-a-remote-data-scientist-with-automattic/Sat, 29 Jul 2017 05:39:26 +0000https://yanirseroussi.com/2017/07/29/my-10-step-path-to-becoming-a-remote-data-scientist-with-automattic/I wanted a well-paid data science-y remote job with an established company that offers a good life balance and makes products I care about. 
I got it eventually.Exploring and visualising Reef Life Survey datahttps://yanirseroussi.com/2017/06/03/exploring-and-visualising-reef-life-survey-data/Sat, 03 Jun 2017 00:49:05 +0000https://yanirseroussi.com/2017/06/03/exploring-and-visualising-reef-life-survey-data/Web tools I built to visualise Reef Life Survey data and assist citizen scientists in underwater visual census work.Customer lifetime value and the proliferation of misinformation on the internethttps://yanirseroussi.com/2017/01/08/customer-lifetime-value-and-the-proliferation-of-misinformation-on-the-internet/Sun, 08 Jan 2017 20:02:30 +0000https://yanirseroussi.com/2017/01/08/customer-lifetime-value-and-the-proliferation-of-misinformation-on-the-internet/There’s a lot of misleading content on the estimation of customer lifetime value. Here’s what I learned about doing it well.Ask Why! Finding motives, causes, and purpose in data sciencehttps://yanirseroussi.com/2016/09/19/ask-why-finding-motives-causes-and-purpose-in-data-science/Mon, 19 Sep 2016 21:28:44 +0000https://yanirseroussi.com/2016/09/19/ask-why-finding-motives-causes-and-purpose-in-data-science/Video and summary of a talk I gave at the Data Science Sydney meetup, about going beyond the what & how of predictive modelling.If you don’t pay attention, data can drive you off a cliffhttps://yanirseroussi.com/2016/08/21/seven-ways-to-be-data-driven-off-a-cliff/Sun, 21 Aug 2016 21:34:17 +0000https://yanirseroussi.com/2016/08/21/seven-ways-to-be-data-driven-off-a-cliff/Seven common mistakes to avoid when working with data, such as ignoring uncertainty and confusing observed and unobserved quantities.Is Data Scientist a useless job title?https://yanirseroussi.com/2016/08/04/is-data-scientist-a-useless-job-title/Thu, 04 Aug 2016 22:26:03 +0000https://yanirseroussi.com/2016/08/04/is-data-scientist-a-useless-job-title/It seems like anyone who touches data can call themselves a data scientist, which makes the title useless. 
The work they do can still be useful, though.Making Bayesian A/B testing more accessiblehttps://yanirseroussi.com/2016/06/19/making-bayesian-ab-testing-more-accessible/Sun, 19 Jun 2016 10:32:15 +0000https://yanirseroussi.com/2016/06/19/making-bayesian-ab-testing-more-accessible/A web tool I built to interpret A/B test results in a Bayesian way, including prior specification, visualisations, and decision rules.Diving deeper into causality: Pearl, Kleinberg, Hill, and untested assumptionshttps://yanirseroussi.com/2016/05/15/diving-deeper-into-causality-pearl-kleinberg-hill-and-untested-assumptions/Sat, 14 May 2016 19:57:03 +0000https://yanirseroussi.com/2016/05/15/diving-deeper-into-causality-pearl-kleinberg-hill-and-untested-assumptions/Discussing the need for untested assumptions and temporality in causal inference. Mostly based on Samantha Kleinberg’s Causality, Probability, and Time.The rise of greedy robotshttps://yanirseroussi.com/2016/03/20/the-rise-of-greedy-robots/Sun, 20 Mar 2016 20:33:43 +0000https://yanirseroussi.com/2016/03/20/the-rise-of-greedy-robots/Is artificial/machine intelligence a future threat? 
I argue that it’s already here, with greedy robots already dominating our lives.Why you should stop worrying about deep learning and deepen your understanding of causality insteadhttps://yanirseroussi.com/2016/02/14/why-you-should-stop-worrying-about-deep-learning-and-deepen-your-understanding-of-causality-instead/Sun, 14 Feb 2016 11:04:11 +0000https://yanirseroussi.com/2016/02/14/why-you-should-stop-worrying-about-deep-learning-and-deepen-your-understanding-of-causality-instead/Causality is often overlooked but is of much higher relevance to most data scientists than deep learning.The joys of offline data collectionhttps://yanirseroussi.com/2016/01/24/the-joys-of-offline-data-collection/Sun, 24 Jan 2016 00:32:25 +0000https://yanirseroussi.com/2016/01/24/the-joys-of-offline-data-collection/Insights on data collection and machine learning from spending a month sailing, diving, and counting fish with Reef Life Survey.This holiday season, give me real insightshttps://yanirseroussi.com/2015/12/08/this-holiday-season-give-me-real-insights/Tue, 08 Dec 2015 06:57:25 +0000https://yanirseroussi.com/2015/12/08/this-holiday-season-give-me-real-insights/Some companies present raw data or information as “insights”. 
This post surveys some examples, and discusses how they can be turned into real insights.The hardest parts of data sciencehttps://yanirseroussi.com/2015/11/23/the-hardest-parts-of-data-science/Mon, 23 Nov 2015 04:14:21 +0000https://yanirseroussi.com/2015/11/23/the-hardest-parts-of-data-science/Defining feasible problems and coming up with reasonable ways of measuring solutions is harder than building accurate models or obtaining clean data.Migrating a simple web application from MongoDB to Elasticsearchhttps://yanirseroussi.com/2015/11/04/migrating-a-simple-web-application-from-mongodb-to-elasticsearch/Wed, 04 Nov 2015 03:53:18 +0000https://yanirseroussi.com/2015/11/04/migrating-a-simple-web-application-from-mongodb-to-elasticsearch/Migrating BCRecommender from MongoDB to Elasticsearch made it possible to offer a richer search experience to users at a similar cost, among other benefits.Miscommunicating science: Simplistic models, nutritionism, and the art of storytellinghttps://yanirseroussi.com/2015/10/19/nutritionism-and-the-need-for-complex-models-to-explain-complex-phenomena/Mon, 19 Oct 2015 00:02:32 +0000https://yanirseroussi.com/2015/10/19/nutritionism-and-the-need-for-complex-models-to-explain-complex-phenomena/Nutritionism is a special case of misinterpretation and miscommunication of scientific results – something many data scientists encounter in their work.The wonderful world of recommender systemshttps://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/Fri, 02 Oct 2015 05:25:57 +0000https://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/Giving an overview of the field and common paradigms, and debunking five common myths about recommender systems.You don’t need a data scientist (yet)https://yanirseroussi.com/2015/08/24/you-dont-need-a-data-scientist-yet/Mon, 24 Aug 2015 08:25:30 +0000https://yanirseroussi.com/2015/08/24/you-dont-need-a-data-scientist-yet/Hiring data scientists prematurely is wasteful and 
frustrating. Here are some questions to ask before you hire your first data scientist.Goodbye, Parse.comhttps://yanirseroussi.com/2015/07/31/goodbye-parse-com/Fri, 31 Jul 2015 03:29:50 +0000https://yanirseroussi.com/2015/07/31/goodbye-parse-com/Migrating my web apps away from Parse.com due to reliability issues. Self-hosting is a better solution.Learning about deep learning through album cover classificationhttps://yanirseroussi.com/2015/07/06/learning-about-deep-learning-through-album-cover-classification/Mon, 06 Jul 2015 22:21:42 +0000https://yanirseroussi.com/2015/07/06/learning-about-deep-learning-through-album-cover-classification/Progress on my album cover classification project, highlighting lessons that would be useful to others who are getting started with deep learning.Hopping on the deep learning bandwagonhttps://yanirseroussi.com/2015/06/06/hopping-on-the-deep-learning-bandwagon/Sat, 06 Jun 2015 05:00:22 +0000https://yanirseroussi.com/2015/06/06/hopping-on-the-deep-learning-bandwagon/To become proficient at solving data science problems, you need to get your hands dirty. 
Here, I used album cover classification to learn about deep learning.First steps in data science: author-aware sentiment analysishttps://yanirseroussi.com/2015/05/02/first-steps-in-data-science-author-aware-sentiment-analysis/Sat, 02 May 2015 08:31:10 +0000https://yanirseroussi.com/2015/05/02/first-steps-in-data-science-author-aware-sentiment-analysis/I became a data scientist by doing a PhD, but the same steps can be followed without a formal education program.My divestment from fossil fuelshttps://yanirseroussi.com/2015/04/24/my-divestment-from-fossil-fuels/Fri, 24 Apr 2015 00:19:36 +0000https://yanirseroussi.com/2015/04/24/my-divestment-from-fossil-fuels/Recent choices I’ve made to reduce my exposure to fossil fuels, including practical steps that can be taken by Australians and generally applicable lessons.My PhD workhttps://yanirseroussi.com/phd-work/Mon, 30 Mar 2015 03:23:33 +0000https://yanirseroussi.com/phd-work/An overview of my PhD in data science / artificial intelligence. Thesis title: Text Mining and Rating Prediction with Topical User Models.The long road to a lifestyle businesshttps://yanirseroussi.com/2015/03/22/the-long-road-to-a-lifestyle-business/Sun, 22 Mar 2015 09:43:47 +0000https://yanirseroussi.com/2015/03/22/the-long-road-to-a-lifestyle-business/Progress since leaving my last full-time job and setting on an independent path that includes data science consulting and work on my own projects.Learning to rank for personalised search (Yandex Search Personalisation – Kaggle Competition Summary – Part 2)https://yanirseroussi.com/2015/02/11/learning-to-rank-for-personalised-search-yandex-search-personalisation-kaggle-competition-summary-part-2/Wed, 11 Feb 2015 06:34:17 +0000https://yanirseroussi.com/2015/02/11/learning-to-rank-for-personalised-search-yandex-search-personalisation-kaggle-competition-summary-part-2/My team’s solution to the Yandex Search Personalisation competition (finished 9th out of 194 teams).Is thinking like a search engine 
possible? (Yandex search personalisation – Kaggle competition summary – part 1)https://yanirseroussi.com/2015/01/29/is-thinking-like-a-search-engine-possible-yandex-search-personalisation-kaggle-competition-summary-part-1/Thu, 29 Jan 2015 10:37:39 +0000https://yanirseroussi.com/2015/01/29/is-thinking-like-a-search-engine-possible-yandex-search-personalisation-kaggle-competition-summary-part-1/Insights on search personalisation and SEO from participating in a Kaggle competition (finished 9th out of 194 teams).Automating Parse.com bulk data importshttps://yanirseroussi.com/2015/01/15/automating-parse-com-bulk-data-imports/Thu, 15 Jan 2015 04:41:16 +0000https://yanirseroussi.com/2015/01/15/automating-parse-com-bulk-data-imports/A script for importing data into the Parse backend-as-a-service.Stochastic Gradient Boosting: Choosing the Best Number of Iterationshttps://yanirseroussi.com/2014/12/29/stochastic-gradient-boosting-choosing-the-best-number-of-iterations/Mon, 29 Dec 2014 02:30:06 +0000https://yanirseroussi.com/2014/12/29/stochastic-gradient-boosting-choosing-the-best-number-of-iterations/Exploring an approach to choosing the optimal number of iterations in stochastic gradient boosting, following a bug I found in scikit-learn.SEO: Mostly about showing up?https://yanirseroussi.com/2014/12/15/seo-mostly-about-showing-up/Mon, 15 Dec 2014 04:25:25 +0000https://yanirseroussi.com/2014/12/15/seo-mostly-about-showing-up/Increasing SEO traffic to BCRecommender by adding content and opening up more pages for crawling. 
It turns out that thin content is better than no content.Fitting noise: Forecasting the sale price of bulldozers (Kaggle competition summary)https://yanirseroussi.com/2014/11/19/fitting-noise-forecasting-the-sale-price-of-bulldozers-kaggle-competition-summary/Wed, 19 Nov 2014 09:17:34 +0000https://yanirseroussi.com/2014/11/19/fitting-noise-forecasting-the-sale-price-of-bulldozers-kaggle-competition-summary/Summary of a Kaggle competition to forecast bulldozer sale price, where I finished 9th out of 476 teams.BCRecommender Traction Updatehttps://yanirseroussi.com/2014/11/05/bcrecommender-traction-update/Wed, 05 Nov 2014 02:29:35 +0000https://yanirseroussi.com/2014/11/05/bcrecommender-traction-update/Update on BCRecommender traction using three channels: blogger outreach, search engine optimisation, and content marketing.What is data science?https://yanirseroussi.com/2014/10/23/what-is-data-science/Thu, 23 Oct 2014 03:22:08 +0000https://yanirseroussi.com/2014/10/23/what-is-data-science/Data science has been a hot term in the past few years. Still, there isn’t a single definition of the field. 
This post discusses my favourite definition.Greek Media Monitoring Kaggle competition: My approachhttps://yanirseroussi.com/2014/10/07/greek-media-monitoring-kaggle-competition-my-approach/Tue, 07 Oct 2014 03:21:35 +0000https://yanirseroussi.com/2014/10/07/greek-media-monitoring-kaggle-competition-my-approach/Summary of my approach to the Greek Media Monitoring Kaggle competition, where I finished 6th out of 120 teams.Applying the Traction Book’s Bullseye framework to BCRecommenderhttps://yanirseroussi.com/2014/09/24/applying-the-traction-books-bullseye-framework-to-bcrecommender/Wed, 24 Sep 2014 04:57:39 +0000https://yanirseroussi.com/2014/09/24/applying-the-traction-books-bullseye-framework-to-bcrecommender/Ranking 19 channels with the goal of getting traction for BCRecommender.Bandcamp recommendation and discovery algorithmshttps://yanirseroussi.com/2014/09/19/bandcamp-recommendation-and-discovery-algorithms/Fri, 19 Sep 2014 14:26:55 +0000https://yanirseroussi.com/2014/09/19/bandcamp-recommendation-and-discovery-algorithms/The recommendation backend for my BCRecommender service for personalised Bandcamp music discovery.Building a recommender system on a shoestring budget (or: BCRecommender part 2 – general system layout)https://yanirseroussi.com/2014/09/07/building-a-recommender-system-on-a-shoestring-budget/Sun, 07 Sep 2014 10:48:44 +0000https://yanirseroussi.com/2014/09/07/building-a-recommender-system-on-a-shoestring-budget/Iterating on my BCRecommender service with the goal of keeping costs low while providing a valuable music recommendation service.Building a Bandcamp recommender system (part 1 – motivation)https://yanirseroussi.com/2014/08/30/building-a-bandcamp-recommender-system-part-1-motivation/Sat, 30 Aug 2014 08:11:38 +0000https://yanirseroussi.com/2014/08/30/building-a-bandcamp-recommender-system-part-1-motivation/My motivation behind building BCRecommender, a free recommendation & discovery service for Bandcamp music.How to (almost) win Kaggle 
competitionshttps://yanirseroussi.com/2014/08/24/how-to-almost-win-kaggle-competitions/Sun, 24 Aug 2014 12:40:53 +0000https://yanirseroussi.com/2014/08/24/how-to-almost-win-kaggle-competitions/Summary of a talk I gave at the Data Science Sydney meetup with ten tips on almost-winning Kaggle competitions.Data’s hierarchy of needshttps://yanirseroussi.com/2014/08/17/datas-hierarchy-of-needs/Sun, 17 Aug 2014 13:09:30 +0000https://yanirseroussi.com/2014/08/17/datas-hierarchy-of-needs/Discussing the hierarchy of needs proposed by Jay Kreps. Key takeaway: Data-driven algorithms & insights can only be as good as the underlying data.Kaggle competition tips and summarieshttps://yanirseroussi.com/kaggle/Sat, 05 Apr 2014 23:46:10 +0000https://yanirseroussi.com/kaggle/Pointers to all my Kaggle advice posts and competition summaries.Kaggle beginner tipshttps://yanirseroussi.com/2014/01/19/kaggle-beginner-tips/Sun, 19 Jan 2014 10:34:28 +0000https://yanirseroussi.com/2014/01/19/kaggle-beginner-tips/First post! An email I sent to members of the Data Science Sydney Meetup with tips on how to get started with Kaggle competitions. 
\ No newline at end of file +Browse Posts on Yanir Seroussi | Data & AI for Startup Impacthttps://yanirseroussi.com/posts/Recent content in Browse Posts on Yanir Seroussi | Data & AI for Startup ImpactHugo -- gohugo.ioen-auText and figures licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) by [Yanir Seroussi](https://yanirseroussi.com/about/), except where noted otherwiseIs your tech stack ready for data-intensive applications?https://yanirseroussi.com/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/Mon, 24 Jun 2024 02:00:00 +0000https://yanirseroussi.com/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/Questions to assess the quality of tech stacks and lifecycles, with a focus on artificial intelligence, machine learning, and analytics.AI ain't gonna save you from bad datahttps://yanirseroussi.com/2024/06/17/ai-aint-gonna-save-you-from-bad-data/Mon, 17 Jun 2024 02:00:00 +0000https://yanirseroussi.com/2024/06/17/ai-aint-gonna-save-you-from-bad-data/Since we’re far from a utopia where data issues are fully handled by AI, this post presents six questions humans can use to assess data projects.Startup data health starts with healthy event trackinghttps://yanirseroussi.com/2024/06/10/startup-data-health-starts-with-healthy-event-tracking/Mon, 10 Jun 2024 04:00:00 +0000https://yanirseroussi.com/2024/06/10/startup-data-health-starts-with-healthy-event-tracking/Expanding on the startup health check question of tracking Kukuyeva’s five business aspects as wide events.How to avoid startups with poor development processeshttps://yanirseroussi.com/2024/06/03/how-to-avoid-startups-with-poor-development-processes/Mon, 03 Jun 2024 02:45:00 +0000https://yanirseroussi.com/2024/06/03/how-to-avoid-startups-with-poor-development-processes/Questions that prospective data specialists and engineers should ask about development processes before accepting a startup role.Plumbing, Decisions, and Automation: De-hyping 
Data & AIhttps://yanirseroussi.com/2024/05/27/plumbing-decisions-and-automation-de-hyping-data-and-ai/Mon, 27 May 2024 02:00:00 +0000https://yanirseroussi.com/2024/05/27/plumbing-decisions-and-automation-de-hyping-data-and-ai/Three essential questions to understand where an organisation stands when it comes to Data & AI (with zero hype).Question startup culture before accepting a data-to-AI rolehttps://yanirseroussi.com/2024/05/20/question-startup-culture-before-accepting-a-data-to-ai-role/Mon, 20 May 2024 02:25:00 +0000https://yanirseroussi.com/2024/05/20/question-startup-culture-before-accepting-a-data-to-ai-role/Eight questions that prospective data-to-AI employees should ask about a startup’s work and data culture.Probing the People aspects of an early-stage startuphttps://yanirseroussi.com/2024/05/13/probing-the-people-aspects-of-an-early-stage-startup/Mon, 13 May 2024 02:00:00 +0000https://yanirseroussi.com/2024/05/13/probing-the-people-aspects-of-an-early-stage-startup/Ten questions that prospective employees should ask about a startup’s team, especially for data-centric roles.Business questions to ask before taking a startup data rolehttps://yanirseroussi.com/2024/05/06/business-questions-to-ask-before-taking-a-startup-data-role/Mon, 06 May 2024 04:30:00 +0000https://yanirseroussi.com/2024/05/06/business-questions-to-ask-before-taking-a-startup-data-role/Fourteen questions that prospective employees should ask about a startup’s business model and product, especially for data-focused roles.Mentorship and the art of actionable advicehttps://yanirseroussi.com/2024/04/29/mentorship-and-the-art-of-actionable-advice/Mon, 29 Apr 2024 06:30:00 +0000https://yanirseroussi.com/2024/04/29/mentorship-and-the-art-of-actionable-advice/Reflections on what it takes to package expertise and deliver timely, actionable advice outside the context of employee relationships.Assessing a startup's data-to-AI 
healthhttps://yanirseroussi.com/2024/04/22/assessing-a-startups-data-to-ai-health/Mon, 22 Apr 2024 06:00:00 +0000https://yanirseroussi.com/2024/04/22/assessing-a-startups-data-to-ai-health/Reviewing the areas that should be assessed to determine a startup’s opportunities and challenges on the data/AI/ML front.AI does not obviate the need for testing and observabilityhttps://yanirseroussi.com/2024/04/15/ai-does-not-obviate-the-need-for-testing-and-observability/Mon, 15 Apr 2024 05:00:00 +0000https://yanirseroussi.com/2024/04/15/ai-does-not-obviate-the-need-for-testing-and-observability/It’s easy to prototype with AI, but production-grade AI apps require even more thorough testing and observability than traditional software.My experience as a Data Tech Lead with Work on Climatehttps://yanirseroussi.com/2024/04/08/my-experience-as-a-data-tech-lead-with-work-on-climate/Mon, 08 Apr 2024 02:00:00 +0000https://yanirseroussi.com/2024/04/08/my-experience-as-a-data-tech-lead-with-work-on-climate/The story of how I joined Work on Climate as a volunteer and became its data tech lead, with lessons applied to consulting & fractional work.Artificial intelligence, automation, and the art of counting fishhttps://yanirseroussi.com/2024/04/01/artificial-intelligence-automation-and-the-art-of-counting-fish/Mon, 01 Apr 2024 06:00:00 +0000https://yanirseroussi.com/2024/04/01/artificial-intelligence-automation-and-the-art-of-counting-fish/Discussing the use of AI to automate underwater marine surveys as an example of the uneven distribution of technological advancement.Questions to consider when using AI for PDF data extractionhttps://yanirseroussi.com/2024/03/11/questions-to-consider-when-using-ai-for-pdf-data-extraction/Mon, 11 Mar 2024 00:00:00 +0000https://yanirseroussi.com/2024/03/11/questions-to-consider-when-using-ai-for-pdf-data-extraction/Discussing considerations that arise when attempting to automate the extraction of structured data from PDFs and similar documents.Two types 
of startup data problemshttps://yanirseroussi.com/2024/03/04/two-types-of-startup-data-problems/Mon, 04 Mar 2024 02:00:00 +0000https://yanirseroussi.com/2024/03/04/two-types-of-startup-data-problems/Classifying startups as ML-centric or non-ML is a helpful exercise to uncover the data challenges they’re likely to face.Avoiding AI complexity: First, write no codehttps://yanirseroussi.com/2024/02/26/avoiding-ai-complexity-first-write-no-code/Mon, 26 Feb 2024 01:45:00 +0000https://yanirseroussi.com/2024/02/26/avoiding-ai-complexity-first-write-no-code/Two stories of getting AI functionality to production, which demonstrate the risks inherent in custom development versus starting with a no-code approach.Building your startup's minimum viable data stackhttps://yanirseroussi.com/2024/02/19/building-your-startups-minimum-viable-data-stack/Mon, 19 Feb 2024 00:00:00 +0000https://yanirseroussi.com/2024/02/19/building-your-startups-minimum-viable-data-stack/First post in a series on building a minimum viable data stack for startups, introducing key definitions, components, and considerations.Nudging ChatGPT to invent books you have no time to readhttps://yanirseroussi.com/2024/02/12/nudging-chatgpt-to-invent-books-you-have-no-time-to-read/Mon, 12 Feb 2024 05:00:00 +0000https://yanirseroussi.com/2024/02/12/nudging-chatgpt-to-invent-books-you-have-no-time-to-read/Getting ChatGPT Plus to elaborate on possible book content and produce a PDF cheatsheet, with the goal of learning about its capabilities.Substance over titles: Your first data hire may be a data scientisthttps://yanirseroussi.com/2024/02/05/substance-over-titles-your-first-data-hire-may-be-a-data-scientist/Mon, 05 Feb 2024 02:45:00 +0000https://yanirseroussi.com/2024/02/05/substance-over-titles-your-first-data-hire-may-be-a-data-scientist/Advice for hiring a startup’s first data person: match skills to business needs, consider contractors, and get help from data people.New decade, new tagline: Data & AI for 
Impacthttps://yanirseroussi.com/2024/01/19/new-decade-new-tagline-data-and-ai-for-impact/Fri, 19 Jan 2024 00:00:00 +0000https://yanirseroussi.com/2024/01/19/new-decade-new-tagline-data-and-ai-for-impact/Shifting focus to ‘Data & AI for Impact’, with more startup-related content, increased posting frequency, and deeper audience engagement.Supporting volunteer monitoring of marine biodiversity with modern web and data toolshttps://yanirseroussi.com/2023/11/29/supporting-volunteer-monitoring-of-marine-biodiversity-with-modern-web-and-data-tools/Wed, 29 Nov 2023 02:00:00 +0000https://yanirseroussi.com/2023/11/29/supporting-volunteer-monitoring-of-marine-biodiversity-with-modern-web-and-data-tools/Summarising the work Uri Seroussi and I did to improve Reef Life Survey’s Reef Species of the World app.Lessons from reluctant data engineeringhttps://yanirseroussi.com/2023/10/25/lessons-from-reluctant-data-engineering/Wed, 25 Oct 2023 04:45:00 +0000https://yanirseroussi.com/2023/10/25/lessons-from-reluctant-data-engineering/Video and summary of a talk I gave at DataEngBytes Brisbane on what I learned from doing data engineering as part of every data science role I had.My rediscovery of quiet writing on the open webhttps://yanirseroussi.com/2023/08/28/my-rediscovery-of-quiet-writing-on-the-open-web/Mon, 28 Aug 2023 05:30:00 +0000https://yanirseroussi.com/2023/08/28/my-rediscovery-of-quiet-writing-on-the-open-web/Reflections on publishing on this website: Writing publicly to share thoughts and documentation beats chasing views and likes.Was data science a failure mode of software engineering?https://yanirseroussi.com/2023/06/30/was-data-science-a-failure-mode-of-software-engineering/Fri, 30 Jun 2023 00:06:30 +0000https://yanirseroussi.com/2023/06/30/was-data-science-a-failure-mode-of-software-engineering/Yes, data science projects have suffered from classic software engineering mistakes, but the field is maturing with the rise of new engineering roles.How hackable are 
automated coding assessments?https://yanirseroussi.com/2023/05/26/how-hackable-are-automated-coding-assessments/Fri, 26 May 2023 00:03:00 +0000https://yanirseroussi.com/2023/05/26/how-hackable-are-automated-coding-assessments/Exploring the hackability of speed-based coding tests, using CodeSignal’s Industry Coding Framework as a case study.Remaining relevant as a small language modelhttps://yanirseroussi.com/2023/04/21/remaining-relevant-as-a-small-language-model/Fri, 21 Apr 2023 00:06:30 +0000https://yanirseroussi.com/2023/04/21/remaining-relevant-as-a-small-language-model/Bing Chat recently quipped that humans are small language models. Here are some of my thoughts on how we small language models can remain relevant (for now).ChatGPT is transformative AIhttps://yanirseroussi.com/2022/12/11/chatgpt-is-transformative-ai/Sun, 11 Dec 2022 00:00:00 +0000https://yanirseroussi.com/2022/12/11/chatgpt-is-transformative-ai/My perspective after a week of using ChatGPT: This is a step change in finding distilled information, and it’s only the beginning.Causal Machine Learning is off to a good start, despite some issueshttps://yanirseroussi.com/2022/09/12/causal-machine-learning-book-draft-review/Mon, 12 Sep 2022 02:45:00 +0000https://yanirseroussi.com/2022/09/12/causal-machine-learning-book-draft-review/Reviewing the first three chapters of the book Causal Machine Learning by Robert Osazuwa Ness.The mission matters: Moving to climate tech as a data scientisthttps://yanirseroussi.com/2022/06/06/the-mission-matters-moving-to-climate-tech-as-a-data-scientist/Mon, 06 Jun 2022 00:00:00 +0000https://yanirseroussi.com/2022/06/06/the-mission-matters-moving-to-climate-tech-as-a-data-scientist/Discussing my recent career move into climate tech as a way of doing more to help mitigate dangerous climate change.Building useful machine learning tools keeps getting easier: A fish ID case 
studyhttps://yanirseroussi.com/2022/03/20/building-useful-machine-learning-tools-keeps-getting-easier-a-fish-id-case-study/Sun, 20 Mar 2022 04:30:00 +0000https://yanirseroussi.com/2022/03/20/building-useful-machine-learning-tools-keeps-getting-easier-a-fish-id-case-study/Lessons learned building a fish ID web app with fast.ai and Streamlit, in an attempt to reduce my fear of missing out on the latest deep learning developments.Analysis strategies in online A/B experiments: Intention-to-treat, per-protocol, and other lessons from clinical trialshttps://yanirseroussi.com/2022/01/14/analysis-strategies-in-online-a-b-experiments/Fri, 14 Jan 2022 00:05:40 +0000https://yanirseroussi.com/2022/01/14/analysis-strategies-in-online-a-b-experiments/Epidemiologists analyse clinical trials to estimate the intention-to-treat and per-protocol effects. This post applies their strategies to online experiments.Use your human brain to avoid artificial intelligence disastershttps://yanirseroussi.com/2021/11/22/use-your-human-brain-to-avoid-artificial-intelligence-disasters/Mon, 22 Nov 2021 03:45:00 +0000https://yanirseroussi.com/2021/11/22/use-your-human-brain-to-avoid-artificial-intelligence-disasters/Overview of a talk I gave at a deep learning course, focusing on AI ethics as the need for humans to think on the context and consequences of applying AI.Migrating from WordPress.com to Hugo on GitHub + Cloudflarehttps://yanirseroussi.com/2021/11/10/migrating-from-wordpress-com-to-hugo-on-github-cloudflare/Wed, 10 Nov 2021 06:30:00 +0000https://yanirseroussi.com/2021/11/10/migrating-from-wordpress-com-to-hugo-on-github-cloudflare/My reasons for switching from WordPress.com to Hugo on GitHub + Cloudflare, along with a summary of the solution components and migration process.My work with Automattichttps://yanirseroussi.com/2021/10/07/my-work-with-automattic/Thu, 07 Oct 2021 00:00:00 +0000https://yanirseroussi.com/2021/10/07/my-work-with-automattic/Back-dated meta-post that gathers my posts 
on Automattic blogs into a summary of the work I’ve done with the company.Some highlights from 2020https://yanirseroussi.com/2021/04/05/some-highlights-from-2020/Mon, 05 Apr 2021 06:41:48 +0000https://yanirseroussi.com/2021/04/05/some-highlights-from-2020/Sharing remote teamwork insights, my climate & sustainability activism, Reef Life Survey publications, and progress on Automattic’s Experimentation Platform.Many is not enough: Counting simulations to bootstrap the right wayhttps://yanirseroussi.com/2020/08/24/many-is-not-enough-counting-simulations-to-bootstrap-the-right-way/Mon, 24 Aug 2020 01:35:17 +0000https://yanirseroussi.com/2020/08/24/many-is-not-enough-counting-simulations-to-bootstrap-the-right-way/Going deeper into correct testing of different methods for bootstrap estimation of confidence intervals.Software commodities are eating interesting data science workhttps://yanirseroussi.com/2020/01/11/software-commodities-are-eating-interesting-data-science-work/Sat, 11 Jan 2020 09:22:35 +0000https://yanirseroussi.com/2020/01/11/software-commodities-are-eating-interesting-data-science-work/Being a data scientist can sometimes feel like a race against software commodities that replace interesting work. What can one do to remain relevant?A day in the life of a remote data scientisthttps://yanirseroussi.com/2019/12/12/a-day-in-the-life-of-a-remote-data-scientist/Wed, 11 Dec 2019 22:06:19 +0000https://yanirseroussi.com/2019/12/12/a-day-in-the-life-of-a-remote-data-scientist/Video of a talk I gave on remote data science work at the Data Science Sydney meetup.Bootstrapping the right way?https://yanirseroussi.com/2019/10/06/bootstrapping-the-right-way/Sun, 06 Oct 2019 06:48:07 +0000https://yanirseroussi.com/2019/10/06/bootstrapping-the-right-way/Video and summary of a talk I gave at YOW! 
Data on bootstrap estimation of confidence intervals.Hackers beware: Bootstrap sampling may be harmfulhttps://yanirseroussi.com/2019/01/08/hackers-beware-bootstrap-sampling-may-be-harmful/Mon, 07 Jan 2019 21:07:56 +0000https://yanirseroussi.com/2019/01/08/hackers-beware-bootstrap-sampling-may-be-harmful/Bootstrap sampling has been promoted as an easy way of modelling uncertainty to hackers without much statistical knowledge. But things aren’t that simple.The most practical causal inference book I’ve read (is still a draft)https://yanirseroussi.com/2018/12/24/the-most-practical-causal-inference-book-ive-read-is-still-a-draft/Mon, 24 Dec 2018 02:37:50 +0000https://yanirseroussi.com/2018/12/24/the-most-practical-causal-inference-book-ive-read-is-still-a-draft/Causal Inference by Miguel Hernán and Jamie Robins is a must-read for anyone interested in the area.Reflections on remote data science workhttps://yanirseroussi.com/2018/11/03/reflections-on-remote-data-science-work/Sat, 03 Nov 2018 06:33:13 +0000https://yanirseroussi.com/2018/11/03/reflections-on-remote-data-science-work/Discussing the pluses and minuses of remote work eighteen months after joining Automattic as a data scientist.Defining data science in 2018https://yanirseroussi.com/2018/07/22/defining-data-science-in-2018/Sun, 22 Jul 2018 08:27:43 +0000https://yanirseroussi.com/2018/07/22/defining-data-science-in-2018/Updating my definition of data science to match changes in the field. 
It is now broader than before, but its ultimate goal is still to support decisions.Advice for aspiring data scientists and other FAQshttps://yanirseroussi.com/2017/10/15/advice-for-aspiring-data-scientists-and-other-faqs/Sun, 15 Oct 2017 09:15:25 +0000https://yanirseroussi.com/2017/10/15/advice-for-aspiring-data-scientists-and-other-faqs/Frequently asked questions by visitors to this site, especially around entering the data science field.State of Bandcamp Recommender, Late 2017https://yanirseroussi.com/2017/09/02/state-of-bandcamp-recommender/Sat, 02 Sep 2017 10:19:02 +0000https://yanirseroussi.com/2017/09/02/state-of-bandcamp-recommender/Call for BCRecommender maintainers followed by a decision to shut it down, as I don’t have enough time and Bandcamp now offers recommendations.My 10-step path to becoming a remote data scientist with Automattichttps://yanirseroussi.com/2017/07/29/my-10-step-path-to-becoming-a-remote-data-scientist-with-automattic/Sat, 29 Jul 2017 05:39:26 +0000https://yanirseroussi.com/2017/07/29/my-10-step-path-to-becoming-a-remote-data-scientist-with-automattic/I wanted a well-paid data science-y remote job with an established company that offers a good life balance and makes products I care about. 
I got it eventually.Exploring and visualising Reef Life Survey datahttps://yanirseroussi.com/2017/06/03/exploring-and-visualising-reef-life-survey-data/Sat, 03 Jun 2017 00:49:05 +0000https://yanirseroussi.com/2017/06/03/exploring-and-visualising-reef-life-survey-data/Web tools I built to visualise Reef Life Survey data and assist citizen scientists in underwater visual census work.Customer lifetime value and the proliferation of misinformation on the internethttps://yanirseroussi.com/2017/01/08/customer-lifetime-value-and-the-proliferation-of-misinformation-on-the-internet/Sun, 08 Jan 2017 20:02:30 +0000https://yanirseroussi.com/2017/01/08/customer-lifetime-value-and-the-proliferation-of-misinformation-on-the-internet/There’s a lot of misleading content on the estimation of customer lifetime value. Here’s what I learned about doing it well.Ask Why! Finding motives, causes, and purpose in data sciencehttps://yanirseroussi.com/2016/09/19/ask-why-finding-motives-causes-and-purpose-in-data-science/Mon, 19 Sep 2016 21:28:44 +0000https://yanirseroussi.com/2016/09/19/ask-why-finding-motives-causes-and-purpose-in-data-science/Video and summary of a talk I gave at the Data Science Sydney meetup, about going beyond the what & how of predictive modelling.If you don’t pay attention, data can drive you off a cliffhttps://yanirseroussi.com/2016/08/21/seven-ways-to-be-data-driven-off-a-cliff/Sun, 21 Aug 2016 21:34:17 +0000https://yanirseroussi.com/2016/08/21/seven-ways-to-be-data-driven-off-a-cliff/Seven common mistakes to avoid when working with data, such as ignoring uncertainty and confusing observed and unobserved quantities.Is Data Scientist a useless job title?https://yanirseroussi.com/2016/08/04/is-data-scientist-a-useless-job-title/Thu, 04 Aug 2016 22:26:03 +0000https://yanirseroussi.com/2016/08/04/is-data-scientist-a-useless-job-title/It seems like anyone who touches data can call themselves a data scientist, which makes the title useless. 
The work they do can still be useful, though.Making Bayesian A/B testing more accessiblehttps://yanirseroussi.com/2016/06/19/making-bayesian-ab-testing-more-accessible/Sun, 19 Jun 2016 10:32:15 +0000https://yanirseroussi.com/2016/06/19/making-bayesian-ab-testing-more-accessible/A web tool I built to interpret A/B test results in a Bayesian way, including prior specification, visualisations, and decision rules.Diving deeper into causality: Pearl, Kleinberg, Hill, and untested assumptionshttps://yanirseroussi.com/2016/05/15/diving-deeper-into-causality-pearl-kleinberg-hill-and-untested-assumptions/Sat, 14 May 2016 19:57:03 +0000https://yanirseroussi.com/2016/05/15/diving-deeper-into-causality-pearl-kleinberg-hill-and-untested-assumptions/Discussing the need for untested assumptions and temporality in causal inference. Mostly based on Samantha Kleinberg’s Causality, Probability, and Time.The rise of greedy robotshttps://yanirseroussi.com/2016/03/20/the-rise-of-greedy-robots/Sun, 20 Mar 2016 20:33:43 +0000https://yanirseroussi.com/2016/03/20/the-rise-of-greedy-robots/Is artificial/machine intelligence a future threat? 
I argue that it’s already here, with greedy robots already dominating our lives.Why you should stop worrying about deep learning and deepen your understanding of causality insteadhttps://yanirseroussi.com/2016/02/14/why-you-should-stop-worrying-about-deep-learning-and-deepen-your-understanding-of-causality-instead/Sun, 14 Feb 2016 11:04:11 +0000https://yanirseroussi.com/2016/02/14/why-you-should-stop-worrying-about-deep-learning-and-deepen-your-understanding-of-causality-instead/Causality is often overlooked but is of much higher relevance to most data scientists than deep learning.The joys of offline data collectionhttps://yanirseroussi.com/2016/01/24/the-joys-of-offline-data-collection/Sun, 24 Jan 2016 00:32:25 +0000https://yanirseroussi.com/2016/01/24/the-joys-of-offline-data-collection/Insights on data collection and machine learning from spending a month sailing, diving, and counting fish with Reef Life Survey.This holiday season, give me real insightshttps://yanirseroussi.com/2015/12/08/this-holiday-season-give-me-real-insights/Tue, 08 Dec 2015 06:57:25 +0000https://yanirseroussi.com/2015/12/08/this-holiday-season-give-me-real-insights/Some companies present raw data or information as “insights”. 
This post surveys some examples, and discusses how they can be turned into real insights.The hardest parts of data sciencehttps://yanirseroussi.com/2015/11/23/the-hardest-parts-of-data-science/Mon, 23 Nov 2015 04:14:21 +0000https://yanirseroussi.com/2015/11/23/the-hardest-parts-of-data-science/Defining feasible problems and coming up with reasonable ways of measuring solutions is harder than building accurate models or obtaining clean data.Migrating a simple web application from MongoDB to Elasticsearchhttps://yanirseroussi.com/2015/11/04/migrating-a-simple-web-application-from-mongodb-to-elasticsearch/Wed, 04 Nov 2015 03:53:18 +0000https://yanirseroussi.com/2015/11/04/migrating-a-simple-web-application-from-mongodb-to-elasticsearch/Migrating BCRecommender from MongoDB to Elasticsearch made it possible to offer a richer search experience to users at a similar cost, among other benefits.Miscommunicating science: Simplistic models, nutritionism, and the art of storytellinghttps://yanirseroussi.com/2015/10/19/nutritionism-and-the-need-for-complex-models-to-explain-complex-phenomena/Mon, 19 Oct 2015 00:02:32 +0000https://yanirseroussi.com/2015/10/19/nutritionism-and-the-need-for-complex-models-to-explain-complex-phenomena/Nutritionism is a special case of misinterpretation and miscommunication of scientific results – something many data scientists encounter in their work.The wonderful world of recommender systemshttps://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/Fri, 02 Oct 2015 05:25:57 +0000https://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/Giving an overview of the field and common paradigms, and debunking five common myths about recommender systems.You don’t need a data scientist (yet)https://yanirseroussi.com/2015/08/24/you-dont-need-a-data-scientist-yet/Mon, 24 Aug 2015 08:25:30 +0000https://yanirseroussi.com/2015/08/24/you-dont-need-a-data-scientist-yet/Hiring data scientists prematurely is wasteful and 
frustrating. Here are some questions to ask before you hire your first data scientist.Goodbye, Parse.comhttps://yanirseroussi.com/2015/07/31/goodbye-parse-com/Fri, 31 Jul 2015 03:29:50 +0000https://yanirseroussi.com/2015/07/31/goodbye-parse-com/Migrating my web apps away from Parse.com due to reliability issues. Self-hosting is a better solution.Learning about deep learning through album cover classificationhttps://yanirseroussi.com/2015/07/06/learning-about-deep-learning-through-album-cover-classification/Mon, 06 Jul 2015 22:21:42 +0000https://yanirseroussi.com/2015/07/06/learning-about-deep-learning-through-album-cover-classification/Progress on my album cover classification project, highlighting lessons that would be useful to others who are getting started with deep learning.Hopping on the deep learning bandwagonhttps://yanirseroussi.com/2015/06/06/hopping-on-the-deep-learning-bandwagon/Sat, 06 Jun 2015 05:00:22 +0000https://yanirseroussi.com/2015/06/06/hopping-on-the-deep-learning-bandwagon/To become proficient at solving data science problems, you need to get your hands dirty. 
Here, I used album cover classification to learn about deep learning.First steps in data science: author-aware sentiment analysishttps://yanirseroussi.com/2015/05/02/first-steps-in-data-science-author-aware-sentiment-analysis/Sat, 02 May 2015 08:31:10 +0000https://yanirseroussi.com/2015/05/02/first-steps-in-data-science-author-aware-sentiment-analysis/I became a data scientist by doing a PhD, but the same steps can be followed without a formal education program.My divestment from fossil fuelshttps://yanirseroussi.com/2015/04/24/my-divestment-from-fossil-fuels/Fri, 24 Apr 2015 00:19:36 +0000https://yanirseroussi.com/2015/04/24/my-divestment-from-fossil-fuels/Recent choices I’ve made to reduce my exposure to fossil fuels, including practical steps that can be taken by Australians and generally applicable lessons.My PhD workhttps://yanirseroussi.com/phd-work/Mon, 30 Mar 2015 03:23:33 +0000https://yanirseroussi.com/phd-work/An overview of my PhD in data science / artificial intelligence. Thesis title: Text Mining and Rating Prediction with Topical User Models.The long road to a lifestyle businesshttps://yanirseroussi.com/2015/03/22/the-long-road-to-a-lifestyle-business/Sun, 22 Mar 2015 09:43:47 +0000https://yanirseroussi.com/2015/03/22/the-long-road-to-a-lifestyle-business/Progress since leaving my last full-time job and setting on an independent path that includes data science consulting and work on my own projects.Learning to rank for personalised search (Yandex Search Personalisation – Kaggle Competition Summary – Part 2)https://yanirseroussi.com/2015/02/11/learning-to-rank-for-personalised-search-yandex-search-personalisation-kaggle-competition-summary-part-2/Wed, 11 Feb 2015 06:34:17 +0000https://yanirseroussi.com/2015/02/11/learning-to-rank-for-personalised-search-yandex-search-personalisation-kaggle-competition-summary-part-2/My team’s solution to the Yandex Search Personalisation competition (finished 9th out of 194 teams).Is thinking like a search engine 
possible? (Yandex search personalisation – Kaggle competition summary – part 1)https://yanirseroussi.com/2015/01/29/is-thinking-like-a-search-engine-possible-yandex-search-personalisation-kaggle-competition-summary-part-1/Thu, 29 Jan 2015 10:37:39 +0000https://yanirseroussi.com/2015/01/29/is-thinking-like-a-search-engine-possible-yandex-search-personalisation-kaggle-competition-summary-part-1/Insights on search personalisation and SEO from participating in a Kaggle competition (finished 9th out of 194 teams).Automating Parse.com bulk data importshttps://yanirseroussi.com/2015/01/15/automating-parse-com-bulk-data-imports/Thu, 15 Jan 2015 04:41:16 +0000https://yanirseroussi.com/2015/01/15/automating-parse-com-bulk-data-imports/A script for importing data into the Parse backend-as-a-service.Stochastic Gradient Boosting: Choosing the Best Number of Iterationshttps://yanirseroussi.com/2014/12/29/stochastic-gradient-boosting-choosing-the-best-number-of-iterations/Mon, 29 Dec 2014 02:30:06 +0000https://yanirseroussi.com/2014/12/29/stochastic-gradient-boosting-choosing-the-best-number-of-iterations/Exploring an approach to choosing the optimal number of iterations in stochastic gradient boosting, following a bug I found in scikit-learn.SEO: Mostly about showing up?https://yanirseroussi.com/2014/12/15/seo-mostly-about-showing-up/Mon, 15 Dec 2014 04:25:25 +0000https://yanirseroussi.com/2014/12/15/seo-mostly-about-showing-up/Increasing SEO traffic to BCRecommender by adding content and opening up more pages for crawling. 
It turns out that thin content is better than no content.Fitting noise: Forecasting the sale price of bulldozers (Kaggle competition summary)https://yanirseroussi.com/2014/11/19/fitting-noise-forecasting-the-sale-price-of-bulldozers-kaggle-competition-summary/Wed, 19 Nov 2014 09:17:34 +0000https://yanirseroussi.com/2014/11/19/fitting-noise-forecasting-the-sale-price-of-bulldozers-kaggle-competition-summary/Summary of a Kaggle competition to forecast bulldozer sale price, where I finished 9th out of 476 teams.BCRecommender Traction Updatehttps://yanirseroussi.com/2014/11/05/bcrecommender-traction-update/Wed, 05 Nov 2014 02:29:35 +0000https://yanirseroussi.com/2014/11/05/bcrecommender-traction-update/Update on BCRecommender traction using three channels: blogger outreach, search engine optimisation, and content marketing.What is data science?https://yanirseroussi.com/2014/10/23/what-is-data-science/Thu, 23 Oct 2014 03:22:08 +0000https://yanirseroussi.com/2014/10/23/what-is-data-science/Data science has been a hot term in the past few years. Still, there isn’t a single definition of the field. 
This post discusses my favourite definition.Greek Media Monitoring Kaggle competition: My approachhttps://yanirseroussi.com/2014/10/07/greek-media-monitoring-kaggle-competition-my-approach/Tue, 07 Oct 2014 03:21:35 +0000https://yanirseroussi.com/2014/10/07/greek-media-monitoring-kaggle-competition-my-approach/Summary of my approach to the Greek Media Monitoring Kaggle competition, where I finished 6th out of 120 teams.Applying the Traction Book’s Bullseye framework to BCRecommenderhttps://yanirseroussi.com/2014/09/24/applying-the-traction-books-bullseye-framework-to-bcrecommender/Wed, 24 Sep 2014 04:57:39 +0000https://yanirseroussi.com/2014/09/24/applying-the-traction-books-bullseye-framework-to-bcrecommender/Ranking 19 channels with the goal of getting traction for BCRecommender.Bandcamp recommendation and discovery algorithmshttps://yanirseroussi.com/2014/09/19/bandcamp-recommendation-and-discovery-algorithms/Fri, 19 Sep 2014 14:26:55 +0000https://yanirseroussi.com/2014/09/19/bandcamp-recommendation-and-discovery-algorithms/The recommendation backend for my BCRecommender service for personalised Bandcamp music discovery.Building a recommender system on a shoestring budget (or: BCRecommender part 2 – general system layout)https://yanirseroussi.com/2014/09/07/building-a-recommender-system-on-a-shoestring-budget/Sun, 07 Sep 2014 10:48:44 +0000https://yanirseroussi.com/2014/09/07/building-a-recommender-system-on-a-shoestring-budget/Iterating on my BCRecommender service with the goal of keeping costs low while providing a valuable music recommendation service.Building a Bandcamp recommender system (part 1 – motivation)https://yanirseroussi.com/2014/08/30/building-a-bandcamp-recommender-system-part-1-motivation/Sat, 30 Aug 2014 08:11:38 +0000https://yanirseroussi.com/2014/08/30/building-a-bandcamp-recommender-system-part-1-motivation/My motivation behind building BCRecommender, a free recommendation & discovery service for Bandcamp music.How to (almost) win Kaggle 
competitionshttps://yanirseroussi.com/2014/08/24/how-to-almost-win-kaggle-competitions/Sun, 24 Aug 2014 12:40:53 +0000https://yanirseroussi.com/2014/08/24/how-to-almost-win-kaggle-competitions/Summary of a talk I gave at the Data Science Sydney meetup with ten tips on almost-winning Kaggle competitions.Data’s hierarchy of needshttps://yanirseroussi.com/2014/08/17/datas-hierarchy-of-needs/Sun, 17 Aug 2014 13:09:30 +0000https://yanirseroussi.com/2014/08/17/datas-hierarchy-of-needs/Discussing the hierarchy of needs proposed by Jay Kreps. Key takeaway: Data-driven algorithms & insights can only be as good as the underlying data.Kaggle competition tips and summarieshttps://yanirseroussi.com/kaggle/Sat, 05 Apr 2014 23:46:10 +0000https://yanirseroussi.com/kaggle/Pointers to all my Kaggle advice posts and competition summaries.Kaggle beginner tipshttps://yanirseroussi.com/2014/01/19/kaggle-beginner-tips/Sun, 19 Jan 2014 10:34:28 +0000https://yanirseroussi.com/2014/01/19/kaggle-beginner-tips/First post! An email I sent to members of the Data Science Sydney Meetup with tips on how to get started with Kaggle competitions. 
\ No newline at end of file diff --git a/sitemap.xml b/sitemap.xml index 4ce817594..18adc8659 100644 --- a/sitemap.xml +++ b/sitemap.xml @@ -1 +1 @@ -https://yanirseroussi.com/tags/artificial-intelligence/2024-06-23T08:52:50+10:00https://yanirseroussi.com/tags/data-strategy/2024-06-23T08:52:50+10:00https://yanirseroussi.com/til/2024/06/22/dealing-with-endless-data-changes/2024-06-23T08:52:50+10:00https://yanirseroussi.com/tags/devops/2024-06-23T08:52:50+10:00https://yanirseroussi.com/tags/machine-learning/2024-06-23T08:52:50+10:00https://yanirseroussi.com/tags/quotes/2024-06-23T08:52:50+10:00https://yanirseroussi.com/tags/software-engineering/2024-06-23T08:52:50+10:00https://yanirseroussi.com/tags/2024-06-23T08:52:50+10:00https://yanirseroussi.com/2024-06-23T08:52:50+10:00https://yanirseroussi.com/2024/06/17/ai-aint-gonna-save-you-from-bad-data/2024-06-17T13:13:44+10:00https://yanirseroussi.com/tags/data-science/2024-06-17T13:13:44+10:00https://yanirseroussi.com/tags/startups/2024-06-17T13:13:44+10:00https://yanirseroussi.com/tags/books/2024-06-12T12:58:06+10:00https://yanirseroussi.com/tags/business/2024-06-12T12:58:06+10:00https://yanirseroussi.com/tags/career/2024-06-19T17:03:21+10:00https://yanirseroussi.com/til/2024/06/12/the-rules-of-the-passion-economy/2024-06-12T12:58:06+10:00https://yanirseroussi.com/tags/analytics/2024-06-10T14:23:12+10:00https://yanirseroussi.com/2024/06/10/startup-data-health-starts-with-healthy-event-tracking/2024-06-10T14:23:12+10:00https://yanirseroussi.com/2024/06/03/how-to-avoid-startups-with-poor-development-processes/2024-06-03T12:58:00+10:00https://yanirseroussi.com/tags/data-engineering/2024-05-27T12:25:30+10:00https://yanirseroussi.com/2024/05/27/plumbing-decisions-and-automation-de-hyping-data-and-ai/2024-05-27T12:25:30+10:00https://yanirseroussi.com/til/2024/05/25/adapting-to-the-economy-of-algorithms/2024-05-25T10:00:56+10:00https://yanirseroussi.com/tags/futurism/2024-05-25T10:00:56+10:00https://yanirseroussi.com/2024/05/20
/question-startup-culture-before-accepting-a-data-to-ai-role/2024-05-21T17:08:32+10:00https://yanirseroussi.com/2024/05/13/probing-the-people-aspects-of-an-early-stage-startup/2024-05-13T12:41:01+10:00https://yanirseroussi.com/2024/05/06/business-questions-to-ask-before-taking-a-startup-data-role/2024-05-06T14:41:43+10:00https://yanirseroussi.com/tags/consulting/2024-04-29T17:25:28+10:00https://yanirseroussi.com/2024/04/29/mentorship-and-the-art-of-actionable-advice/2024-04-29T17:25:28+10:00https://yanirseroussi.com/tags/personal/2024-04-29T17:25:28+10:00https://yanirseroussi.com/2024/04/22/assessing-a-startups-data-to-ai-health/2024-04-22T17:38:21+10:00https://yanirseroussi.com/2024/04/15/ai-does-not-obviate-the-need-for-testing-and-observability/2024-04-15T15:54:17+10:00https://yanirseroussi.com/tags/linkedin/2024-04-11T13:42:58+10:00https://yanirseroussi.com/til/2024/04/11/linkedin-is-a-teachable-skill/2024-04-11T13:42:58+10:00https://yanirseroussi.com/tags/marketing/2024-04-11T13:42:58+10:00https://yanirseroussi.com/tags/climate-change/2024-04-08T12:13:47+10:00https://yanirseroussi.com/tags/environment/2024-04-08T12:13:47+10:00https://yanirseroussi.com/2024/04/08/my-experience-as-a-data-tech-lead-with-work-on-climate/2024-04-08T12:13:47+10:00https://yanirseroussi.com/tags/remote-work/2024-04-08T12:13:47+10:00https://yanirseroussi.com/til/2024/04/05/the-data-engineering-lifecycle-is-not-going-anywhere/2024-04-05T11:23:38+10:00https://yanirseroussi.com/2024/04/01/artificial-intelligence-automation-and-the-art-of-counting-fish/2024-04-01T17:02:44+10:00https://yanirseroussi.com/tags/marine-science/2024-04-01T17:02:44+10:00https://yanirseroussi.com/tags/reef-life-survey/2024-04-01T17:02:44+10:00https://yanirseroussi.com/til/2024/03/12/atomic-habits-is-full-of-actionable-advice/2024-03-12T16:33:48+10:00https://yanirseroussi.com/tags/productivity/2024-03-12T16:33:48+10:00https://yanirseroussi.com/2024/03/11/questions-to-consider-when-using-ai-for-pdf-data-extraction/20
24-03-11T15:53:13+10:00https://yanirseroussi.com/2024/03/04/two-types-of-startup-data-problems/2024-03-05T08:47:19+10:00https://yanirseroussi.com/2024/02/26/avoiding-ai-complexity-first-write-no-code/2024-03-04T12:39:10+10:00https://yanirseroussi.com/2024/02/19/building-your-startups-minimum-viable-data-stack/2024-02-19T11:25:54+10:00https://yanirseroussi.com/til/2024/02/17/the-three-cs-of-indie-consulting-confidence-cash-and-connections/2024-02-17T12:34:00+10:00https://yanirseroussi.com/2024/02/12/nudging-chatgpt-to-invent-books-you-have-no-time-to-read/2024-02-13T08:24:54+10:00https://yanirseroussi.com/til/2024/02/06/future-software-development-may-require-fewer-humans/2024-02-06T16:39:35+10:00https://yanirseroussi.com/2024/02/05/substance-over-titles-your-first-data-hire-may-be-a-data-scientist/2024-02-19T11:25:54+10:00https://yanirseroussi.com/tags/blogging/2024-01-19T16:35:09+10:00https://yanirseroussi.com/2024/01/19/new-decade-new-tagline-data-and-ai-for-impact/2024-01-19T16:35:09+10:00https://yanirseroussi.com/til/2024/01/09/psychographic-specialisations-may-work-for-discipline-generalists/2024-01-09T13:23:28+10:00https://yanirseroussi.com/til/2024/01/08/the-power-of-parasocial-relationships/2024-01-08T16:31:22+10:00https://yanirseroussi.com/tags/data-business/2024-01-16T09:56:03+10:00https://yanirseroussi.com/til/2023/12/18/positioning-is-a-common-problem-for-data-scientists/2023-12-18T10:38:56+10:00https://yanirseroussi.com/tags/energy-markets/2023-12-14T10:46:41+10:00https://yanirseroussi.com/til/2023/12/14/transfer-learning-applies-to-energy-market-bidding/2023-12-14T10:46:41+10:00https://yanirseroussi.com/tags/data-visualisation/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2023/11/29/supporting-volunteer-monitoring-of-marine-biodiversity-with-modern-web-and-data-tools/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/web-development/2024-01-16T09:56:03+10:00https://yanirseroussi.com/til/2023/11/28/our-blue-machine-is-changing-but-we-are-not-h
elpless/2024-03-12T16:33:31+10:00https://yanirseroussi.com/til/2023/11/21/you-dont-need-a-proprietary-api-for-static-maps/2023-11-21T16:12:27+10:00https://yanirseroussi.com/2023/10/25/lessons-from-reluctant-data-engineering/2024-01-16T09:56:03+10:00https://yanirseroussi.com/til/2023/10/06/artificial-intelligence-was-a-marketing-term-all-along-just-call-it-automation/2023-10-06T15:11:27+10:00https://yanirseroussi.com/tags/ethics/2024-01-16T09:56:03+10:00https://yanirseroussi.com/til/2023/09/25/the-lines-between-solo-consulting-and-product-building-are-blurry/2023-09-25T11:15:26+10:00https://yanirseroussi.com/til/2023/09/21/googles-rules-of-machine-learning-still-apply-in-the-age-of-large-language-models/2023-09-22T07:54:13+10:00https://yanirseroussi.com/2023/08/28/my-rediscovery-of-quiet-writing-on-the-open-web/2024-01-16T09:56:03+10:00https://yanirseroussi.com/til/2023/08/21/the-minimalist-entrepreneur-is-too-prescriptive-for-me/2024-03-12T16:33:31+10:00https://yanirseroussi.com/til/2023/08/17/revisiting-start-small-stay-small-in-2023-chapter-2/2024-03-12T16:33:31+10:00https://yanirseroussi.com/til/2023/08/16/revisiting-start-small-stay-small-in-2023-chapter-1/2024-03-12T16:33:31+10:00https://yanirseroussi.com/til/2023/08/14/email-notifications-on-public-github-commits/2023-08-14T15:44:21+10:00https://yanirseroussi.com/til/2023/08/11/the-rule-of-thirds-can-probably-be-ignored/2023-08-11T14:35:20+10:00https://yanirseroussi.com/tags/github/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/security/2023-07-25T09:30:43+10:00https://yanirseroussi.com/til/2023/07/23/using-yubikey-for-ssh-access/2023-07-25T09:30:43+10:00https://yanirseroussi.com/tags/hugo/2024-01-16T09:56:03+10:00https://yanirseroussi.com/til/2023/07/17/making-a-til-section-with-hugo-and-papermod/2023-07-17T17:18:06+10:00https://yanirseroussi.com/til/2023/07/11/you-cant-save-time/2024-03-12T16:33:31+10:00https://yanirseroussi.com/2023/06/30/was-data-science-a-failure-mode-of-software-engineering/2024
-01-16T09:56:03+10:00https://yanirseroussi.com/tags/hackers/2024-06-19T17:03:21+10:00https://yanirseroussi.com/2023/05/26/how-hackable-are-automated-coding-assessments/2024-06-19T17:03:21+10:00https://yanirseroussi.com/tags/machine-intelligence/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2023/04/21/remaining-relevant-as-a-small-language-model/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2022/12/11/chatgpt-is-transformative-ai/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/causal-inference/2024-02-21T11:52:55+10:00https://yanirseroussi.com/2022/09/12/causal-machine-learning-book-draft-review/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/automattic/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/orkestra/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/politics/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/sustainability/2024-02-21T11:52:55+10:00https://yanirseroussi.com/2022/06/06/the-mission-matters-moving-to-climate-tech-as-a-data-scientist/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2022/03/20/building-useful-machine-learning-tools-keeps-getting-easier-a-fish-id-case-study/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/deep-learning/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/fast.ai/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2022/01/14/analysis-strategies-in-online-a-b-experiments/2024-02-21T11:52:55+10:00https://yanirseroussi.com/tags/split-testing/2024-02-21T11:52:55+10:00https://yanirseroussi.com/tags/statistics/2024-05-06T16:35:22+10:00https://yanirseroussi.com/2021/11/22/use-your-human-brain-to-avoid-artificial-intelligence-disasters/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/cloudflare/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2021/11/10/migrating-from-wordpress-com-to-hugo-on-github-cloudflare/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/wordpress/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2021/10/07/my-work-with-automattic/
2024-01-16T09:56:03+10:00https://yanirseroussi.com/2021/04/05/some-highlights-from-2020/2024-02-21T11:52:55+10:00https://yanirseroussi.com/tags/bootstrapping/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/confidence-intervals/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2020/08/24/many-is-not-enough-counting-simulations-to-bootstrap-the-right-way/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2020/01/11/software-commodities-are-eating-interesting-data-science-work/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2019/12/12/a-day-in-the-life-of-a-remote-data-scientist/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2019/10/06/bootstrapping-the-right-way/2024-05-06T16:35:22+10:00https://yanirseroussi.com/2019/01/08/hackers-beware-bootstrap-sampling-may-be-harmful/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2018/12/24/the-most-practical-causal-inference-book-ive-read-is-still-a-draft/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2018/11/03/reflections-on-remote-data-science-work/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2018/07/22/defining-data-science-in-2018/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2017/10/15/advice-for-aspiring-data-scientists-and-other-faqs/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/frequently-asked-questions/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/bandcamp/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/bcrecommender/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2017/09/02/state-of-bandcamp-recommender/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/elasticsearch/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2017/07/29/my-10-step-path-to-becoming-a-remote-data-scientist-with-automattic/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2017/06/03/exploring-and-visualising-reef-life-survey-data/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/javascript/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2017/01/08/customer-lifetime-v
alue-and-the-proliferation-of-misinformation-on-the-internet/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/predictive-modelling/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/science-communication/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/search-engine-optimisation/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2016/09/19/ask-why-finding-motives-causes-and-purpose-in-data-science/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/insights/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2016/08/21/seven-ways-to-be-data-driven-off-a-cliff/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2016/08/04/is-data-scientist-a-useless-job-title/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2016/06/19/making-bayesian-ab-testing-more-accessible/2024-02-21T11:52:55+10:00https://yanirseroussi.com/2016/05/15/diving-deeper-into-causality-pearl-kleinberg-hill-and-untested-assumptions/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/economics/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2016/03/20/the-rise-of-greedy-robots/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2016/02/14/why-you-should-stop-worrying-about-deep-learning-and-deepen-your-understanding-of-causality-instead/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/scuba-diving/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2016/01/24/the-joys-of-offline-data-collection/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/facebook/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2015/12/08/this-holiday-season-give-me-real-insights/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/kaggle/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2015/11/23/the-hardest-parts-of-data-science/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2015/11/04/migrating-a-simple-web-application-from-mongodb-to-elasticsearch/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/mongodb/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/health/2024-
01-16T09:56:03+10:00https://yanirseroussi.com/2015/10/19/nutritionism-and-the-need-for-complex-models-to-explain-complex-phenomena/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/nutrition/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/nutritionism/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/recommender-systems/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2015/10/02/the-wonderful-world-of-recommender-systems/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2015/08/24/you-dont-need-a-data-scientist-yet/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2015/07/31/goodbye-parse-com/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/parse.com/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2015/07/06/learning-about-deep-learning-through-album-cover-classification/2024-01-16T09:56:03+10:00https://yanirseroussi.com/deep-learning-resources/2021-11-09T15:38:25+10:00https://yanirseroussi.com/2015/06/06/hopping-on-the-deep-learning-bandwagon/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2015/05/02/first-steps-in-data-science-author-aware-sentiment-analysis/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/sentiment-analysis/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/divestment/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/fossil-fuels/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2015/04/24/my-divestment-from-fossil-fuels/2024-01-16T09:56:03+10:00https://yanirseroussi.com/phd-work/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2015/03/22/the-long-road-to-a-lifestyle-business/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/gradient-boosting/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/kaggle-competition/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2015/02/11/learning-to-rank-for-personalised-search-yandex-search-personalisation-kaggle-competition-summary-part-2/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2015/01/29/is-thinking-like-a-search-engine-possible-y
andex-search-personalisation-kaggle-competition-summary-part-1/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2015/01/15/automating-parse-com-bulk-data-imports/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/phantomjs/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/scikit-learn/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2014/12/29/stochastic-gradient-boosting-choosing-the-best-number-of-iterations/2023-07-06T09:28:02+10:00https://yanirseroussi.com/2014/12/15/seo-mostly-about-showing-up/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/traction-book/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2014/11/19/fitting-noise-forecasting-the-sale-price-of-bulldozers-kaggle-competition-summary/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/price-forecasting/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2014/11/05/bcrecommender-traction-update/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/music/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2014/10/23/what-is-data-science/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2014/10/07/greek-media-monitoring-kaggle-competition-my-approach/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/multi-label-classification/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2014/09/24/applying-the-traction-books-bullseye-framework-to-bcrecommender/2023-07-06T09:28:02+10:00https://yanirseroussi.com/2014/09/19/bandcamp-recommendation-and-discovery-algorithms/2023-07-06T09:28:02+10:00https://yanirseroussi.com/2014/09/07/building-a-recommender-system-on-a-shoestring-budget/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2014/08/30/building-a-bandcamp-recommender-system-part-1-motivation/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/music-industry/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2014/08/24/how-to-almost-win-kaggle-competitions/2023-07-06T09:28:02+10:00https://yanirseroussi.com/tags/kaggle-beginners/2023-07-06T09:28:02+10:00https://yanirsero
ussi.com/2014/08/17/datas-hierarchy-of-needs/2024-01-16T09:56:03+10:00https://yanirseroussi.com/kaggle/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2014/01/19/kaggle-beginner-tips/2023-07-06T09:28:02+10:00https://yanirseroussi.com/about/2024-03-08T11:21:16+10:00https://yanirseroussi.com/free-intro-call/2024-05-22T17:54:36+10:00https://yanirseroussi.com/posts/2024-05-09T10:03:31+10:00https://yanirseroussi.com/causal-inference-resources/2023-07-06T16:01:57+10:00https://yanirseroussi.com/consult/2024-05-23T15:31:11+10:00https://yanirseroussi.com/data-to-ai-health-check/2024-05-22T17:53:56+10:00https://yanirseroussi.com/contact/2024-05-23T15:31:11+10:00https://yanirseroussi.com/talks/2024-05-06T16:35:22+10:00https://yanirseroussi.com/til/2024-05-09T10:03:31+10:00 \ No newline at end of file +https://yanirseroussi.com/tags/analytics/2024-06-24T14:12:50+10:00https://yanirseroussi.com/tags/artificial-intelligence/2024-06-24T14:12:50+10:00https://yanirseroussi.com/tags/data-science/2024-06-24T14:12:50+10:00https://yanirseroussi.com/tags/data-strategy/2024-06-24T14:12:50+10:00https://yanirseroussi.com/2024/06/24/is-your-tech-stack-ready-for-data-intensive-applications/2024-06-24T14:12:50+10:00https://yanirseroussi.com/tags/machine-learning/2024-06-24T14:12:50+10:00https://yanirseroussi.com/tags/software-engineering/2024-06-24T14:12:50+10:00https://yanirseroussi.com/tags/startups/2024-06-24T14:12:50+10:00https://yanirseroussi.com/tags/2024-06-24T14:12:50+10:00https://yanirseroussi.com/2024-06-24T14:12:50+10:00https://yanirseroussi.com/til/2024/06/22/dealing-with-endless-data-changes/2024-06-23T08:52:50+10:00https://yanirseroussi.com/tags/devops/2024-06-23T08:52:50+10:00https://yanirseroussi.com/tags/quotes/2024-06-23T08:52:50+10:00https://yanirseroussi.com/2024/06/17/ai-aint-gonna-save-you-from-bad-data/2024-06-17T13:13:44+10:00https://yanirseroussi.com/tags/books/2024-06-12T12:58:06+10:00https://yanirseroussi.com/tags/business/2024-06-12T12:58:06+10:00https://yanirsero
ussi.com/tags/career/2024-06-19T17:03:21+10:00https://yanirseroussi.com/til/2024/06/12/the-rules-of-the-passion-economy/2024-06-12T12:58:06+10:00https://yanirseroussi.com/2024/06/10/startup-data-health-starts-with-healthy-event-tracking/2024-06-10T14:23:12+10:00https://yanirseroussi.com/2024/06/03/how-to-avoid-startups-with-poor-development-processes/2024-06-03T12:58:00+10:00https://yanirseroussi.com/tags/data-engineering/2024-05-27T12:25:30+10:00https://yanirseroussi.com/2024/05/27/plumbing-decisions-and-automation-de-hyping-data-and-ai/2024-05-27T12:25:30+10:00https://yanirseroussi.com/til/2024/05/25/adapting-to-the-economy-of-algorithms/2024-05-25T10:00:56+10:00https://yanirseroussi.com/tags/futurism/2024-05-25T10:00:56+10:00https://yanirseroussi.com/2024/05/20/question-startup-culture-before-accepting-a-data-to-ai-role/2024-05-21T17:08:32+10:00https://yanirseroussi.com/2024/05/13/probing-the-people-aspects-of-an-early-stage-startup/2024-05-13T12:41:01+10:00https://yanirseroussi.com/2024/05/06/business-questions-to-ask-before-taking-a-startup-data-role/2024-05-06T14:41:43+10:00https://yanirseroussi.com/tags/consulting/2024-04-29T17:25:28+10:00https://yanirseroussi.com/2024/04/29/mentorship-and-the-art-of-actionable-advice/2024-04-29T17:25:28+10:00https://yanirseroussi.com/tags/personal/2024-04-29T17:25:28+10:00https://yanirseroussi.com/2024/04/22/assessing-a-startups-data-to-ai-health/2024-04-22T17:38:21+10:00https://yanirseroussi.com/2024/04/15/ai-does-not-obviate-the-need-for-testing-and-observability/2024-04-15T15:54:17+10:00https://yanirseroussi.com/tags/linkedin/2024-04-11T13:42:58+10:00https://yanirseroussi.com/til/2024/04/11/linkedin-is-a-teachable-skill/2024-04-11T13:42:58+10:00https://yanirseroussi.com/tags/marketing/2024-04-11T13:42:58+10:00https://yanirseroussi.com/tags/climate-change/2024-04-08T12:13:47+10:00https://yanirseroussi.com/tags/environment/2024-04-08T12:13:47+10:00https://yanirseroussi.com/2024/04/08/my-experience-as-a-data-tech-lead-with-w
ork-on-climate/2024-04-08T12:13:47+10:00https://yanirseroussi.com/tags/remote-work/2024-04-08T12:13:47+10:00https://yanirseroussi.com/til/2024/04/05/the-data-engineering-lifecycle-is-not-going-anywhere/2024-04-05T11:23:38+10:00https://yanirseroussi.com/2024/04/01/artificial-intelligence-automation-and-the-art-of-counting-fish/2024-04-01T17:02:44+10:00https://yanirseroussi.com/tags/marine-science/2024-04-01T17:02:44+10:00https://yanirseroussi.com/tags/reef-life-survey/2024-04-01T17:02:44+10:00https://yanirseroussi.com/til/2024/03/12/atomic-habits-is-full-of-actionable-advice/2024-03-12T16:33:48+10:00https://yanirseroussi.com/tags/productivity/2024-03-12T16:33:48+10:00https://yanirseroussi.com/2024/03/11/questions-to-consider-when-using-ai-for-pdf-data-extraction/2024-03-11T15:53:13+10:00https://yanirseroussi.com/2024/03/04/two-types-of-startup-data-problems/2024-03-05T08:47:19+10:00https://yanirseroussi.com/2024/02/26/avoiding-ai-complexity-first-write-no-code/2024-03-04T12:39:10+10:00https://yanirseroussi.com/2024/02/19/building-your-startups-minimum-viable-data-stack/2024-02-19T11:25:54+10:00https://yanirseroussi.com/til/2024/02/17/the-three-cs-of-indie-consulting-confidence-cash-and-connections/2024-02-17T12:34:00+10:00https://yanirseroussi.com/2024/02/12/nudging-chatgpt-to-invent-books-you-have-no-time-to-read/2024-02-13T08:24:54+10:00https://yanirseroussi.com/til/2024/02/06/future-software-development-may-require-fewer-humans/2024-02-06T16:39:35+10:00https://yanirseroussi.com/2024/02/05/substance-over-titles-your-first-data-hire-may-be-a-data-scientist/2024-02-19T11:25:54+10:00https://yanirseroussi.com/tags/blogging/2024-01-19T16:35:09+10:00https://yanirseroussi.com/2024/01/19/new-decade-new-tagline-data-and-ai-for-impact/2024-01-19T16:35:09+10:00https://yanirseroussi.com/til/2024/01/09/psychographic-specialisations-may-work-for-discipline-generalists/2024-01-09T13:23:28+10:00https://yanirseroussi.com/til/2024/01/08/the-power-of-parasocial-relationships/2024-01-
08T16:31:22+10:00https://yanirseroussi.com/tags/data-business/2024-01-16T09:56:03+10:00https://yanirseroussi.com/til/2023/12/18/positioning-is-a-common-problem-for-data-scientists/2023-12-18T10:38:56+10:00https://yanirseroussi.com/tags/energy-markets/2023-12-14T10:46:41+10:00https://yanirseroussi.com/til/2023/12/14/transfer-learning-applies-to-energy-market-bidding/2023-12-14T10:46:41+10:00https://yanirseroussi.com/tags/data-visualisation/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2023/11/29/supporting-volunteer-monitoring-of-marine-biodiversity-with-modern-web-and-data-tools/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/web-development/2024-01-16T09:56:03+10:00https://yanirseroussi.com/til/2023/11/28/our-blue-machine-is-changing-but-we-are-not-helpless/2024-03-12T16:33:31+10:00https://yanirseroussi.com/til/2023/11/21/you-dont-need-a-proprietary-api-for-static-maps/2023-11-21T16:12:27+10:00https://yanirseroussi.com/2023/10/25/lessons-from-reluctant-data-engineering/2024-01-16T09:56:03+10:00https://yanirseroussi.com/til/2023/10/06/artificial-intelligence-was-a-marketing-term-all-along-just-call-it-automation/2023-10-06T15:11:27+10:00https://yanirseroussi.com/tags/ethics/2024-01-16T09:56:03+10:00https://yanirseroussi.com/til/2023/09/25/the-lines-between-solo-consulting-and-product-building-are-blurry/2023-09-25T11:15:26+10:00https://yanirseroussi.com/til/2023/09/21/googles-rules-of-machine-learning-still-apply-in-the-age-of-large-language-models/2023-09-22T07:54:13+10:00https://yanirseroussi.com/2023/08/28/my-rediscovery-of-quiet-writing-on-the-open-web/2024-01-16T09:56:03+10:00https://yanirseroussi.com/til/2023/08/21/the-minimalist-entrepreneur-is-too-prescriptive-for-me/2024-03-12T16:33:31+10:00https://yanirseroussi.com/til/2023/08/17/revisiting-start-small-stay-small-in-2023-chapter-2/2024-03-12T16:33:31+10:00https://yanirseroussi.com/til/2023/08/16/revisiting-start-small-stay-small-in-2023-chapter-1/2024-03-12T16:33:31+10:00https://yanirseroussi.
com/til/2023/08/14/email-notifications-on-public-github-commits/2023-08-14T15:44:21+10:00https://yanirseroussi.com/til/2023/08/11/the-rule-of-thirds-can-probably-be-ignored/2023-08-11T14:35:20+10:00https://yanirseroussi.com/tags/github/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/security/2023-07-25T09:30:43+10:00https://yanirseroussi.com/til/2023/07/23/using-yubikey-for-ssh-access/2023-07-25T09:30:43+10:00https://yanirseroussi.com/tags/hugo/2024-01-16T09:56:03+10:00https://yanirseroussi.com/til/2023/07/17/making-a-til-section-with-hugo-and-papermod/2023-07-17T17:18:06+10:00https://yanirseroussi.com/til/2023/07/11/you-cant-save-time/2024-03-12T16:33:31+10:00https://yanirseroussi.com/2023/06/30/was-data-science-a-failure-mode-of-software-engineering/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/hackers/2024-06-19T17:03:21+10:00https://yanirseroussi.com/2023/05/26/how-hackable-are-automated-coding-assessments/2024-06-19T17:03:21+10:00https://yanirseroussi.com/tags/machine-intelligence/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2023/04/21/remaining-relevant-as-a-small-language-model/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2022/12/11/chatgpt-is-transformative-ai/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/causal-inference/2024-02-21T11:52:55+10:00https://yanirseroussi.com/2022/09/12/causal-machine-learning-book-draft-review/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/automattic/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/orkestra/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/politics/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/sustainability/2024-02-21T11:52:55+10:00https://yanirseroussi.com/2022/06/06/the-mission-matters-moving-to-climate-tech-as-a-data-scientist/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2022/03/20/building-useful-machine-learning-tools-keeps-getting-easier-a-fish-id-case-study/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/deep-learning/
2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/fast.ai/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2022/01/14/analysis-strategies-in-online-a-b-experiments/2024-02-21T11:52:55+10:00https://yanirseroussi.com/tags/split-testing/2024-02-21T11:52:55+10:00https://yanirseroussi.com/tags/statistics/2024-05-06T16:35:22+10:00https://yanirseroussi.com/2021/11/22/use-your-human-brain-to-avoid-artificial-intelligence-disasters/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/cloudflare/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2021/11/10/migrating-from-wordpress-com-to-hugo-on-github-cloudflare/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/wordpress/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2021/10/07/my-work-with-automattic/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2021/04/05/some-highlights-from-2020/2024-02-21T11:52:55+10:00https://yanirseroussi.com/tags/bootstrapping/2024-01-16T09:56:03+10:00https://yanirseroussi.com/tags/confidence-intervals/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2020/08/24/many-is-not-enough-counting-simulations-to-bootstrap-the-right-way/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2020/01/11/software-commodities-are-eating-interesting-data-science-work/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2019/12/12/a-day-in-the-life-of-a-remote-data-scientist/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2019/10/06/bootstrapping-the-right-way/2024-05-06T16:35:22+10:00https://yanirseroussi.com/2019/01/08/hackers-beware-bootstrap-sampling-may-be-harmful/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2018/12/24/the-most-practical-causal-inference-book-ive-read-is-still-a-draft/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2018/11/03/reflections-on-remote-data-science-work/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2018/07/22/defining-data-science-in-2018/2024-01-16T09:56:03+10:00https://yanirseroussi.com/2017/10/15/advice-for-aspiring-data-scientists-and-other
\ No newline at end of file
diff --git a/tags/analytics/index.html b/tags/analytics/index.html
index 400943f4d..cae49ddf5 100644
--- a/tags/analytics/index.html
+++ b/tags/analytics/index.html
@@ -2,7 +2,7 @@

    Startup data health starts with healthy event tracking

    Expanding on the startup health check question of tracking Kukuyeva’s five business aspects as wide events.

    June 10, 2024

    Assessing a startup's data-to-AI health

    Reviewing the areas that should be assessed to determine a startup’s opportunities and challenges on the data/AI/ML front.

    April 22, 2024

    Substance over titles: Your first data hire may be a data scientist

    Advice for hiring a startup’s first data person: match skills to business needs, consider contractors, and get help from data people.

    February 5, 2024

    Bootstrapping the right way?

    Video and summary of a talk I gave at YOW! Data on bootstrap estimation of confidence intervals.

    October 6, 2019

    Defining data science in 2018

    Updating my definition of data science to match changes in the field. It is now broader than before, but its ultimate goal is still to support decisions.

    July 22, 2018

    Customer lifetime value and the proliferation of misinformation on the internet

    There’s a lot of misleading content on the estimation of customer lifetime value. Here’s what I learned about doing it well.

    January 8, 2017

    If you don’t pay attention, data can drive you off a cliff

    Seven common mistakes to avoid when working with data, such as ignoring uncertainty and confusing observed and unobserved quantities.

    August 21, 2016

    Making Bayesian A/B testing more accessible

    A web tool I built to interpret A/B test results in a Bayesian way, including prior specification, visualisations, and decision rules.

    June 19, 2016

    Why you should stop worrying about deep learning and deepen your understanding of causality instead

    Causality is often overlooked but is of much higher relevance to most data scientists than deep learning.

    February 14, 2016

    This holiday season, give me real insights

    Some companies present raw data or information as “insights”. This post surveys some examples, and discusses how they can be turned into real insights.

    December 8, 2015