From de0df95a2de4577344fe89f226a2be4deef693cc Mon Sep 17 00:00:00 2001
From: yanirs
Date: Mon, 15 Jan 2024 23:57:04 +0000
Subject: [PATCH] deploy: 3b84b2473445ad2b33cf5af0969d2a249c798164

65 files changed, 66 insertions(+), 66 deletions(-)

Data’s hierarchy of needs | Yanir Seroussi | Data & AI for Nature

Data’s hierarchy of needs

One of my favourite blog posts in recent times is The Log: What every software engineer should know about real-time data’s unifying abstraction by Jay Kreps. That post comprehensively describes how abstracting all the data produced by LinkedIn’s various components into a single log pipeline greatly simplified their architecture and enabled advanced data-driven applications. Among the various technical details, there are some beautifully articulated business insights. My favourite one defines data’s hierarchy of needs:

Effective use of data follows a kind of Maslow’s hierarchy of needs. The base of the pyramid involves capturing all the relevant data, being able to put it together in an applicable processing environment (be that a fancy real-time query system or just text files and python scripts). This data needs to be modeled in a uniform way to make it easy to read and process. Once these basic needs of capturing data in a uniform way are taken care of it is reasonable to work on infrastructure to process this data in various ways—MapReduce, real-time query systems, etc.

It’s worth noting the obvious: without a reliable and complete data flow, a Hadoop cluster is little more than a very expensive and difficult to assemble space heater. Once data and processing are available, one can move concern on to more refined problems of good data models and consistent well understood semantics. Finally, concentration can shift to more sophisticated processing—better visualization, reporting, and algorithmic processing and prediction.

In my experience, most organizations have huge holes in the base of this pyramid—they lack reliable complete data flow—but want to jump directly to advanced data modeling techniques. This is completely backwards. [emphasis mine]

Visually, it looks something like this:

Building a Bandcamp recommender system (part 1 – motivation)

I’ve been a Bandcamp user for a few years now. I love the fact that they pay out a significant share of the revenue directly to the artists, unlike other services. In addition, despite the fact that fans may stream all the music for free and even easily rip it, almost $80M has been paid out to artists through Bandcamp to date (including almost $3M in the last month) – serving as strong evidence that the traditional music industry’s fight against piracy is a waste of resources and time.

One thing I’ve been struggling with since starting to use Bandcamp is the discovery of new music. Originally (in 2011), I used the browse-by-tag feature, but it is often too broad to find music that I like. A newer feature is the Discoverinator, which is meant to emulate the experience of browsing through covers at a record store – sadly, I could never find much stuff I liked using that method. Last year, Bandcamp announced Bandcamp for fans, which includes the ability to wishlist items and discover new music by stalking/following other fans. In addition, they released a mobile app, which made the music purchased on Bandcamp much easier to access.

All these new features definitely increased my engagement and helped me find more stuff to listen to, but I still feel that Bandcamp music discovery could be much better. Specifically, I would love to be served personalised recommendations and be able to browse music that is similar to specific tracks and albums that I like. Rather than waiting for Bandcamp to implement these features, I decided to do it myself. Visit BCRecommender – Bandcamp recommendations based on your fan account to see where this effort stands at the moment.

While BCRecommender has already helped me discover new music to add to my collection, building it gave me many more ideas on how it can be improved, so it’s definitely a work in progress. I’ll probably tinker with the underlying algorithms as I go, so recommendations may occasionally seem weird (but this always seems to be the case with recommender systems in the real world). In subsequent posts I’ll discuss some of the technical details and where I’d like to take this project.


It’s probably worth noting that BCRecommender is not associated with or endorsed by Bandcamp, but I doubt they would mind since it was built using publicly available information and is full of links to buy the music back on their site.

Public comments are closed, but I love hearing from readers. Feel free to contact me with your thoughts.

Hi!

I just found these articles a few years after their publication… I saw that BCRecommender no longer seems to be active and that the last post is from 2015.

Any update? I’d be interested to hear your feedback.

Thanks,

Clément