To use the application follow these steps:
-
Clone this repo:
git clone https://github.com/BenjaminNeustadt/growth_analytic_microservice.git
-
Install the dependencies:
bundle install
-
Setup the databases:
bundle exec rake db:setup
-
Run the tests:
rspec -fd
-
Start the server locally:
rackup
To get a list of all Rake commands run:
bundle exec rake -T
To drop the database:
RACK_ENV=development rake db:drop
To reset the database:
RACK_ENV=development rake db:reset
Running these last two rake commands without specifying the environment will result in a request for environement confirmation.
I used Rspec for both integration and unit testing. To check for code coverage I used the 'Simple Cov' gem, which reports approximately 99% code coverage.
My approach was to first read the brief and consider the relations that would occur between the different payload packages for events and signups. I understood that for the MVP only the signup events would be necessary, but I could not help consider the possible relations between signup events and pageviews. Specifically I was interested in how pageview payloads could best be leveraged to extract as much information as possible from them. Although I did not implement them, I considered such questions as: In order to register a conversion event (from an unsubscribed user to a subscribed user), and to know from where this user originally came from (thereby informing us which was the campaign driven the most adoption) which pageview event would be considered the original source pageview event: the one furthest back in time, or the most recent one? It turns out that an answer to this would be that two attributions can be allotted: "first click" and "last click".
I considered this when designing the API of the "events" which I labelled as "signup", "trial", "unsubscribe" to illustrate some basis for user "state". I also included conversion events within the signup event, if a user previously on trial "signed-up" (by using the fingerprint stored in cookies). I did this because I felt it would be useful information for exterior microservices to quickly consult the "growth analytic" API without having to recalculate that information and slow down their respective process, since it would be consumed in real time. I therefore thought that it would be useful to leverage the calculation of the database to omit some of that responsibility later on. Though this is a question I am not entirely sure about.
For the most part, my decisions were based on writing the cleanest and most efficient possible code. As I felt that writing unstructured code would poetntially slow down the service, and cause a bottlekneck for the incoming requests.
I used Sinatra as a framework, as I felt that I could personally produce an MVP of a better quality using Sinatra, as I am more familiar with its DSL. Although I was cognizant that those reviewing may prefer Rails, so where possible I attempted to incorporate a Rails "flavour".
I considered other frameworks, particularly Hanami, which I understand to be faster in various capacities to both Sinatra and Rails. In the long run, I would like to switch it to Rails or Hanami. I also considered such questions as whether it would be more benficial to use Web storage over cookies.
In regards the webhook, I was uncertain how the it would actually be integrated in the client-server communication. Though I gave an example of what I understaod would be the process in my diagram below. My initial thought was that there would be a redirect to Blinklist from the webhook.
I used an MVC structure, with RESTful routes. The file structure is comparable to a Rails application; so the routes, for instance, can be found inside the "config" file in the root of the application.
I considered how I could make the application as or "compact" as possible, and felt that Sinatra would facilitate this, since I only required the dependencies I would actually need.
Since Rails and Sinatra both rely on Rake, the commands required to run the setting up of the database would be almost identical,
I used the challenge as an opportunity to better understand some of the ideas behind Rails, but was cognizant to keep the code as DRY as possible.
In some places however I was unsure if certain refactoring would cause a problem for speed.
For example, both POST routes of "signup events" and "pageviews" currently use process_webhook_payload
in "app/controllers/'endpoints.rb".
Some things that I considered to safeguard against potential request congestion, or overloading the server were:
-
Load balancing, by distributing the traffic of requests across multiple servers we can improve the latency of the system.
-
Using caching at the necessary application layers to boost the response time.
Overall, the downside of using Sinatra was the longer build time, though untimately it was beneficial to me, and in the long run these are some benefits that I considered:
-
Having built every aspect from the ground up I feel that I could personally more easily integrate new features or extensions.
-
I would eliminate some of the Rails magic by staying closer to Ruby language.
-
It would be easier for me to maintain.
Regarding some aspects of the code syntax where I was inspired by "Rails flavour", besides the file structure, here is an example in syntax:
# Rails routes
Rails.application.routes.draw do
root "articles#index"
get "/articles", to: "articles#index"
end
# Growth Analytic routes
module Routes
def self.registered app
app.post('/event') { process_webhook_payload }
app.get('/') { event_endpoint }
end
end
For the purposes of the MVP I used SQLite3, and the ActiveRecord ORM. Another database could easily be switched out for this one, I feel the database choice for this service would be important, and I feel a relatinoal database would be optimal, as we would be prioritizing for the strong consistency requirement.
-
Notes, initial thoughts, research!
-
Diagramming (see below)
-
Test
-
Build routes
-
Test
-
Build things inside one file and gradually extract/refactor outwards
-
Setup an ORM and a storage file
-
Create seed data and use a service to post mock data
I would normally write tests first, and follow a test driven development process. This instance was something of an anomaly as I played around with inserting mock data through the webhooks, and then found that I had not written tests. So I then did regression testing. Having done so I felt more confident in refactoring. I then fell back on test driven development when adding other routes and logic, specifically for the 'PageView' class that had not yet been written. My main concern was really the attributions, so I spent a good deal of time considering those, I have left my initial notes on the API design at the end of this README.
|-------------------------------+ +-----------------------+ | | | | | CLIENT | π | _FACEBOOK AD_ | POST +----------------------+ | | -------------------------> | | -----------------------------> | πΈοΈ πΈοΈ πΈοΈ | | π» | | | | | --------------------------------> CALLS THE EVENT AND DROPS OFF THE PAYLOAD | | | | | **WEBHOOK** | ++-----------------------------------------------------------++ |-------------------------------+ ---------------+ | | | | || PAYLOAD = || | +-----------------------+ | 2 entry points | || { || | | +-----------------------+ | | || fingerprint: "b998efcb-1af3-4149-9b56-34c4482f6606", || | | | | | 1 endpoint | || user_id: null, || | If user clicks directly | π | _GOOGLE AD_ |------------------------------> | exposing data | || url: "https://www.blinkist.com/en", || | It first reroutes to webhook +--------> | | | to be consumed | || referrer_url: null, || | to give the data | | | by other micro- | || created_at: "2023-01-20 13:59:56.437947 UTC" || β½ | | | service | || } || +-------------------------------| +-----------------------+ | = | || || | | On some websites you can attach a webhook | API | || || | BLINKIST | to an ad, which helps you tally views | | || The data is passed somewhere (i.e. a database) || | | π | (REDIRECT/POST) | || in our case using an ORM, some operations can be performed| | π | ------------------------------------------------------------------------------------> | | || immediately to leverage the power of database calc || | | <-------------------------------------------------------------------------------------+----------------------+ || || | | | ++-----------------------------------------------------------++ | | | +-------------------------------+ +----------+----------+ | | | 3rd ENDPOINT API | +---------------------------------------------+----------------+ | | | {response: status ok, | | users: { | | id: | | attributions: | | users: { | | user_1: { | | fingerprint: | | url: | | url_referrer: | | created_at: | | } | | conversion_events: some stuff, | || ROI_of_campaigns: some calculation of | | campaign price vs how many new signups | | [...] | | | | | | | | | | | | | | } (DRAFT) | +---------------------------------------------+----------------+ ^ ^ ^ ^ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | +-------------------------------------------------------------------------+ | OTHER MICROSERVICES THAT CAN PERFORM BUSINESS LOGIC ON THE STORED DATA, | | AFTER CONSUMING IT | | | | EXAMPLES: | | - REPORTING CONVERSION EVENTS | | - CALCULATING ROI OF CAMPAIGNS | | - MAKE CAMPAIGN BIDDING DECISIONS | +-------------------------------------------------------------------------+
The benefit of a microservice is that it can do ONE small and well defined operation. Therefore, the API endpoint that exposes the data should not do any more than it absolutely needs to. That will be the role of other services. This service will only store and provide for later use the valuable data. This is one step in the process. Itβs the entry point for the data, it is only the data factory.
Real-time
JSON return
Must be fast
Multiple different query points
Reporting conversion events (USER)
User focused
Time/date range
WHERE did they come from, when + insights
WHERE
when
HOW LONG AGO
How many pages viewed before conversion + after conversion
Calculate ROI (Source + Campaign) ?include=users
viewed
signedup
took (average, fasted, longest)
How many pages viewed before conversion + after conversion (averages)
optionally include all user ids
Campaign Bidding Decisions (SOURCE)
FOR EACH SOURCE
volume (how many)
signedup
took (average, fasted, longest)
How many pages viewed before conversion + after conversion (averages)
New signup from campaign 3
Read count = 123
count + 1 = 124
save
New signup from campaign 3
Read count = 123
count + 1 = 124
save