✧ ✧ ✧
✧
✦ ✧ ▅ ✦ ▅
✧ ▁✧ █ ✧ ▃ ✦▇ ✧✦ ☽ ✦ █
▇ ▁ █ █ ▇▅█ █▃ ✦✦ █
█▁█▅▁█ ▃▃▁✧▇▃✧ ▁▁ ▁▅█ ▁ ✦███▁ ██▅ ✦ ▇ █ ▇
██████▅███▅██▅▁██▁▁███▁▅█▁▅████▅███▁▁▅▁▅▅▅▁▁█▁▁█▁█▁▁▁
Skyline computation is an essential database operation that has many applications in multi-criteria decision making scenarios such as recommender systems. With the growth of data, efficiently finding interesting data within large datasets has become essential. Skyline queries are a method to select a subset of data by returning tuples that are not dominated by any other tuple. A tuple t dominates another tuple s if t is not worse than s in any attribute and is strictly better in at least one. As such, Skyline queries have emerged as an increasingly popular tool for identifying a set of interesting objects that balance different user-specified criteria.
Given a multidimensional data set, where the dimensions correspond to the criteria that need to be balanced, a Skyline query returns a set of interesting data points, aka. skyline points, that are not dominated by any other point in all dimensions. A point m dominates another point n, if m is better than or equal to n in all dimensions and strictly better than n in at least one dimension.
The best known example use case for a skyline query is a hotel booking scenario where users are looking for hotels. Assume many hotels are available and the user wants to find one based on three criteria: distance to the center, hotel rating and price per night. Further assume that the user is unable to say which of these criteria is more important. So, we need to look for hotels representing a good combination of all three criteria. The skyline consists of all hotels that represent a "good" combinations of the three criteria. For each of the other hotels, there is always at least one hotel in the skyline that is better with respect to the three criteria. So, being presented the skyline, the user gets an overview of the available hotels and can make the final decision with respect to her personal preferences for the three criteria.
HOTEL_ID | RATING | PRICE | DISTANCE_FROM_CENTER |
---|---|---|---|
a | 4 | 200 | 3 |
b | 5 | 300 | 4 |
c | 1 | 600 | 5 |
d | 2 | 700 | 1 |
e | 4 | 100 | 2 |
f | 5 | 290 | 3 |
g | 1 | 190 | 5 |
h | 1 | 290 | 5 |
i | 5 | 190 | 1 |
j | 2 | 490 | 5 |
k | 5 | 490 | 1 |
The following are few different options for calculating the Skyline or Pareto Frontier or Pareto Front using SQL
SELECT hotel_id, price, rating, distance_from_center FROM hotels AS o WHERE
not EXISTS(
SELECT 1 FROM hotels AS i WHERE
i.price <= o.price AND i.rating >= o.rating and i.distance_from_center <= o.distance_from_center
AND (i.price < o.price OR i.rating > o.rating OR i.distance_from_center < o.distance_from_center)
);
HOTEL_ID | RATING | PRICE | DISTANCE_FROM_CENTER |
---|---|---|---|
e | 4 | 100 | 2 |
i | 5 | 190 | 1 |
SELECT p.*
FROM hotels p
LEFT JOIN hotels q
ON (q.price <= p.price AND q.rating >= p.rating AND q.distance_from_center < p.distance_from_center)
OR (q.price < p.price AND q.rating >= p.rating AND q.distance_from_center <= p.distance_from_center)
OR (q.price <= p.price AND q.rating > p.rating AND q.distance_from_center <= p.distance_from_center)
WHERE q.price IS NULL AND q.rating IS NULL and q.distance_from_center is NULL;
HOTEL_ID | RATING | PRICE | DISTANCE_FROM_CENTER |
---|---|---|---|
e | 4 | 100 | 2 |
i | 5 | 190 | 1 |
WITH ranked AS (
SELECT
*
, dense_rank() OVER (ORDER BY price ASC, rating DESC, distance_from_center ASC ) AS rank1
, dense_rank() OVER (ORDER BY rating DESC, price ASC, rating DESC, distance_from_center ASC ) AS rank2
, dense_rank() OVER (ORDER BY distance_from_center ASC, rating DESC, price ASC, rating DESC) AS rank3
FROM hotels
)
SELECT *
FROM ranked
WHERE rank1 = 1 OR rank2 = 1 OR rank3 = 1;
HOTEL_ID | RATING | PRICE | DISTANCE_FROM_CENTER | RANK1 | RANK2 | RANK3 |
---|---|---|---|---|---|---|
i | 5 | 190 | 1 | 2 | 1 | 1 |
e | 4 | 100 | 2 | 1 | 5 | 4 |
-
{% for page in site.pages %}
{% if page.title contains "Skyline" %}
- {{ page.title | escape }} {% endif %} {% endfor %}