DuckDB has a concept of Hive Partitioning that we want to mimic for large datasets (e.g. player stats) to prevent massive queries from being run.

This will come in two parts:

## Defining Dimensions

I propose that we use frontmatter in the SQL to define dimensionality, so that each dimension is defined by a query that returns the set of values to partition by.

### Single Dimension Query

`set.sql`
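Presumably something along these lines, with a single `season` dimension (the column and table names here are placeholders that mirror the multi-dimension example below):

```sql
---
dimensions:
  season: SELECT someColumn as dimension FROM someTable
---
SELECT * FROM someTable
WHERE someColumn = ${season}
```

This would result in a folder structure 1 level deep:

```
season=1/
  data.parquet
season=2/
  data.parquet
```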
### Multi Dimension Query

`set2.sql`:

```sql
---
dimensions:
  season: SELECT someColumn as dimension FROM someTable
  match: SELECT someColumn as dimension FROM someMatchTable
---
SELECT * FROM someTable
WHERE someColumn = ${season}
  AND someOtherColumn = ${match}
```
This would result in a folder structure 2 levels deep:
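Presumably one directory level per dimension, along these lines (the partition values are illustrative):

```
season=1/
  match=1/
    data.parquet
  match=2/
    data.parquet
season=2/
  match=1/
    data.parquet
```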
## Proper output

### SQL Snippets

Currently on our dataset pages we have a SQL snippet that creates a view, and download buttons for the files; the SQL snippet will need to be updated to properly load all the files:
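Something along these lines should work, using DuckDB's `read_parquet` with `hive_partitioning` so the dimension columns are rebuilt from the paths (the view name and URLs are placeholders, and plain HTTP has no directory listing, so each partition file is spelled out):

```sql
INSTALL httpfs;
LOAD httpfs;

-- One URL per partition file; hive_partitioning = true derives the
-- season column from the season=<value>/ path segments.
CREATE VIEW player_stats AS
SELECT *
FROM read_parquet(
    [
        'https://example.com/set/season=1/data.parquet',
        'https://example.com/set/season=2/data.parquet'
    ],
    hive_partitioning = true
);
```

Having to enumerate the files here is also an argument for the manifest idea below, since DuckDB cannot glob partition directories over plain HTTP.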
### Batch Download

We should also consider creating a `.tar.gz` file for each format for the entire set that reflects the file structure; for example, the parquet download button would reference a `set.parquet.tar.gz` file with the contents:
```
season=1/
  data.parquet
season=2/
  data.parquet
```
This allows users to still download the entire dataset for local analysis.
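Once the archive is extracted, the same partition-aware query works against the local files, e.g. (the view name is again a placeholder):

```sql
-- Glob over the extracted partition directories; hive_partitioning
-- re-creates the season column from the directory names.
CREATE VIEW player_stats AS
SELECT *
FROM read_parquet('season=*/data.parquet', hive_partitioning = true);
```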
### Manifest

It may also be helpful to produce a manifest of URLs for a partitioned set, so that non-DuckDB programs can easily reference all of the files; the structure of this is TBD, and it is not required as part of the first version.
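Purely as an illustration (the URLs are placeholders), one possible shape is a flat text file with one partition file per line:

```
https://example.com/set/season=1/data.parquet
https://example.com/set/season=2/data.parquet
```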
As another thought, it might make more sense to define dimensions in the sibling markdown file instead of trying to handle frontmatter in the SQL itself.
This file works as described above; it just needs to be split into Prefect-y tasks.
One challenge here is trying to build the website at the end - if we want to address #45, then we need to have a way of defining which queries are "supposed" to exist, along with which queries "actually" exist (e.g. figuring out which pages to build while accounting for deleted / failed queries).
Not sure what the best approach to the above is, unless we want to have some sort of "broken" page that is thrown up for any queries that failed (?). This would just mean that we can build the navbar before firing off the queries, and then each query flow is responsible for building its own page (succeed or fail).