A query engine over semi-structured (JSON) logs.
Similar to Trino, but it doesn't require a table's schema (columns and types) before executing a query.
While Trino receives SQL and returns results only once the entire query finishes (batch ETL), miso's query API receives an AST-like representation of the query plan (so that any query language can be used on the frontend) and streams the results back using server-sent events (SSE), i.e. stream ETL.
It supports the same optimization-based predicate pushdown mechanism as Trino: a query transpiles as many of its steps as the connector supports into the connector's own query language. Fewer documents are then returned over the network (which is usually the bottleneck), so queries return much faster.
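The pushdown idea can be illustrated with a minimal Python sketch. This is not miso's implementation: `split_pushdown`, the `supported` set, and the sample plan are hypothetical, and the sketch assumes that only a leading run of steps can be handed to the connector, with everything after the first unsupported step running in the engine.

```python
from typing import Dict, List, Set, Tuple

# A step is a single-key object such as {"scan": [...]}, mirroring the query JSON.
Step = Dict[str, object]


def split_pushdown(plan: List[Step], supported: Set[str]) -> Tuple[List[Step], List[Step]]:
    """Split a plan into (steps pushed to the connector, steps left for the engine).

    Hypothetical helper: assumes pushdown stops at the first unsupported step.
    """
    pushed: List[Step] = []
    for i, step in enumerate(plan):
        op = next(iter(step))  # the step's operator name, e.g. "scan" or "filter"
        if op not in supported:
            return pushed, plan[i:]
        pushed.append(step)
    return pushed, []


# Illustrative plan and capability set (assumptions, not a real connector's capabilities).
plan = [
    {"scan": ["localqw", "hdfs1"]},
    {"filter": {"gt": ["tenant_id", "80"]}},
    {"sort": [{"by": "timestamp", "order": "desc"}]},
    {"summarize": {"aggs": {"count": "count"}, "by": ["severity_text"]}},
]
pushed, remaining = split_pushdown(plan, supported={"scan", "filter", "sort"})
print([next(iter(s)) for s in pushed])     # operators translated for the connector
print([next(iter(s)) for s in remaining])  # operators executed by the engine
```

In this sketch the scan, filter, and sort would be translated into the connector's query language, and only the summarize step would run over the documents that come back.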
Here's an example of a query supported today by miso (localqw is a Quickwit connector to localhost:7280/):
# scan localqw.hdfs1
# | union (scan localqw.hdfs2)
# | summarize
# min_tenant = min(tenant_id)
# max_tenant = max(tenant_id)
# count = count()
# by timestamp, severity_text
# | join (
# scan localqw.stackoverflow
# | where questionId > 80
# ) on min_tenant, questionId
# | order by count desc;
# The -N flag disables curl's output buffering, so SSE events are printed as they arrive.
curl -N -H 'Content-Type: application/json' localhost:8080/query -d '{
"query": [
{ "scan": ["localqw", "hdfs1"] },
{ "union": [{ "scan": ["localqw", "hdfs2"] }] },
{
"summarize": {
"aggs": {
"min_tenant": {"min": "tenant_id"},
"max_tenant": {"max": "tenant_id"},
"count": "count"
},
"by": ["timestamp", "severity_text"]
}
},
{
"join": [
{"on": ["min_tenant", "questionId"]},
[
{ "scan": ["localqw", "stackoverflow"] },
{ "filter": {"gt": ["questionId", "80"]} }
]
]
},
{ "sort": [{"by": "count", "order": "desc"}] }
]
}'
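For clients other than curl, the streamed response has to be split into events. The Python sketch below shows a small parser for the standard SSE wire format (`data:` lines, with a blank line ending each event). The name `iter_sse_data` and the sample payloads are hypothetical; what miso actually puts inside each `data:` field is not specified here, so the payloads are returned as raw strings.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event found in `lines`.

    Hypothetical helper: handles `data:` fields and comment lines only;
    other SSE fields (`event:`, `id:`, `retry:`) are ignored in this sketch.
    """
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data.append(value[1:] if value.startswith(" ") else value)


# Made-up sample stream; real miso payloads may look different.
sample = [
    ": keep-alive\n",
    'data: {"count": 3}\n',
    "\n",
    "data: first\n",
    "data: second\n",
    "\n",
]
events = list(iter_sse_data(sample))
print(events)
```

The function accepts any iterable of text lines, so it could be fed from an HTTP response that is read line by line instead of from the in-memory list used here.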