Commit f7c5267

Add lookup (#234)
1 parent a654e6e commit f7c5267

4 files changed: +135 -2 lines changed

apl/tabular-operators/join-operator.mdx

Lines changed: 1 addition & 2 deletions
@@ -113,10 +113,9 @@ LeftDataset
 - `LeftDataset`: The first dataset, also known as the outer dataset or the left side of the join. If you expect one of the datasets to contain consistently less data than the other, specify the smaller dataset as the left side of the join.
 - `RightDataset`: The second dataset, also known as the inner dataset or the right side of the join.
 - `KindOfJoin`: Optionally, the [kind of join](#kinds-of-join) to perform.
-- `Conditions`: The conditions for matching rows. The conditions are equality expressions that determine how Axiom matches rows from the `LeftDataset` (left side of the equality expression) with rows from the `RightDataset` (right side of the equality expression)).
+- `Conditions`: The conditions for matching rows. The conditions are equality expressions that determine how Axiom matches rows from the `LeftDataset` (left side of the equality expression) with rows from the `RightDataset` (right side of the equality expression). The two sides of the equality expression must have the same data type.
   - To join datasets on a field that has the same name in the two datasets, simply use the field name. For example, `on id`.
   - To join datasets on a field that has different names in the two datasets, define the two field names in an equality expression such as `on id == trace_id`.
-  - The two sides of the equality expression must have the same data type.
   - You can use expressions in the join conditions. For example, to compare two fields of different data types, use `on id_string == tostring(trace_id_int)`.
   - You can define multiple join conditions. To separate conditions, use commas (`,`). Don’t use `and`. For example, `on id == trace_id, span == span_id`.

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
---
title: lookup
description: 'This page explains how to use the lookup operator in APL.'
---

The `lookup` operator extends a primary dataset with a lookup table based on a specified key column. It retrieves matching rows from the lookup table and appends relevant fields to the primary dataset. You can use `lookup` for enriching event data, adding contextual information, or correlating logs with reference tables.

The `lookup` operator is useful when:

- You need to enrich log events with additional metadata, such as mapping user IDs to user profiles.
- You want to correlate security logs with threat intelligence feeds.
- You need to extend OpenTelemetry traces with supplementary details, such as service dependencies.

## For users of other query languages

If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.

<AccordionGroup>
<Accordion title="Splunk SPL users">

In Splunk SPL, the `lookup` command performs a similar function by enriching event data with fields from an external lookup table. APL’s `lookup` operator supports two kinds of lookup: `leftouter` (the default) and `inner`. The APL example below uses `inner`, which excludes rows without a match.

<CodeGroup>
```sql Splunk example
index=web_logs | lookup port_lookup port AS client_port OUTPUT service_name
```

```kusto APL equivalent
['sample-http-logs']
| lookup kind=inner ['port_lookup'] on port
```
</CodeGroup>

</Accordion>
<Accordion title="ANSI SQL users">

APL’s `lookup` operator is similar to an ANSI SQL `LEFT JOIN` (the default `leftouter` kind) or `INNER JOIN` (the `inner` kind), where records from both tables are matched based on a common key. Unlike SQL, APL doesn’t support other types of joins in `lookup`.

<CodeGroup>
```sql SQL example
SELECT logs.*, ports.service_name
FROM logs
INNER JOIN port_lookup ports ON logs.port = ports.port;
```

```kusto APL equivalent
['sample-http-logs']
| lookup kind=inner ['port_lookup'] on port
```
</CodeGroup>

</Accordion>
</AccordionGroup>

## Usage

### Syntax

```kusto
PrimaryDataset
| lookup kind=KindOfLookup LookupTable on Conditions
```

### Parameters

- `PrimaryDataset`: The primary dataset that you want to extend. If you expect one of the tables to contain consistently more data than the other, specify the larger table as the primary dataset.
- `LookupTable`: The data table containing additional data, also known as the dimension table or lookup table.
- `KindOfLookup`: Optionally, specifies the lookup type as `leftouter` or `inner`. The default is `leftouter`.
  - `leftouter` lookup includes all rows from the primary dataset even if they don’t match the conditions. In unmatched rows, the new fields contain nulls.
  - `inner` lookup only includes rows from the primary dataset if they match the conditions. Unmatched rows are excluded from the output.
- `Conditions`: The conditions for matching rows from `PrimaryDataset` to rows from `LookupTable`. The conditions are equality expressions that determine how Axiom matches rows from the `PrimaryDataset` (left side of the equality expression) with rows from the `LookupTable` (right side of the equality expression). The two sides of the equality expression must have the same data type.
  - To use `lookup` on a key column that has the same name in the primary dataset and the lookup table, simply use the field name. For example, `on id`.
  - To use `lookup` on a key column that has different names in the primary dataset and the lookup table, define the two field names in an equality expression such as `on id == trace_id`.
  - You can define multiple conditions. To separate conditions, use commas (`,`). Don’t use `and`. For example, `on id == trace_id, span == span_id`.
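
As a minimal sketch of the `inner` kind described above, the following query keeps only the rows whose service appears in the lookup table. The `TeamTable` name, the `owningTeam` field, and the team names are hypothetical and exist only for illustration. The query assumes the `otel-demo-traces` dataset with a `service.name` field.

```kusto
// Hypothetical lookup table that covers only two services.
let TeamTable=datatable(serviceName:string, owningTeam:string)[
  'frontend', 'Web team',
  'cartservice', 'Commerce team',
];
['otel-demo-traces']
// kind=inner excludes rows whose service.name has no match in TeamTable.
| lookup kind=inner TeamTable on $left.['service.name'] == $right.serviceName
| project _time, span_id, ['service.name'], owningTeam
```

With `kind=leftouter` instead, rows for all other services would stay in the output with a null `owningTeam`.
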

### Returns

A dataset where rows from `PrimaryDataset` are enriched with matching columns from `LookupTable` based on the key column.

## Use case example

Add a field with human-readable names for each service.

**Query**

```kusto
87+
let LookupTable=datatable(serviceName:string, humanreadableServiceName:string)[
88+
'frontend', 'Frontend',
89+
'frontendproxy', 'Frontend proxy',
90+
'flagd', 'Flagd',
91+
'productcatalogservice', 'Product catalog',
92+
'loadgenerator', 'Load generator',
93+
'checkoutservice', 'Checkout',
94+
'cartservice', 'Cart',
95+
'recommendationservice', 'Recommendations',
96+
'emailservice', 'Email',
97+
'adservice', 'Ads',
98+
'shippingservice', 'Shipping',
99+
'quoteservice', 'Quote',
100+
'currencyservice', 'Currency',
101+
'paymentservice', 'Payment',
102+
'frauddetectionservice', 'Fraud detection',
103+
];
104+
['otel-demo-traces']
105+
| lookup kind=leftouter LookupTable on $left.['service.name'] == $right.serviceName
106+
| project _time, span_id, ['service.name'], humanreadableServiceName
107+
```

[Run in Playground](https://play.axiom.co/axiom-play-qf1k/query?initForm=%7B%22apl%22%3A%22let%20LookupTable%3Ddatatable(serviceName%3Astring%2C%20humanreadableServiceName%3Astring)%5B%20'frontend'%2C%20'Frontend'%2C%20'frontendproxy'%2C%20'Frontend%20proxy'%2C%20'flagd'%2C%20'Flagd'%2C%20'productcatalogservice'%2C%20'Product%20catalog'%2C%20'loadgenerator'%2C%20'Load%20generator'%2C%20'checkoutservice'%2C%20'Checkout'%2C%20'cartservice'%2C%20'Cart'%2C%20'recommendationservice'%2C%20'Recommendations'%2C%20'emailservice'%2C%20'Email'%2C%20'adservice'%2C%20'Ads'%2C%20'shippingservice'%2C%20'Shipping'%2C%20'quoteservice'%2C%20'Quote'%2C%20'currencyservice'%2C%20'Currency'%2C%20'paymentservice'%2C%20'Payment'%2C%20'frauddetectionservice'%2C%20'Fraud%20detection'%2C%20%5D%3B%20%5B'otel-demo-traces'%5D%20%7C%20lookup%20kind%3Dleftouter%20LookupTable%20on%20%24left.%5B'service.name'%5D%20%3D%3D%20%24right.serviceName%20%7C%20project%20_time%2C%20span_id%2C%20%5B'service.name'%5D%2C%20humanreadableServiceName%22%7D)

**Output**

| _time | span_id | service.name | humanreadableServiceName |
|------------------|-------------------------|----------------------|--------------------------|
| Feb 27, 12:01:55 | 15bf0a95dfbfcd77 | loadgenerator | Load generator |
| Feb 27, 12:01:55 | 86c27626407be459 | frontendproxy | Frontend proxy |
| Feb 27, 12:01:55 | 89d9b5687056b1cf | frontendproxy | Frontend proxy |
| Feb 27, 12:01:55 | bbc1bac7ebf6ce8a | frontend | Frontend |
| Feb 27, 12:01:55 | cd12307e154a4817 | frontend | Frontend |
| Feb 27, 12:01:55 | 21fd89efd3d36b15 | frontend | Frontend |
| Feb 27, 12:01:55 | c6e8db2d149ab273 | frontend | Frontend |
| Feb 27, 12:01:55 | fd569a8fce7a8446 | cartservice | Cart |
| Feb 27, 12:01:55 | ed61fac37e9bf220 | loadgenerator | Load generator |
| Feb 27, 12:01:55 | 83fdf8a30477e726 | frontend | Frontend |
| Feb 27, 12:01:55 | 40d94294da7b04ce | frontendproxy | Frontend proxy |
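
Once the lookup adds the new field, later operators can use it like any other field. The following sketch counts spans per human-readable service name. It shortens the lookup table from the query above to two rows purely for brevity, and the choice of `kind=inner` and `summarize` is an illustrative assumption rather than part of the original example.

```kusto
// Shortened, illustrative version of the lookup table from the query above.
let LookupTable=datatable(serviceName:string, humanreadableServiceName:string)[
  'frontend', 'Frontend',
  'cartservice', 'Cart',
];
['otel-demo-traces']
// kind=inner keeps only spans whose service has an entry in LookupTable.
| lookup kind=inner LookupTable on $left.['service.name'] == $right.serviceName
// Group by the field that the lookup added.
| summarize count() by humanreadableServiceName
```
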

## List of related operators

- [join](/apl/tabular-operators/join-operator): Performs more flexible join operations, including left, right, and outer joins.
- [project](/apl/tabular-operators/project-operator): Selects specific columns from a dataset, which can be used to refine the output of a lookup operation.
- [union](/apl/tabular-operators/union-operator): Combines multiple datasets without requiring a key column.

apl/tabular-operators/overview.mdx

Lines changed: 2 additions & 0 deletions
@@ -14,7 +14,9 @@ The table summarizes the tabular operators available in APL.
 | [distinct](/apl/tabular-operators/distinct-operator) | Returns a dataset with unique values from the specified fields, removing any duplicate entries. |
 | [extend](/apl/tabular-operators/extend-operator) | Returns the original dataset with one or more new fields appended, based on the defined expressions. |
 | [extend-valid](/apl/tabular-operators/extend-valid-operator) | Returns a table where the specified fields are extended with new values based on the given expression for valid rows. |
+| [join](/apl/tabular-operators/join-operator) | Returns a dataset containing rows from two different tables based on conditions. |
 | [limit](/apl/tabular-operators/limit-operator) | Returns the top N rows from the input dataset. |
+| [lookup](/apl/tabular-operators/lookup-operator) | Returns a dataset where rows from one dataset are enriched with matching columns from a lookup table based on conditions. |
 | [order](/apl/tabular-operators/order-operator) | Returns the input dataset, sorted according to the specified fields and order. |
 | [parse](/apl/tabular-operators/parse-operator) | Returns the input dataset with new fields added based on the specified parsing pattern. |
 | [project](/apl/tabular-operators/project-operator) | Returns a dataset containing only the specified fields. |

docs.json

Lines changed: 1 addition & 0 deletions
@@ -404,6 +404,7 @@
 "apl/tabular-operators/extend-valid-operator",
 "apl/tabular-operators/join-operator",
 "apl/tabular-operators/limit-operator",
+"apl/tabular-operators/lookup-operator",
 "apl/tabular-operators/order-operator",
 "apl/tabular-operators/parse-operator",
 "apl/tabular-operators/project-operator",
