feat: add OpenLineage request logger extension #19107

Open
mshahid6 wants to merge 7 commits into apache:master from mshahid6:add-open-lineage

mshahid6 commented Mar 7, 2026

Description

Adds extensions-contrib/openlineage-emitter, a contrib extension that uses the RequestLogger to transform query information into lineage events and send them to any OpenLineage-compatible API.
For SQL queries, the SQL text is parsed with the Calcite parser to extract input datasources (FROM clauses, JOINs, CTEs) and output datasources (INSERT INTO). For native queries, table names are read from DataSource.getTableNames(). Native sub-queries spawned by a SQL execution are deduplicated against the SQL-level event.
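The deduplication step could be sketched roughly as below. This is a minimal illustration, not the PR's code: it assumes that native queries planned from a SQL statement carry the parent's sqlQueryId in their query context, and that skipping such requests is enough to avoid a second event. The class and method names are hypothetical.

```java
import java.util.Map;

/** Hypothetical helper: decides whether a native query should produce its own lineage event. */
final class NativeQueryDeduplicator {
    // Assumed context key that marks a native query as spawned by a SQL statement.
    static final String SQL_QUERY_ID_KEY = "sqlQueryId";

    private NativeQueryDeduplicator() {
    }

    /** Returns true when the native query has no SQL parent and should emit an event itself. */
    static boolean shouldEmit(Map<String, Object> queryContext) {
        if (queryContext == null) {
            return true;
        }
        Object sqlQueryId = queryContext.get(SQL_QUERY_ID_KEY);
        // A non-empty id means the SQL-level event already covers this query.
        return sqlQueryId == null || sqlQueryId.toString().isEmpty();
    }
}
```

With this shape, a standalone native query still emits, while a sub-query of a SQL execution is left to the SQL-level event.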
Each event includes standard OpenLineage facets (processing_engine, jobType, sql, errorMessage) and custom Druid facets (druid_query_context with user identity and query metadata, druid_query_statistics with duration and bytes).
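For orientation, the event layout described above might look like the following when built as nested maps. This is a sketch under assumptions: the placement of the two druid_* facets under the run, the field names inside them, the namespace, and the job name are all illustrative, and required OpenLineage bookkeeping fields (eventTime, producer, schemaURL, and the per-facet _producer and _schemaURL) are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical builder; not the extension's actual event model. */
final class LineageEventSketch {
    // Assumed namespace for both the job and its datasets.
    static final String NAMESPACE = "druid://example-cluster";

    static Map<String, Object> build(
            String runId, String sql, List<String> inputs, List<String> outputs,
            String errorMessage, long durationMs, long bytes, String identity) {
        Map<String, Object> runFacets = new LinkedHashMap<>();
        runFacets.put("processing_engine", single("name", "Apache Druid"));
        runFacets.put("druid_query_context", single("identity", identity));
        Map<String, Object> stats = new LinkedHashMap<>();
        stats.put("durationMs", durationMs);
        stats.put("bytes", bytes);
        runFacets.put("druid_query_statistics", stats);
        if (errorMessage != null) {
            // Only failed queries carry the errorMessage facet.
            runFacets.put("errorMessage", single("message", errorMessage));
        }

        Map<String, Object> jobFacets = new LinkedHashMap<>();
        jobFacets.put("jobType", single("jobType", "QUERY"));
        jobFacets.put("sql", single("query", sql));

        Map<String, Object> run = new LinkedHashMap<>();
        run.put("runId", runId);
        run.put("facets", runFacets);

        Map<String, Object> job = new LinkedHashMap<>();
        job.put("namespace", NAMESPACE);
        job.put("name", "druid-sql-query"); // illustrative job name
        job.put("facets", jobFacets);

        Map<String, Object> event = new LinkedHashMap<>();
        event.put("eventType", errorMessage == null ? "COMPLETE" : "FAIL");
        event.put("run", run);
        event.put("job", job);
        event.put("inputs", datasets(inputs));
        event.put("outputs", datasets(outputs));
        return event;
    }

    private static Map<String, Object> single(String key, Object value) {
        Map<String, Object> map = new LinkedHashMap<>();
        map.put(key, value);
        return map;
    }

    private static List<Map<String, Object>> datasets(List<String> names) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (String name : names) {
            Map<String, Object> dataset = new LinkedHashMap<>();
            dataset.put("namespace", NAMESPACE);
            dataset.put("name", name);
            result.add(dataset);
        }
        return result;
    }
}
```

Serializing the returned map with any JSON library would give a payload of roughly the shape an OpenLineage backend expects, once the omitted bookkeeping fields are filled in.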

Transport is configurable: CONSOLE (default) logs the event JSON to the Druid log; HTTP POSTs it to an OpenLineage endpoint such as Marquez. The logger can be combined with other request loggers via the composing provider.
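The two transports could be modeled along these lines. This is only a sketch using the JDK's java.net.http client (Java 11+); the interface name, factory methods, and the use of a Consumer as the log sink are assumptions for illustration, and the extension may use a different HTTP client and configuration mechanism.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.function.Consumer;

/** Hypothetical transport abstraction; the PR's actual classes may differ. */
interface LineageTransportSketch {
    void emit(String eventJson) throws Exception;

    /** CONSOLE: hand the serialized event to a log sink (here, any Consumer of String). */
    static LineageTransportSketch console(Consumer<String> logSink) {
        return logSink::accept;
    }

    /** HTTP: POST the serialized event to an OpenLineage endpoint and ignore the response body. */
    static LineageTransportSketch http(URI endpoint) {
        HttpClient client = HttpClient.newHttpClient();
        return json -> client.send(buildRequest(endpoint, json), HttpResponse.BodyHandlers.discarding());
    }

    /** Builds the POST request separately so it can be inspected without a network call. */
    static HttpRequest buildRequest(URI endpoint, String json) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

For example, LineageTransportSketch.console(System.out::println).emit(json) would print the event, while LineageTransportSketch.http(URI.create("http://localhost:5000/api/v1/lineage")) would target a local Marquez instance, assuming it listens on that address.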

This PR has:

  • been self-reviewed.
  • using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

if (from == null) {
  return;
}
if (from instanceof SqlIdentifier) {

Check notice (Code scanning / CodeQL): Chain of 'instanceof' tests

This if block performs a chain of 6 type tests - consider alternatives, e.g. polymorphism or the visitor pattern.
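One way to act on this notice is the visitor pattern it mentions. The sketch below uses a tiny hypothetical node hierarchy (TableRef, JoinNode, AliasNode) as a stand-in for the parser's FROM-clause nodes, so it is not the PR's code; with Calcite itself, SqlNode.accept with a SqlVisitor implementation may play the same role.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical stand-in for a FROM-clause node; each node type dispatches to its own visit method. */
interface FromNode {
    <R> R accept(FromVisitor<R> visitor);
}

interface FromVisitor<R> {
    R visitTable(TableRef table);
    R visitJoin(JoinNode join);
    R visitAlias(AliasNode alias);
}

final class TableRef implements FromNode {
    final String name;
    TableRef(String name) { this.name = name; }
    @Override public <R> R accept(FromVisitor<R> visitor) { return visitor.visitTable(this); }
}

final class JoinNode implements FromNode {
    final FromNode left;
    final FromNode right;
    JoinNode(FromNode left, FromNode right) { this.left = left; this.right = right; }
    @Override public <R> R accept(FromVisitor<R> visitor) { return visitor.visitJoin(this); }
}

final class AliasNode implements FromNode {
    final FromNode target;
    final String alias;
    AliasNode(FromNode target, String alias) { this.target = target; this.alias = alias; }
    @Override public <R> R accept(FromVisitor<R> visitor) { return visitor.visitAlias(this); }
}

/** Collects table names in encounter order without any instanceof checks. */
final class TableCollector implements FromVisitor<Void> {
    private final Set<String> tables = new LinkedHashSet<>();

    @Override public Void visitTable(TableRef table) {
        tables.add(table.name);
        return null;
    }

    @Override public Void visitJoin(JoinNode join) {
        join.left.accept(this);
        join.right.accept(this);
        return null;
    }

    @Override public Void visitAlias(AliasNode alias) {
        // The alias itself is not a datasource; only its target is.
        alias.target.accept(this);
        return null;
    }

    static Set<String> collect(FromNode from) {
        TableCollector collector = new TableCollector();
        if (from != null) {
            from.accept(collector);
        }
        return collector.tables;
    }
}
```

Adding a new node type then means adding one visit method, and the compiler flags every visitor that has not handled it, which a chain of type tests cannot do.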
Maryam Shahid added 4 commits March 8, 2026 18:19
Add CTEs, deduplicated, and emits to resolve spellcheck errors
Fixes spellcheck error for openlineage-emitter compound word
jtuglu1 self-requested a review March 13, 2026 05:22