Draft
170 commits
9704439
[wip] add signature / metrics to databricks mlmodel
yoonhyejin Oct 6, 2025
2f5325a
chore(deps): bump actions/download-artifact from 4 to 5 (#14700)
dependabot[bot] Sep 12, 2025
1021b87
fix(ui) Improve flakiness of modules and templates cypress tests (#1…
chriscollins3456 Sep 12, 2025
2d2d9c7
chore(deps): bump aws-actions/configure-aws-credentials from 4 to 5 (…
dependabot[bot] Sep 12, 2025
c5bfb1a
chore(deps): bump actions/stale from 9 to 10 (#14697)
dependabot[bot] Sep 12, 2025
f1572b3
fix(ci): cloudflare workflow cannot run without token (#14749)
david-leifker Sep 12, 2025
3b45059
chore(deps): bump aquasecurity/trivy-action from 0.33.0 to 0.33.1 (#1…
dependabot[bot] Sep 12, 2025
5b50599
chore(deps-dev): bump vite from 6.3.5 to 6.3.6 in /datahub-web-react …
dependabot[bot] Sep 12, 2025
f2ffac5
chore(deps): bump actions/setup-python from 5 to 6 (#14701)
dependabot[bot] Sep 12, 2025
0d996d0
feat(ingest/fivetran): map google_cloud_postgresql => postgres (#14742)
ligfx Sep 12, 2025
50bd41f
docs(ingestion/redshift): update documentation to cover svv and stv s…
acrylJonny Sep 12, 2025
70b34d4
fix(structured-properties): fix structured properties manage role (#1…
david-leifker Sep 12, 2025
7cb33e3
fix(ci): revert workaround to enumerate image targets (#14755)
chakru-r Sep 14, 2025
d455ac7
fix(ci): use correct path for cypress test result xmls (#14756)
chakru-r Sep 14, 2025
5affd22
feat(ingest/neo4j): migrate Neo4j source to DataHub Python SDK v2 (#1…
sgomezvillamor Sep 15, 2025
4921b51
chore(ingestion/iceberg): Include explicit extras in dependencies (#1…
skrydal Sep 15, 2025
3822961
docs(ingestion/redshift): add required permissions for table and view…
acrylJonny Sep 15, 2025
70a3789
fix(ingest/pipeline): Fix BatchPartitionExecutor Shutdown Race Condi…
treff7es Sep 15, 2025
517a0ff
fixes to improve stability of the ci build (#14752)
alexsku Sep 15, 2025
bb3a39d
fix(ui) Fix weird indents on schema table descriptions (#14652)
chriscollins3456 Sep 15, 2025
0b550c4
fix(ui): Increasing the 'Try your test' modal width (#14612)
sakethvarma397 Sep 15, 2025
b49fbf0
Revert "chore(ingestion/iceberg): Safe-guard pyiceberg version before…
skrydal Sep 15, 2025
a48323c
feat(bigquery): add created and modified timestamps to dataset contai…
sgomezvillamor Sep 15, 2025
7859f78
fix(web): Search results Scroll Issue with filters sidebar (#14484)
andrewsrajasekar Sep 15, 2025
a80880f
refactor(ui): Prevent console warnings in Tabs.tsx and SidebarAboutSe…
asikowitz Sep 15, 2025
1100078
fix(ui/browse): Fix bug where browse would not paginate when leaving …
asikowitz Sep 15, 2025
e07fe1c
docs(Ask DataHub) Update naming conventions for Ask DataHub (#14746)
maggiehays Sep 16, 2025
efb001f
fix(ui): Render the values instead of urns in Policies Modal (#14613)
sakethvarma397 Sep 16, 2025
00e8b5e
fix(ui) Add collection of minor fixes for summary pages and home page…
chriscollins3456 Sep 16, 2025
34560e8
feat(ui) Update home page template editability (#14772)
chriscollins3456 Sep 16, 2025
8a7085a
feat(docs) Add feature guide doc for the new Custom Asset Summaries (…
chriscollins3456 Sep 16, 2025
1a9d02d
docs(release): Add release notes for version 0.3.14 (#14732)
gabe-lyons Sep 16, 2025
f1ee188
Revert "docs(release): Add release notes for version 0.3.14" (#14788)
gabe-lyons Sep 16, 2025
b4471d0
fix(sdk_v2/lineage): Fix handling of null platform (#14784)
skrydal Sep 17, 2025
48be21c
fix(ingest): change redash sql parse error to warnining (#14785)
kevinkarchacryl Sep 17, 2025
1fd9c81
feat(dbt): add filtering for materialized nodes based on their physic…
abdullahtariqq Sep 17, 2025
8abb32b
fix(ge_profiler): support nonnull_count for complex types (#14631)
ligfx Sep 17, 2025
ca35630
fix(): Fix bundled venv (#14660)
jjoyce0510 Sep 17, 2025
0e17470
feat(): Adding authenticator for OIDC OAuth (#14707)
jjoyce0510 Sep 17, 2025
3df7d11
docs(): Add documentation for Microsoft Teams Application (#14783)
jjoyce0510 Sep 17, 2025
dec55dd
docs(): Updating datahub cloud actions source docs to include support…
jjoyce0510 Sep 17, 2025
5a88ad3
fix(exception): surface exceptions to API response (#14795)
anshbansal Sep 17, 2025
9d3a50f
feat(summary-tab): use manage summary permission to allow editing doc…
purnimagarg1 Sep 17, 2025
8b8132b
fix(ui/summary-tab): fix view more button when switching tabs (#14796)
purnimagarg1 Sep 17, 2025
405973f
improvement(ui/summary-tab): use editor from component library in Com…
purnimagarg1 Sep 17, 2025
2710d24
feat(summary-page): add analytics events for asset summary page (#14798)
purnimagarg1 Sep 17, 2025
ea21f6a
fix(ui): handle edit documentation button on sidebar with new summary…
purnimagarg1 Sep 17, 2025
de335f4
fix(ui): fetch data product info for entity preview (#14800)
purnimagarg1 Sep 17, 2025
a4aeca6
fix(ui/summary-tab): fix functionality on add assets button in assets…
purnimagarg1 Sep 17, 2025
3919945
feat(superset/preset): propagate chart & dashboard tags to DataHub (#…
bmaquet Sep 17, 2025
8d0c5b6
fix(summaryTab): bring fixes from saas (#14764)
v-tarasevich-blitz-brain Sep 17, 2025
770167c
fix(summaryTab): fix empty state when assets were deleted (#14777)
v-tarasevich-blitz-brain Sep 17, 2025
8d60113
docs(logical): Add logical models feature guide (#14774)
asikowitz Sep 17, 2025
f1a670c
fix(impact-lineage): separate viz and impact query path (#14773)
david-leifker Sep 17, 2025
8c63bb9
docs(teams): Update Teams App setup instructions (#14803)
gabe-lyons Sep 18, 2025
fd4d019
refactor(ingestion): lookml source migration to use SDKv2 entities (#…
askumar27 Sep 18, 2025
7534930
fix(summaryTab): UI fixes (#14778)
v-tarasevich-blitz-brain Sep 18, 2025
411ced9
Revert "Revert "docs(release): Add release notes for version 0.3.14""…
gabe-lyons Sep 18, 2025
e5da385
feat(ingestion/tableau): parameter to have entity owners as email add…
acrylJonny Sep 18, 2025
3a6abc2
improvement(ui/summary-tab): handle deleted structured properties in …
purnimagarg1 Sep 18, 2025
ca22b08
docs(slack): Update Slack setup instructions for token generation (#1…
gabe-lyons Sep 18, 2025
f55d24a
Update menu on structured props table to new component (#14655)
annadoesdesign Sep 18, 2025
7b1a8d1
feat(ui): add summary page feature flag to local storage and make it …
purnimagarg1 Sep 18, 2025
d9eba21
chore(): bump spring (#14811)
david-leifker Sep 18, 2025
8bdcee3
refactor(ingestion): looker source migration to use SDKv2 entities (#…
askumar27 Sep 18, 2025
5f287ca
Update Entity Dropdown to Menu Component (#14656)
annadoesdesign Sep 18, 2025
2549f6f
fix(ingestion/looker): handle potential None values in explore datase…
askumar27 Sep 18, 2025
20f2942
feat(ui) Add ability to add links to asset header (#14770)
chriscollins3456 Sep 18, 2025
74c188a
feat(ingestion/superset): add HTTP retry configuration to prevent inf…
sgomezvillamor Sep 19, 2025
013c51e
fix(summary-tab): show correct feedback when trying to add duplicate …
purnimagarg1 Sep 19, 2025
1a27fb7
fix(summaryTab): follow ups from reloading modules PR (#14779)
v-tarasevich-blitz-brain Sep 19, 2025
cb90969
docs(): Update saas vs. oss docs for v0.3.14 (#14814)
jjoyce0510 Sep 19, 2025
604ee30
feat(ui): Add context paths for Data Products (#14802)
sakethvarma397 Sep 19, 2025
ed5cc76
ci: add username pr-labeler (#14828)
deepgarg760 Sep 22, 2025
5b15676
docs(bigquery): improve docs about strategies for lineage/usage extra…
sgomezvillamor Sep 22, 2025
792ea6c
chore(deps): fix (org.postgresql:postgresql) (#14831)
relaxedboi Sep 22, 2025
709b0bf
test: bring cypress tests for structured properties to OSS (#14832)
purnimagarg1 Sep 22, 2025
0b7a94b
fix(ui): fix link in entity header flashing infinitely (#14820)
purnimagarg1 Sep 22, 2025
5cf626c
docs(teams): removing stale teams docs (#14810)
gabe-lyons Sep 22, 2025
68aae40
fix(tests): removing flakey v1 test (#14676)
gabe-lyons Sep 22, 2025
4cfe846
feat(quickstart): bump min docker req (#14827)
deepgarg760 Sep 23, 2025
c2514ba
chore(python): drop pydantic v1 support (#14014)
hsheth2 Sep 23, 2025
c952350
fix(ingestion): avoid pyarrow CVE-2023-47248 (#14819)
sgomezvillamor Sep 23, 2025
27848bd
feat(ui/ingest): bring back exact start time in run history (#14837)
AdrianMachado Sep 23, 2025
99f3476
doc(datahub cloud): update recommended versions for cli, helm (#14841)
anshbansal Sep 23, 2025
e4d686e
feat(ingestion/snaplogic): Add snaplogic as a source for metadata ing…
SalimAbdul-snaplogic Sep 23, 2025
25c5be4
feat(ingest/tableau): enable extract_lineage_from_unsupported_custom_…
ligfx Sep 23, 2025
c444bd4
feat(sdk): Added support for Change Audit Stamps in Dashboard and Cha…
askumar27 Sep 23, 2025
ccb94c8
tests(snaplogic): fix tests (#14848)
sgomezvillamor Sep 23, 2025
7dcfde3
feature(transformers): Introduce Set browsePathsV2 transformer (#14825)
skrydal Sep 23, 2025
1cf65e6
Update react readme instructions (#14839)
AdrianMachado Sep 23, 2025
1ca04f2
refactor(ui): Fix typo in onboarding "Quality" pop-up message (#14816)
serragnoli Sep 24, 2025
b6c187e
fix(ingest/redshift): Fix for missing schema containers during ingest…
treff7es Sep 24, 2025
1c23c6e
fix(ui/ingest): system source save (#14847)
anshbansal Sep 24, 2025
c388400
feat(ui/ingest): filter for status on run history (#14851)
anshbansal Sep 24, 2025
eb85bbc
feat(alchemy): updating input with maxSize and Helper Text (#14856)
gabe-lyons Sep 24, 2025
cee6b81
fix(security): disable akka dns (#14858)
david-leifker Sep 24, 2025
e43fdc8
feat(ui/ingest): make filter params part of url for navigation (#14852)
anshbansal Sep 25, 2025
e82147d
fix(sdk): fixes imports for some SaaS classes (#14843)
sgomezvillamor Sep 25, 2025
8f1e1db
feat(ingest): add lowercase urn config option to metabase source (#14…
kevinkarchacryl Sep 25, 2025
1af4986
feat(ui) Update summary page editability (#14822)
chriscollins3456 Sep 25, 2025
8b99af5
fix(ingest/mssql): don't split_statements on keywords inside brackete…
ligfx Sep 25, 2025
19dd32e
fix(): fix config value (#14865)
david-leifker Sep 25, 2025
04d8fd4
fix(ingest/gcs): fix a number of issues and add integration tests (#1…
ligfx Sep 25, 2025
40e5628
feat(Plugin Loader) Add config to control plugin loader when failure …
zhixuanjia Sep 26, 2025
f8f5204
fix(ingestion): Fix for module level variable caching in sqllite chec…
treff7es Sep 26, 2025
7beadbd
fix(schema-registry): fix v1.2.0.1 schema registry bug (#14846)
david-leifker Sep 26, 2025
83f4d24
feat(ingestion): Enhanced column lineage extraction for Looker/LookML…
askumar27 Sep 26, 2025
41913fc
docs: hide pydantic_removed_field marked fields from documentation (#…
sgomezvillamor Sep 28, 2025
3b9627e
free up disk space in quickstart verification CI (#14879)
chakru-r Sep 28, 2025
de67e14
feat(sdk/search): add owner filter (#14649)
mayurinehate Sep 29, 2025
00074eb
feat(sdk/search): add tags, glossary terms filter (#14873)
mayurinehate Sep 29, 2025
3f29a22
docs(ingest): decode strings for easier getting started (#14830)
anshbansal Sep 29, 2025
091ba3e
feat(identity): only suggest users that are active or have displayNam…
benjiaming Sep 29, 2025
47c8ea4
feat(secret): FileSecretStore and EnvironmentSecretStore (#14882)
sgomezvillamor Sep 30, 2025
cf40a69
docs(snaplogic): Add snaplogic to integration page (#14881)
SalimAbdul-snaplogic Sep 30, 2025
ea67882
docs(cli): add details on parameters (#14886)
anshbansal Sep 30, 2025
ab041ba
fix(auth): include dataProcessInstance in policies UI (#14880)
chakru-r Sep 30, 2025
5a5c6d7
feat(web): UI pagination for Assertion List page (#14859)
AdrianMachado Sep 30, 2025
01b3d83
chore(model): remove unused model (#14887)
anshbansal Oct 1, 2025
1d39e4f
feat(ingestion/sqlglot): preserve CTEs when extracting SELECT from IN…
askumar27 Oct 1, 2025
ce75f2b
feat(): Basepath support (#14866)
david-leifker Oct 1, 2025
cb7aba9
fix(protobuf): use DynamicMessage for MESSAGE-type extension defaults…
abedatahub Oct 1, 2025
a26ee25
MCL Generation via CDC (#14824)
chakru-r Oct 1, 2025
6286fdd
chore(devenv): upgrade of opensearch to 2.17 and stability improvemen…
alexsku Oct 2, 2025
d8ff93c
feat(s3/ingest): performance improvements for get_dir_to_process and …
ligfx Oct 2, 2025
864c47e
feat: ConnectionModel and DataHubGraph:get_urns_by_filter and Structu…
sgomezvillamor Oct 2, 2025
0872130
fix(ingest/snowflake): Fixed the Snowflake external URL generation is…
treff7es Oct 2, 2025
af65734
ci(nightly): add more profiles to nightly tests (#14907)
chakru-r Oct 2, 2025
23b7daf
ci(cloudflare): fix workflow check for secret (#14906)
chakru-r Oct 2, 2025
1d246a2
chore(): bump grpc-protobuf (#14915)
david-leifker Oct 2, 2025
065d72c
fix(ingest/snowflake): Skip sql parsing if all the features disable i…
treff7es Oct 3, 2025
9821c53
feat(ingest): add high level stage for ingestion (#14862)
anshbansal Oct 3, 2025
dbcf984
fix(ingest/grafana): add exception handling (#14921)
anshbansal Oct 3, 2025
4631437
config(gms): enable some features by default (#14889)
anshbansal Oct 3, 2025
36ba9e0
ci(reviewers): add petemango to pr-labeller (#14922)
petemango Oct 3, 2025
e500b40
fix(ci): bump metadata-ingestion runner (#14924)
david-leifker Oct 3, 2025
f30f985
test(searchBarAutocomplete): add cypress tests (#13333)
v-tarasevich-blitz-brain Oct 3, 2025
5ef58b1
feat(structuredProperties): add new property to hide properties with …
v-tarasevich-blitz-brain Oct 3, 2025
e059d65
feat(structuredProperties): add option to hide properties with empty …
v-tarasevich-blitz-brain Oct 3, 2025
6d6d105
fix(ui) Fix re-expanding entity name after sidebar opens/closes (#14925)
chriscollins3456 Oct 3, 2025
19eab45
tests(structuredProperties): add cypress tests (#14888)
v-tarasevich-blitz-brain Oct 3, 2025
ec7c67d
docs(release-notes): disclaimers for 0.3.14 (#14812)
jayacryl Oct 3, 2025
413dd18
chore(doc): Fix json schema generation after pydantic v2 move (#14926)
skrydal Oct 4, 2025
7f7d9dc
feat(quickstart): bump min docker req and add validation (#14927)
deepgarg760 Oct 6, 2025
93037e4
docs: fix datajob docs inline code format (#14933)
yoonhyejin Oct 6, 2025
adbc630
feat(ingestion): Added Databricks support to Fivetran source (#14897)
askumar27 Oct 6, 2025
bc3683b
feat(ingest): ensure payload size constraints for queryProperties, qu…
sgomezvillamor Oct 6, 2025
60657ac
feat(search): implement multi-client search engine shim for ES8 suppo…
RyanHolstien Oct 6, 2025
346b405
fix(build): fix "grep: invalid option -- P" error in quickstart (#14916)
petemango Oct 7, 2025
eb4b9a5
feat: RelationshipChangeEvent model + attribution action graph + kafk…
sgomezvillamor Oct 7, 2025
96d6e0c
feat(ui/ingest): add source errors, warnings (#14939)
anshbansal Oct 7, 2025
85d7902
fix(smoke-tests): smoke test fixes for postgres profile (#14940)
chakru-r Oct 7, 2025
f2408ef
fix(web): embedded search list responsiveness (#14913)
jayacryl Oct 7, 2025
0693d43
fix(entity controller) Fix case sensitivity in entity controller (#14…
zhixuanjia Oct 7, 2025
bf741a6
improvement(summary-tab): hide current property in replace dropdown o…
purnimagarg1 Oct 7, 2025
4332461
test(customLinks): add cypress tests (#274) (#14834)
v-tarasevich-blitz-brain Oct 7, 2025
3fb298d
fix(ui/LineChart): adjust scaling of the line chart for the data with…
v-tarasevich-blitz-brain Oct 7, 2025
8d12c3b
feat(customLinks): add upsert link endpoint (#291) (#14854)
v-tarasevich-blitz-brain Oct 7, 2025
de67cb0
feat(analytics) Support google tag tracking only with ID supplied (#1…
chriscollins3456 Oct 7, 2025
a919511
feat(uplodaFiles): add feature flag (#14951)
v-tarasevich-blitz-brain Oct 8, 2025
c4b37f6
test(ingestion): add cypress tests for redesigned ingestion flow (#14…
purnimagarg1 Oct 8, 2025
2fe722a
test(cypress/statsTabV2): add cypress tests (#13495)
v-tarasevich-blitz-brain Oct 8, 2025
97162bf
test(customLinks): add integration tests (#275) (#14835)
v-tarasevich-blitz-brain Oct 8, 2025
ae19655
feat(structuredProperties): refresh structured properties on update (…
v-tarasevich-blitz-brain Oct 8, 2025
c5d249a
docs(ingestion): Updating breaking changes for LookML and Looker sour…
askumar27 Oct 8, 2025
e556673
chore: add 'askumar27' to PR labeler configuration (#14949)
sgomezvillamor Oct 8, 2025
3104b86
fix(protobuf): skip MESSAGE-type options in PropertyVisitor (#14957)
abedatahub Oct 8, 2025
992f4b6
fix error hanlding
yoonhyejin Oct 9, 2025
72b0e61
Merge branch 'master' into feat/unity-model-signiture
yoonhyejin Oct 9, 2025
178 changes: 172 additions & 6 deletions metadata-ingestion/src/datahub/ingestion/source/unity/proxy.py
@@ -11,6 +11,7 @@
from unittest.mock import patch

import cachetools
import requests
from cachetools import cached
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import (
@@ -54,13 +55,15 @@
Metastore,
Model,
ModelVersion,
ModelVersionSignature,
Notebook,
NotebookReference,
Query,
Schema,
ServicePrincipal,
Table,
TableReference,
TrainingRun,
)
from datahub.ingestion.source.unity.report import UnityCatalogReport
from datahub.utilities.file_backed_collections import FileBackedDict
@@ -301,11 +304,87 @@ def ml_model_versions(
version = self._workspace_client.model_versions.get(
ml_model.id, version.version, include_aliases=True
)
-optional_ml_model_version = self._create_ml_model_version(
-ml_model, version
-)
-if optional_ml_model_version:
-yield optional_ml_model_version
# get signature from artifacts
try:
url = f"{self._workspace_client.config.host}/api/2.0/fs/files/Models/{ml_model.catalog_name}/{ml_model.schema_name}/{ml_model.name}/{version.version}/MLmodel"
headers = {
"Authorization": f"Bearer {self._workspace_client.config.token}",
"Content-Type": "application/json",
}

# Bound the request so a stalled connection cannot hang ingestion.
raw_response = requests.get(url, headers=headers, timeout=30)

if raw_response.status_code == 200:
# Try to parse as YAML directly
import yaml

try:
mlmodel_data = yaml.safe_load(raw_response.text)

signature = None
if mlmodel_data and "signature" in mlmodel_data:
signature_data = mlmodel_data["signature"]
signature = self._parse_signature_from_yaml(
signature_data
)

optional_ml_model_version = self._create_ml_model_version(
ml_model, version, signature
)
if optional_ml_model_version:
yield optional_ml_model_version
except yaml.YAMLError as yaml_error:
print(
f"Error parsing YAML for model {ml_model.name} version {version.version}: {yaml_error}"
)
# Create model version without signature
optional_ml_model_version = self._create_ml_model_version(
ml_model, version, None
)
if optional_ml_model_version:
yield optional_ml_model_version
else:
print(
f"API returned error status {raw_response.status_code}: {raw_response.text}"
)
# Create model version without signature
optional_ml_model_version = self._create_ml_model_version(
ml_model, version, None
)
if optional_ml_model_version:
yield optional_ml_model_version

except Exception as e:
# Handle any errors gracefully
print(
f"Error getting signature for model {ml_model.name} version {version.version}: {e}"
)
# Create model version without signature
optional_ml_model_version = self._create_ml_model_version(
ml_model, version, None
)
if optional_ml_model_version:
yield optional_ml_model_version
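
The three fallback branches above all end the same way: build the model version without a signature and yield it. A minimal sketch of one way to collapse them is to resolve the signature to either a value or `None` first, then yield once. The names `load_signature_or_none`, `fetch`, and `failing_fetch` are hypothetical and not part of this PR; the sketch uses only the standard library.

```python
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)


def load_signature_or_none(
    fetch: Callable[[], Optional[Dict[str, Any]]], label: str
) -> Optional[Dict[str, Any]]:
    """Run `fetch` and return its result; on any failure, log and return None."""
    try:
        return fetch()
    except Exception as exc:  # best-effort: a missing signature must not stop ingestion
        logger.warning("Could not load signature for %s: %s", label, exc)
        return None


def failing_fetch() -> Optional[Dict[str, Any]]:
    # Hypothetical stand-in for an HTTP error or a YAML parse error.
    raise RuntimeError("HTTP 404")


# Both outcomes reach the caller as a plain value, so a single yield is enough.
found = load_signature_or_none(lambda: {"inputs": "[]"}, "model v1")
missing = load_signature_or_none(failing_fetch, "model v2")
```

With a helper of this shape, the caller would need only one `_create_ml_model_version(ml_model, version, signature)` call after the lookup, whatever went wrong during it.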

def ml_training_run(self, run_id: str) -> Optional[TrainingRun]:
if not run_id:
return None
try:
response = self._workspace_client.api_client.do( # type: ignore
"GET",
f"/api/2.0/mlflow/runs/get?run_id={run_id}",
body={},
headers={
"Authorization": f"Bearer {self._workspace_client.config.token}",
"Content-Type": "application/json",
},
)
if not response or "run" not in response:
return None
return self._create_ml_training_run(response["run"])
except Exception as e:
print(f"Error getting training run for run_id {run_id}: {e}")
return None

def service_principals(self) -> Iterable[ServicePrincipal]:
for principal in self._workspace_client.service_principals.list():
@@ -936,7 +1015,10 @@ def _create_ml_model(
)

def _create_ml_model_version(
-self, model: Model, obj: ModelVersionInfo
self,
model: Model,
obj: ModelVersionInfo,
signature: Optional[ModelVersionSignature],
) -> Optional[ModelVersion]:
if obj.version is None:
return None
@@ -956,8 +1038,92 @@ def _create_ml_model_version(
created_at=parse_ts_millis(obj.created_at),
updated_at=parse_ts_millis(obj.updated_at),
created_by=obj.created_by,
run_id=obj.run_id,
signature=signature,
)

def _create_ml_training_run(self, obj: Dict) -> Optional[TrainingRun]:
if not obj:
return None

try:
# Extract run info from the response structure
run_id = (
obj.get("info", {}).get("run_id")
if "info" in obj
else obj.get("run_id")
)
run_name = (
obj.get("info", {}).get("run_name")
if "info" in obj
else obj.get("run_name")
)
params = {
item["key"]: item["value"]
for item in obj.get("data", {}).get("params", [])
if item.get("value") is not None
}
metrics = {
item["key"]: item["value"]
for item in obj.get("data", {}).get("metrics", [])
if item.get("value") is not None
}

if not run_id:
print(f"No run_id found in training run data: {obj}")
return None

return TrainingRun(
id=run_id,
name=run_name or f"run_{run_id}",
params=params,
metrics=metrics,
)
except Exception as e:
print(f"Error creating training run from data {obj}: {e}")
return None
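
The extraction in `_create_ml_training_run` can be tried outside the proxy class. The sketch below assumes the response shape the code above reads (`info.run_id`, `info.run_name`, and `data.params` / `data.metrics` as lists of `key`/`value` items). The payload values and the helper name `key_value_pairs` are hypothetical. A plain truthiness check on `value` would drop a metric logged as `0.0`, so the sketch tests against `None`, and it converts values to strings to match the `Dict[str, str]` annotation on `TrainingRun`.

```python
from typing import Any, Dict, List


def key_value_pairs(items: List[Dict[str, Any]]) -> Dict[str, str]:
    """Flatten [{'key': k, 'value': v}, ...] into {k: str(v)}, skipping missing values."""
    return {
        item["key"]: str(item["value"])
        for item in items
        if item.get("value") is not None  # keeps 0 and 0.0, drops only absent values
    }


# Hypothetical payload in the shape the method above expects.
run = {
    "info": {"run_id": "abc123", "run_name": "baseline"},
    "data": {
        "params": [{"key": "max_depth", "value": "5"}],
        "metrics": [
            {"key": "rmse", "value": 0.0},
            {"key": "r2", "value": 0.91},
            {"key": "broken"},  # no value at all: skipped
        ],
    },
}

run_id = run.get("info", {}).get("run_id")
params = key_value_pairs(run.get("data", {}).get("params", []))
metrics = key_value_pairs(run.get("data", {}).get("metrics", []))
```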

def _parse_signature_from_yaml(
self, signature_data: Dict
) -> Optional[ModelVersionSignature]:
"""Parse signature data from MLmodel YAML content."""
try:
inputs = None
outputs = None
parameters = None

if "inputs" in signature_data:
inputs_str = signature_data["inputs"]
if isinstance(inputs_str, str):
import json

try:
inputs = json.loads(inputs_str)
except json.JSONDecodeError:
print(f"Error parsing inputs JSON: {inputs_str}")
inputs = None

if "outputs" in signature_data:
outputs_str = signature_data["outputs"]
if isinstance(outputs_str, str):
import json

try:
outputs = json.loads(outputs_str)
except json.JSONDecodeError:
print(f"Error parsing outputs JSON: {outputs_str}")
outputs = None

if "params" in signature_data and signature_data["params"]:
parameters = signature_data["params"]

return ModelVersionSignature(
inputs=inputs, outputs=outputs, parameters=parameters
)
except Exception as e:
print(f"Error parsing signature from YAML: {e}")
return None
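
The `inputs` and `outputs` branches of `_parse_signature_from_yaml` do the same thing: take a string field, decode it as JSON, and fall back to `None` on failure. A minimal sketch of that step as a single helper follows. The name `decode_json_field` and the sample `signature_data` are hypothetical. In this sample the decoded `inputs` value is a list of column specs, not a flat string-to-string mapping, which may be worth checking against the field types on `ModelVersionSignature`.

```python
import json
from typing import Any, Dict, Optional


def decode_json_field(signature_data: Dict[str, Any], key: str) -> Optional[Any]:
    """Return the JSON-decoded value of a string field, or None if absent or invalid."""
    raw = signature_data.get(key)
    if not isinstance(raw, str):
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return None


# Hypothetical signature section as it might look after yaml.safe_load.
signature_data = {
    "inputs": '[{"name": "age", "type": "long"}]',
    "outputs": "not valid json",
}

inputs = decode_json_field(signature_data, "inputs")
outputs = decode_json_field(signature_data, "outputs")
params = decode_json_field(signature_data, "params")
```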

def _create_service_principal(
self, obj: DatabricksServicePrincipal
) -> Optional[ServicePrincipal]:
@@ -350,6 +350,13 @@ class Model:
updated_at: Optional[datetime]


@dataclass
class ModelVersionSignature:
inputs: Optional[Dict[str, str]]
outputs: Optional[Dict[str, str]]
parameters: Optional[Dict[str, str]]


@dataclass
class ModelVersion:
id: str
@@ -361,3 +368,13 @@ class ModelVersion:
created_at: Optional[datetime]
updated_at: Optional[datetime]
created_by: Optional[str]
run_id: Optional[str]
signature: Optional[ModelVersionSignature]


@dataclass
class TrainingRun:
id: str
name: str
params: Optional[Dict[str, str]]
metrics: Optional[Dict[str, str]]
36 changes: 34 additions & 2 deletions metadata-ingestion/src/datahub/ingestion/source/unity/source.py
@@ -97,6 +97,7 @@
ServicePrincipal,
Table,
TableReference,
TrainingRun,
)
from datahub.ingestion.source.unity.report import UnityCatalogReport
from datahub.ingestion.source.unity.tag_entities import (
@@ -694,8 +695,12 @@ def process_ml_models(self, schema: Schema) -> Iterable[MetadataWorkUnit]:
for ml_model_version in self.unity_catalog_api_proxy.ml_model_versions(
ml_model, include_aliases=self.config.include_ml_model_aliases
):
run_id = ml_model_version.run_id
ml_training_run = None
if run_id:
ml_training_run = self.unity_catalog_api_proxy.ml_training_run(run_id)
yield from self.process_ml_model_version(
-ml_model_urn, ml_model_version, schema
ml_model_urn, ml_model_version, schema, ml_training_run
)

def process_ml_model(
@@ -716,7 +721,11 @@ def process_ml_model(
self.report.ml_models.processed(ml_model.id)

def process_ml_model_version(
-self, ml_model_urn: str, ml_model_version: ModelVersion, schema: Schema
self,
ml_model_urn: str,
ml_model_version: ModelVersion,
schema: Schema,
ml_training_run: Optional[TrainingRun],
) -> Iterable[MetadataWorkUnit]:
extra_aspects = []
if ml_model_version.created_at is not None:
@@ -732,6 +741,26 @@ def process_ml_model_version(
)
)

training_metrics = None
hyper_params = None
if ml_training_run:
training_metrics = ml_training_run.metrics
hyper_params = ml_training_run.params

# Convert signature to custom properties dictionary
custom_properties = {}
if ml_model_version.signature:
import json
signature_dict = {
"inputs": ml_model_version.signature.inputs,
"outputs": ml_model_version.signature.outputs,
"parameters": ml_model_version.signature.parameters,
}
# Remove None values
signature_dict = {k: v for k, v in signature_dict.items() if v is not None}
if signature_dict:
custom_properties["signature"] = json.dumps(signature_dict)

ml_model = MLModel(
id=ml_model_version.id,
name=ml_model_version.name,
@@ -742,6 +771,9 @@ def process_ml_model_version(
platform=self.platform,
last_modified=ml_model_version.updated_at,
extra_aspects=extra_aspects,
training_metrics=training_metrics,
hyper_params=hyper_params,
custom_properties=custom_properties if custom_properties else None,
)

yield from ml_model.as_workunits()
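
The signature-to-custom-properties step in `process_ml_model_version` can be exercised on its own. The sketch below is a standalone approximation: `SignatureSketch` stands in for `ModelVersionSignature`, and `signature_to_custom_properties` is a hypothetical helper name. It drops `None` fields and serializes the rest into a single `signature` entry, as the code above does.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class SignatureSketch:
    """Stand-in for ModelVersionSignature; field names follow the PR."""

    inputs: Optional[Any] = None
    outputs: Optional[Any] = None
    parameters: Optional[Any] = None


def signature_to_custom_properties(sig: Optional[SignatureSketch]) -> Dict[str, str]:
    """Serialize the non-None signature fields under one 'signature' key."""
    if sig is None:
        return {}
    present = {k: v for k, v in asdict(sig).items() if v is not None}
    return {"signature": json.dumps(present)} if present else {}


props = signature_to_custom_properties(
    SignatureSketch(inputs=[{"name": "age", "type": "long"}])
)
empty = signature_to_custom_properties(SignatureSketch())
```

Returning an empty dict for a missing or all-`None` signature lets the caller keep the existing `custom_properties if custom_properties else None` check unchanged.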