Description
I'm trying to run a TFDV process in a Kubeflow pipeline and visualize the results in the pipeline UI.
For statistics, I can easily visualize using get_statistics_html.
However, for schema and anomalies, I struggled. We have the display_schema and display_anomalies functions, but each one transforms the data and then calls IPython display internally, so there is no way to get the formatted data back in a form we can visualize ourselves.
In the end, I copied most of the display functions and changed them to return a DataFrame.
For reference, the code looks like this:
import json
import logging
from typing import Optional

import pandas as pd
import tensorflow_data_validation as tfdv
from tensorflow_data_validation.utils import display_util


def _transform_anomalies_to_df(anomalies) -> Optional[pd.DataFrame]:
    # One row per feature-level anomaly.
    anomaly_rows = []
    for feature_name, anomaly_info in anomalies.anomaly_info.items():
        anomaly_rows.append(
            [
                display_util._add_quotes(feature_name),
                anomaly_info.short_description,
                anomaly_info.description,
            ]
        )
    # Dataset-level anomaly, if any, goes in as an extra row.
    if anomalies.HasField("dataset_anomaly_info"):
        anomaly_rows.append(
            [
                "[dataset anomaly]",
                anomalies.dataset_anomaly_info.short_description,
                anomalies.dataset_anomaly_info.description,
            ]
        )
    if not anomaly_rows:
        logging.info("No anomalies found.")
        return None
    else:
        logging.warning(f"{len(anomaly_rows)} anomalies found.")
        anomalies_df = pd.DataFrame(
            anomaly_rows,
            columns=[
                "Feature name",
                "Anomaly short description",
                "Anomaly long description",
            ],
        )
        return anomalies_df


def main(schema_file: str, stats_file: str, anomalies_file: str):
    schema = tfdv.load_schema_text(schema_file)
    stats = tfdv.load_statistics(stats_file)
    anomalies = tfdv.validate_statistics(statistics=stats, schema=schema)
    tfdv.write_anomalies_text(anomalies, anomalies_file)
    anomalies_df = _transform_anomalies_to_df(anomalies)
    if anomalies_df is not None:
        # Inline CSV table for the Kubeflow Pipelines UI.
        metadata = {
            "outputs": [
                {
                    "type": "table",
                    "storage": "inline",
                    "format": "csv",
                    "header": anomalies_df.columns.tolist(),
                    "source": anomalies_df.to_csv(header=False, index=False),
                },
            ]
        }
        with open("/mlpipeline-ui-metadata.json", "w") as f:
            json.dump(metadata, f)

Does anyone know a better way?
What do you think about splitting each display function into a transforming function and a visualizing function, as is already done for statistics?
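To make the proposal a bit more concrete, here is a minimal sketch of what such a split could look like. The names `anomalies_to_rows` and `rows_to_ui_metadata` are hypothetical and are not part of TFDV. The sketch uses a stand-in object that only mimics the `anomaly_info` map from the code above, so it runs without TFDV installed, and it leaves out the dataset-level anomaly row for brevity.

```python
import csv
import io
from types import SimpleNamespace
from typing import List, Sequence

HEADER = [
    "Feature name",
    "Anomaly short description",
    "Anomaly long description",
]


def anomalies_to_rows(anomalies) -> List[List[str]]:
    """Transform step (hypothetical name): plain rows, no IPython involved."""
    return [
        [name, info.short_description, info.description]
        for name, info in anomalies.anomaly_info.items()
    ]


def rows_to_ui_metadata(header: Sequence[str], rows: Sequence[Sequence[str]]) -> dict:
    """Visualize step (hypothetical name): wrap rows as an inline CSV table."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return {
        "outputs": [
            {
                "type": "table",
                "storage": "inline",
                "format": "csv",
                "header": list(header),
                "source": buffer.getvalue(),
            }
        ]
    }


# Stand-in for an Anomalies proto: only the attributes used above are mimicked.
fake_anomalies = SimpleNamespace(
    anomaly_info={
        "age": SimpleNamespace(
            short_description="Out-of-range values",
            description="Unexpectedly small value: -1.",
        )
    }
)

rows = anomalies_to_rows(fake_anomalies)
metadata = rows_to_ui_metadata(HEADER, rows)
```

With this shape, a notebook helper could pass the rows to IPython display, while a pipeline step could serialize the same rows to `/mlpipeline-ui-metadata.json`, without either one duplicating the transformation logic.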