Add validation of final KG #723
Here's the output I'm getting right now:
We'd like to get additional insight from @amc-corey-cox & @cmungall during an upcoming Data Call.
We discussed this more in today's data call. Outcomes:
It's also important to embed any validation outcomes into a validation workflow, which specifies:
I found myself going down a validation rabbit hole this week that wasn't represented in an issue, so I wanted to write it down and pin down for myself what we can achieve and what "done" will look like in the short term.
I've been exploring kgx validation, and I think we can benefit from it, but we need to work around existing limitations.
Too much output
This will likely improve over time, but for now kgx enumerates each and every node or edge that has the same problem. @bgood had a nice suggestion in biolink/kgx#354 that I slightly tweaked:
This produces pretty good output, but it still floods the report with examples of the same prefix complaint. That hides other prefix complaints, which won't be visible until the first ones are fixed, and each fix-and-recheck cycle is something like a 16-hour round trip because it's...
Too slow
kgx validate on the current monarch-kg takes 10 hours, which is far too long to add to our existing pipeline, which is unfortunately already ballooning into the 6-hour range. I'm sure this could be massively improved on the kgx side, but even without that, we could probably mitigate it by putting validation in its own Jenkins job that runs after a KG build.
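For the "too much output" problem, here is a rough sketch of the kind of summarizing that would keep one noisy prefix from hiding the others. It does not use kgx's real output format or options: the `ValidationMessage` record, the example error type names, and the `summarize`/`prefix_of` helpers are all hypothetical. The idea is to group complaints by (error type, CURIE prefix), count every occurrence, but keep only a few example IDs per group, so every distinct prefix complaint shows up in a single run.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class ValidationMessage(NamedTuple):
    """One validation complaint about a single node or edge (hypothetical shape, not kgx's)."""
    error_type: str  # illustrative label, e.g. "MISSING_PREFIX"
    curie: str       # identifier of the node/edge the complaint is about
    message: str


def prefix_of(curie: str) -> str:
    """Return the CURIE prefix, e.g. 'HP' for 'HP:0000118'."""
    return curie.split(":", 1)[0] if ":" in curie else curie


def summarize(messages: Iterable[ValidationMessage], max_examples: int = 3):
    """Group complaints by (error type, prefix): count all, keep at most max_examples IDs."""
    counts = defaultdict(int)
    examples = defaultdict(list)
    for m in messages:
        key = (m.error_type, prefix_of(m.curie))
        counts[key] += 1
        if len(examples[key]) < max_examples:
            examples[key].append(m.curie)
    # Largest groups first; ties broken by (error type, prefix) for stable output
    return sorted(
        ((key, counts[key], examples[key]) for key in counts),
        key=lambda row: (-row[1], row[0]),
    )


if __name__ == "__main__":
    # Hypothetical data: 1000 complaints about one prefix would normally bury the single BAR one
    msgs = [ValidationMessage("MISSING_PREFIX", f"FOO:{i}", "unknown prefix") for i in range(1000)]
    msgs.append(ValidationMessage("MISSING_PREFIX", "BAR:1", "unknown prefix"))
    for (error_type, prefix), n, ex in summarize(msgs):
        print(f"{error_type} {prefix}: {n} occurrence(s), e.g. {', '.join(ex)}")
```

With the cap in place, the BAR complaint appears right after the FOO summary instead of after a thousand FOO lines, which is what would shorten the fix-and-recheck loop.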