Merged
2 changes: 1 addition & 1 deletion .github/workflows/python.yaml
@@ -13,7 +13,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.10", "3.11", "3.12", "3.13"]
python-version: ["3.10", "3.11", "3.12", "3.13", "3.14"]

steps:
- name: Checkout Commit
123 changes: 76 additions & 47 deletions README.md
@@ -11,13 +11,12 @@ Requires Python ^3.10 and Poetry ^2.0
``` shell
$ poetry install
$ poetry poe {format,check,test,all}
$ poetry groundtruth --help
$ poetry run groundtruth --help
```

**CAVEATS:**
- This program only works with extraction models.
- This program requires results to be in file version 1 format.
- This program does not work with bundled submissions or unbundling workflows.
- This program requires results to be in file version 3 format.
- This program requires results to be from a workflow on IPA 7.2 or later.
- This program requires Auto Review to be enabled for the workflow.

## Analysis Process
@@ -36,57 +35,89 @@ review URLs of the submitted documents will be written to a CSV for future reference.
groundtruth submit \
--host try.indico.io --token indico_api_token.txt \
--workflow-id 1234 --documents-folder documents \
--submission-ids-file ground_truth_submission_ids.csv
--submission-ids-file submission_ids.csv
~~~

At this point, all of the submissions listed in the `ground_truth_submission_ids.csv`
will be queued for processing. Once processed, they should all be manually reviewed to
establish ground truth. After all reviews have been completed, the results can be
downloaded using the `retrieve` command.
Once all submissions have been processed and auto-reviewed, download the auto-reviewed
result files with the `retrieve` command. These contain the predictions to be compared
against ground truth later.

~~~ shell
groundtruth retrieve \
--host try.indico.io --token indico_api_token.txt \
--submission-ids-file ground_truth_submission_ids.csv \
--results-folder ground_truth_results
--submission-ids-file submission_ids.csv \
--results-folder auto_reviews
~~~

These results contain the ground truths *and* predictions for the first round of ground
truth analysis. Ground truths and prediction samples for a specific model can be
extracted from the results using the `extract` command.
Failed submissions will be logged and may be retried with the `retry` command.

~~~ shell
groundtruth retry \
--host try.indico.io --token indico_api_token.txt \
--submission-ids-file submission_ids.csv
~~~

Now review all submissions in Indico to capture the ground truth "answer key." You may
use the review URLs in `submission_ids.csv` to navigate to them directly.

Once all submissions have been reviewed, download the HITL-reviewed result files. These
contain the ground truths to be compared.

~~~ shell
groundtruth retrieve \
--host try.indico.io --token indico_api_token.txt \
--submission-ids-file submission_ids.csv \
--results-folder ground_truths
~~~

Extract the labels, values, and confidences for `auto_reviews` and `ground_truths` with
the `extract` command. Combine the output CSVs with the `combine` command. This will
automatically match predictions and ground truths by similarity.

~~~ shell
groundtruth extract \
--results-folder auto_reviews \
--samples-file auto_reviews.csv
~~~

~~~ shell
groundtruth extract \
--results-folder ground_truth_results \
--extractions-file ground_truth_extractions.csv \
--model "Invoice Extraction Model"
--results-folder ground_truths \
--samples-file ground_truths.csv
~~~

~~~ shell
groundtruth combine \
--ground-truths-file ground_truths.csv \
--predictions-file auto_reviews.csv \
--combined-file samples.csv
~~~

At this point, the ground truth/prediction samples in `ground_truth_extractions.csv`
should be manually reviewed to set the `accurate` column to `TRUE` for any samples that
should be considered accurate but that were not identical. The `edit_distance` and
`similarity` columns can be used to bubble up to the top of the extractions file
samples that are likely to be accurate. Any corrections to the selected ground truth
values should also be made in the `ground_truth` column to be used for this and future
rounds of analysis.
At this point, the samples in `samples.csv` should be manually reviewed to set the
`accurate` column to `TRUE` for any ground truth/prediction pairs that should be
considered accurate but whose values were not character-for-character identical.
Sorting by the `edit_distance` and `similarity` columns surfaces the samples most
likely to be accurate. Any corrections to the selected ground truth values should also
be made in the `ground_truth` column to be used for future rounds of analysis.
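The review-prioritization idea above can be sketched in plain Python. This is an
illustration only: the column names (`ground_truth`, `prediction`) and the use of
`difflib` as the similarity measure are assumptions for the sketch, not the tool's
actual schema or metric.

``` python
# Sketch: surface near-matches (high similarity, not identical) for manual
# review first. Column names and the similarity metric are assumptions.
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 ratio; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


samples = [
    {"ground_truth": "ACME Corp.", "prediction": "ACME Corp"},
    {"ground_truth": "2024-01-01", "prediction": "01/01/2024"},
    {"ground_truth": "$100.00", "prediction": "$100.00"},
]

# Identical pairs need no review; near-matches are the likeliest to be
# accurate despite differing, so sort them to the top of the review queue.
to_review = [s for s in samples if s["ground_truth"] != s["prediction"]]
to_review.sort(
    key=lambda s: similarity(s["ground_truth"], s["prediction"]),
    reverse=True,
)
```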

After manual review and correction, the extractions file can be analyzed using the
`analyze` command to produce accuracy, volume, and STP performance metrics for a range
of confidence thresholds. Any samples that should not be included in the analysis
(such as ground truths with no value) should be filtered out of the extractions file
prior to analyzing it.
After manual review and correction, the samples file can be analyzed using the `analyze`
command to produce accuracy, volume, and STP performance metrics for a range of
specified confidence thresholds. Any samples that should not be included in the
analysis (such as ground truths with no value) should be filtered out of the
samples file prior to analyzing it.

~~~ shell
groundtruth analyze \
--extractions-file ground_truth_extractions.csv \
--analysis-file ground_truth_analysis.csv \
--samples-file samples.csv \
--analysis-file analysis.csv \
0.85 0.95 0.99 0.99999
~~~
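The pre-analysis filtering described above can be done with any CSV tool. A minimal
stdlib sketch, assuming a `ground_truth` column (the actual samples-file schema may
differ):

``` python
# Sketch: keep only rows with a non-empty ground truth before `analyze`.
# The "ground_truth" column name is an assumption about the samples file.
import csv


def filter_samples(in_path: str, out_path: str) -> int:
    """Write rows with a non-empty ground truth; return the kept count."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        kept = 0
        for row in reader:
            if row["ground_truth"].strip():
                writer.writerow(row)
                kept += 1
        return kept
```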

Additional rounds of analysis can be performed after model remediation or auto review
enhancements have been made to determine the performance impact. Use the `submit`,
`retrieve`, and `extract` commands to process the same folder of documents through the
updated workflow, saving the results and IDs as a new set.
updated workflow, saving the results and IDs as a new CSV.

~~~ shell
groundtruth submit \
Expand All @@ -99,33 +130,31 @@ groundtruth submit \
groundtruth retrieve \
--host try.indico.io --token indico_api_token.txt \
--submission-ids-file remediated_submission_ids.csv \
--results-folder remediated_results
--results-folder remediated_reviews
~~~

~~~ shell
groundtruth extract \
--results-folder remediated_results \
--extractions-file remediated_extractions.csv \
--model "Invoice Extraction Model"
--results-folder remediated_reviews \
--samples-file remediations.csv
~~~

Note that the results and extractions will *not* contain ground truths, only remediated
predictions. Use the `combine` command to combine the ground truths from the original
round of analysis with the remediated predictions from this round. Ground truths and
predictions will be matched up by document file name and field.
Use the `combine` command to combine the ground truths from the original round of
analysis with the remediated predictions from this round. Ground truths and predictions
will be matched up by document filename and field.

~~~ shell
groundtruth combine \
--ground-truths-file ground_truth_extractions.csv \
--predictions-file remediated_extractions.csv \
--extractions-file combined_extractions.csv
--ground-truths-file samples.csv \
--predictions-file remediations.csv \
--combined-file remediated_samples.csv
~~~
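The filename-and-field matching described above amounts to a keyed join. A minimal
sketch, assuming `file_name` and `field` columns (the column names are assumptions for
illustration, not the tool's documented schema):

``` python
# Sketch: pair each ground truth with the remediated prediction for the same
# (filename, field) key. Column names are assumptions for illustration.
ground_truths = [
    {"file_name": "invoice_01.pdf", "field": "Total", "ground_truth": "$100.00"},
    {"file_name": "invoice_02.pdf", "field": "Total", "ground_truth": "$250.00"},
]
predictions = [
    {"file_name": "invoice_01.pdf", "field": "Total", "prediction": "$100.00"},
]

# Index predictions by (filename, field) so each ground truth finds its partner.
by_key = {(p["file_name"], p["field"]): p for p in predictions}

combined = []
for gt in ground_truths:
    pred = by_key.get((gt["file_name"], gt["field"]), {})
    combined.append({**gt, "prediction": pred.get("prediction", "")})
```

Ground truths with no matching prediction end up with an empty `prediction`, which can
then be filtered out or reviewed.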

Use the `analyze` command on the combined extractions CSV to calculate the remediated
Use the `analyze` command on the remediated samples file to calculate the remediated
performance metrics. This process can be repeated for as many rounds of remediation as
necessary.

Additional ground truth documents can be added to the set by submitting, reviewing,
retrieving, and extracting them using a separate submission IDs CSV and extractions
CSV. Afterwards, the submission IDs and extractions for the new documents can be merged
into the original CSVs.
retrieving, and extracting them using a separate submission IDs CSV and samples
file. Afterwards, the submission IDs and samples for the new documents can be merged
into the original samples file.
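Since the submission IDs and samples files share headers across rounds, the merge
described above is a row-append. A minimal stdlib sketch, assuming both CSVs use the
same header row:

``` python
# Sketch: append rows for newly added documents to the original samples file.
# Assumes both CSVs share an identical header row.
import csv


def merge_samples(original: str, additions: str) -> None:
    """Append every data row of `additions` to `original` (header skipped)."""
    with open(additions, newline="") as src:
        rows = list(csv.DictReader(src))
    with open(original, "a", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(rows[0].keys()))
        writer.writerows(rows)
```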
27 changes: 27 additions & 0 deletions groundtruth/cli.py
@@ -48,6 +48,33 @@ def submit(
).write_csv(submission_ids_file)


@arguably.command
def retry(
*,
host: Annotated[str, required],
token: Annotated[Path, required],
submission_ids_file: Path = Path("submission_ids.csv"),
) -> None:
import polars
import rich.progress

from . import workflows

config = IndicoConfig(host=host, api_token_path=token)

csv = polars.read_csv(submission_ids_file)
submission_ids = csv["submission_id"]
tracked_submissions = rich.progress.track(
submission_ids,
description="Retrying...",
auto_refresh=False,
)
workflows.retry_failed_submissions(
config=config,
submission_ids=tracked_submissions,
)


@arguably.command
def retrieve(
*,
4 changes: 2 additions & 2 deletions groundtruth/samples.py
@@ -169,8 +169,8 @@ def label_or_model(prediction: Prediction) -> str:
return (
prediction.label
if isinstance(prediction, Extraction)
else prediction.model.name
)
else prediction.task.name
) # fmt: skip


def text_or_label(prediction: Prediction) -> str:
32 changes: 28 additions & 4 deletions groundtruth/workflows.py
@@ -8,7 +8,7 @@
from indico.queries import (
GetSubmission,
RetrieveStorageObject,
SubmissionResult,
RetrySubmission,
WorkflowSubmission,
)

@@ -35,7 +35,7 @@ def submit_documents(  # type: ignore[no-any-unimported]
yield submission_id

elif document_file.is_dir():
bundle_files = list(
bundle_files = sorted(
filter(
lambda file: file.is_file() and not file.name.startswith("."),
document_file.glob("*"),
@@ -73,10 +73,34 @@ def retrieve_results(  # type: ignore[no-any-unimported]
)
continue

submission_result = client.call(SubmissionResult(submission, wait=True))
result = client.call(RetrieveStorageObject(submission_result.result))
if submission.status == "FAILED":
rich.print(
"[yellow]"
f"Submission {submission_id} {file_name!r} failed. "
"Skipping."
"[/]"
)
continue

result = client.call(RetrieveStorageObject(submission.result_file))

sanitized_file_name = sanitize(file_name)
result_file = Path(sanitized_file_name + ".json")
result_file = results_folder / result_file
result_file.write_text(json.dumps(result))


def retry_failed_submissions( # type: ignore[no-any-unimported]
config: IndicoConfig,
submission_ids: Iterable[int],
) -> None:
"""
Retry failed submissions.
"""
client = IndicoClient(config)

for submission_id in submission_ids:
submission = client.call(GetSubmission(submission_id))

if submission.status == "FAILED":
client.call(RetrySubmission([submission_id]))