`backend/app/api/docs/evaluation/get_evaluation.md`

Retrieves comprehensive information about an evaluation run including its curren…

- **evaluation_id**: ID of the evaluation run

## Query Parameters

- **get_trace_info** (optional, default: false): If true, fetch Langfuse trace scores with their Q&A context and include them in the response. On the first request, the data is fetched from Langfuse and cached in the `score` column; subsequent requests return the cached data. Only available for completed evaluations.

- **resync_score** (optional, default: false): If true, clear the cached scores and re-fetch them from Langfuse. Useful when new evaluators have been added or existing scores have been updated. Requires get_trace_info=true.

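The caching behaviour of these two parameters can be sketched as a small cache-aside function. This is a minimal illustration, not the backend's implementation. Several pieces are hypothetical:

- `resolve_score` itself.
- The plain dict standing in for the evaluation run row.
- The injected `fetch_from_langfuse` callable.

Two behaviours are assumptions about how the real endpoint handles edge cases:

- Raising `ValueError` when `resync_score` is set without `get_trace_info`.
- Returning `None` for runs that are not completed.

```python
from typing import Any, Callable, Optional

# Hypothetical stand-in for the evaluation run row (the real model is not shown here).
Run = dict[str, Any]


def resolve_score(
    run: Run,
    get_trace_info: bool = False,
    resync_score: bool = False,
    fetch_from_langfuse: Optional[Callable[[Run], dict]] = None,
) -> Optional[dict]:
    """Sketch of the documented get_trace_info / resync_score semantics."""
    if resync_score and not get_trace_info:
        # Assumption: the real endpoint may reject or simply ignore this combination.
        raise ValueError("resync_score requires get_trace_info=true")
    if not get_trace_info:
        return None  # scores are only included when explicitly requested
    if run["status"] != "completed":
        return None  # assumption: trace scores are skipped for unfinished runs
    if resync_score:
        run["score"] = None  # clear the cached scores so they are re-fetched
    if run.get("score") is None:
        if fetch_from_langfuse is None:
            raise ValueError("a Langfuse fetcher is required on a cache miss")
        run["score"] = fetch_from_langfuse(run)  # first request: fetch, then cache
    return run["score"]  # later requests: served from the cached column
```

Injecting the fetcher keeps the sketch independent of any particular Langfuse client.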
## Returns

`EvaluationRunPublic` with the current status and results:
- status: Current status (pending, running, completed, failed)
- total_items: Total number of items being evaluated
- completed_items: Number of items completed so far
- score: Evaluation scores (only when `get_trace_info=true` and the status is completed)
- error_message: Error message if the evaluation failed
- created_at: Timestamp when the evaluation was created
- updated_at: Timestamp when the evaluation was last updated

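For Python clients, the fields listed above could be typed roughly as follows. This is a hypothetical, partial sketch:

- `EvaluationRunSketch` and `progress` are illustrative names.
- ISO 8601 strings for the timestamps are an assumption about the JSON encoding.
- The real `EvaluationRunPublic` model may carry additional fields.

```python
from typing import Any, Literal, Optional, TypedDict

# The four documented status values.
Status = Literal["pending", "running", "completed", "failed"]


class EvaluationRunSketch(TypedDict, total=False):
    """Partial, hypothetical client-side view of the documented fields."""

    status: Status
    total_items: int
    completed_items: int
    score: Optional[dict[str, Any]]  # present only with get_trace_info=true
    error_message: Optional[str]
    created_at: str  # assumed ISO 8601 string
    updated_at: str


def progress(run: EvaluationRunSketch) -> float:
    """Fraction of items completed, guarding against total_items == 0."""
    total = run.get("total_items", 0)
    return run.get("completed_items", 0) / total if total else 0.0
```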
## Score Format

When `get_trace_info=true` and the evaluation is completed, the `score` field contains:
- Only complete scores are included (scores where all traces have been rated)
- Numeric values are rounded to 2 decimal places
- NUMERIC scores show `avg` and `std` in their summary
- CATEGORICAL scores show `distribution` counts in their summary

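The summary rules above can be sketched as follows. Several details are assumptions for illustration:

- The input shape: a mapping from score name to a `data_type` and a list of per-trace `values`.
- The function name `summarize_scores`.
- The use of the population standard deviation.

The documented score structure and the backend's exact computation may differ.

```python
import statistics
from collections import Counter
from typing import Any


def summarize_scores(
    scores: dict[str, dict[str, Any]], trace_count: int
) -> dict[str, dict[str, Any]]:
    """Hypothetical summary step mirroring the rules listed above.

    `scores` maps a score name to {"data_type": "NUMERIC" | "CATEGORICAL",
    "values": [...]}, one value per rated trace (an assumed shape).
    """
    summary: dict[str, dict[str, Any]] = {}
    for name, score in scores.items():
        values = score["values"]
        if not values or len(values) < trace_count:
            continue  # incomplete score: not every trace has been rated
        if score["data_type"] == "NUMERIC":
            summary[name] = {
                "avg": round(statistics.mean(values), 2),
                # Population std dev is an assumption; the source does not say which.
                "std": round(statistics.pstdev(values), 2),
            }
        elif score["data_type"] == "CATEGORICAL":
            summary[name] = {"distribution": dict(Counter(values))}
    return summary
```

For example, a NUMERIC score with per-trace values `[1.0, 0.5, 0.0]` summarizes to an `avg` of 0.5 and a `std` of 0.41 under this sketch.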
## Usage

Use this endpoint to poll for evaluation progress. Evaluations are processed asynchronously by a periodic Celery Beat task that runs every 60 seconds, so poll periodically until the status changes to "completed" or "failed".
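A minimal polling loop is sketched below. It assumes a hypothetical `fetch_run` callable that performs the GET request for one evaluation and returns the decoded JSON body. The route and base URL are not shown here and depend on your deployment. `wait_for_evaluation`, `max_polls`, and the injectable `sleep` are illustrative choices, not part of the API.

```python
import time
from typing import Any, Callable

# Statuses after which the run will not change any more.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_evaluation(
    fetch_run: Callable[[], dict[str, Any]],
    interval_s: float = 60.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll a hypothetical fetch_run() until the run reaches a terminal status."""
    for attempt in range(max_polls):
        run = fetch_run()
        if run["status"] in TERMINAL_STATUSES:
            return run
        if attempt < max_polls - 1:
            sleep(interval_s)  # roughly matches the 60 s Celery Beat cadence
    raise TimeoutError(f"evaluation not finished after {max_polls} polls")
```

Injecting `sleep` keeps the loop testable. In real use the default `time.sleep` applies, and `fetch_run` can wrap any HTTP client.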