TraceID listed in server logs cannot be found in Grafana Tempo #594
Comments
It's possible that you are just missing the http:// prefix on the endpoint.
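As a minimal sketch of that suggestion, the helper below adds a scheme to an endpoint string that lacks one. The function name `with_http_scheme` is hypothetical and not part of TGI; the endpoint value is the one from the report.

```python
def with_http_scheme(endpoint: str, default_scheme: str = "http") -> str:
    """Return the endpoint with a URL scheme, adding default_scheme if none is present.

    Hypothetical helper for illustration; not part of text-generation-inference.
    """
    endpoint = endpoint.strip()
    if "://" in endpoint:
        # A scheme is already present (http://, https://, ...), so leave it alone.
        return endpoint
    return f"{default_scheme}://{endpoint}"


# Value taken from the issue report; the result is what OTLP_ENDPOINT would be set to.
print(with_http_scheme("tempo.monitoring:4317"))
```

The result would then be used as the value of OTLP_ENDPOINT in the deployment.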
Many thanks for the fast response. That worked: I now get the traces from the Rust webserver. However, when searching in Tempo for the TraceIDs from the logs, I still get 404 errors. On the other hand, when exploring all traces I can find the corresponding traces. For example, this is from the log of a request: That TraceID cannot be found. However, the corresponding trace that I found under "Explore" -> "Tempo" has a different ID (the second one is from another request): Is that still a misconfiguration on my part?
Sorry to bother you again, but I have not resolved this. I still get 404 errors for the Trace IDs that I find in the log. Do you have any advice on what I could try?
So I did some more digging and by enabling
I tried adding some formatting, or rather extraction, to your router code to include Other than that, I will try to figure out whether Grafana and/or Tempo can be configured to extract the correct IDs, since it gets all three IDs as seen in the log. Otherwise I will switch from Tempo to Jaeger, which might not have this problem.
That's very interesting, thanks for sharing. It seems to be a bug in the code that generates the traceID. We have never run into this issue on our side (since we always receive a traceID inside the header of the request), so I'm not sure where the bug is exactly. I will create a PR with a fix once I find it.
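Until a fix lands, one possible workaround is to supply the trace ID from the client, as the maintainers' own setup does. The sketch below builds a W3C Trace Context `traceparent` header value. It assumes the router honors that header; the thread only says a traceID arrives "inside the header of the request" and does not name the header. The function names are hypothetical.

```python
import re
import secrets

# W3C Trace Context layout: version "00", 32-hex trace id, 16-hex parent id, 2-hex flags.
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled: bool = True) -> str:
    """Build a traceparent header value with freshly generated random IDs."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    while int(trace_id, 16) == 0:     # an all-zero trace id is invalid per the spec
        trace_id = secrets.token_hex(16)
    parent_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    while int(parent_id, 16) == 0:    # an all-zero parent id is invalid as well
        parent_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def trace_id_of(traceparent: str) -> str:
    """Extract the 32-character trace id from a traceparent header value."""
    match = TRACEPARENT_RE.fullmatch(traceparent)
    if match is None:
        raise ValueError(f"not a valid traceparent value: {traceparent!r}")
    return match.group(1)


header_value = new_traceparent()
headers = {"traceparent": header_value}  # attach to the HTTP request sent to the server
print("search Tempo for:", trace_id_of(header_value))
```

If the assumption holds, the printed trace ID is the one to look up in Tempo, independent of what the server writes to its log.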
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
System Info
HF-TGI server running on Kubernetes. I executed text-generation-launcher --env inside the pod:
Model being used:
Hardware used (GPUs, how many, on which cloud) (nvidia-smi): for nvidia-smi see above; runs on an Azure Kubernetes cluster, VM spec: Standard_NC24ads_A100_v4
Deployment specificities (Kubernetes, EKS, AKS, any particular deployments): Runs on AKS and is installed through a Helm chart
The current version being used: 0.9.1
Information
Tasks
Reproduction
I installed the HF TGI server, Prometheus, Grafana, Loki, and Grafana Tempo on Kubernetes. The latter four are in namespace monitoring and the HF TGI server is in namespace hf-tgi. HF-TGI is created with the following environment variable set: OTLP_ENDPOINT: "tempo.monitoring:4317", i.e. it references the service tempo in namespace monitoring on port 4317. The service is up and running.
So far this works fine: in Grafana under "Explore", I can select "Tempo", click on "Search" and run the query. It then finds a lot of traces, mostly from target /Health, sometimes /Decode:

Now, when I go to "Explore", select "Loki", and then query the logs from the HF TGI pod, I can see the info messages as they appear in stdout on the server itself. In the messages there is an entry in the JSON called "spans[0].trace_id". When I take the value from that field and search for it in "Explore" -> "Tempo" -> TraceQL, I get an error message that the trace was not found:
failed to get trace with id: XXXX Status: 404 Not Found Body: trace not found
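A 404 like this can mean either that the ID is malformed or that it is well-formed but unknown to Tempo. The sketch below pulls the first span's `trace_id` out of a JSON log line and checks it against the W3C/OpenTelemetry trace-ID format (32 lowercase hex characters, not all zeros). The log line is invented for illustration and only mirrors the "spans[0].trace_id" field named above; the real TGI log layout may differ, and the function names are hypothetical.

```python
import json
import re
from typing import Optional

_HEX32 = re.compile(r"[0-9a-f]{32}")


def is_valid_otel_trace_id(value: str) -> bool:
    """True if value is 32 hex characters and not the all-zero (invalid) trace id."""
    candidate = value.strip().lower()
    return _HEX32.fullmatch(candidate) is not None and candidate != "0" * 32


def trace_id_from_log_line(line: str) -> Optional[str]:
    """Return spans[0].trace_id from a JSON log line, or None if it is absent."""
    record = json.loads(line)
    spans = record.get("spans") or []
    if not spans:
        return None
    return spans[0].get("trace_id")


# Invented sample line; the field names other than spans/trace_id are assumptions.
sample = '{"level": "INFO", "spans": [{"name": "generate", "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"}]}'

trace_id = trace_id_from_log_line(sample)
if trace_id is not None and is_valid_otel_trace_id(trace_id):
    print("well-formed trace id:", trace_id)
else:
    print("log field is missing or is not a valid trace id")
```

A well-formed ID that Tempo still cannot find points away from a formatting problem and toward the logged ID differing from the exported one.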
Expected behavior
My expected behavior would be: TraceIDs listed in the info messages on the server should point to a trace that exists.
However, I am new to tracing (and the Prometheus-Grafana-etc. stack), so my question is also whether I am misconfiguring something here. I think it is a bug because I can see some traces, but the TraceID from the info log cannot be found.