TraceID listed in server logs cannot be found in Grafana Tempo #594

Closed as not planned
@pdeubel

Description

System Info

HF TGI server running on Kubernetes. I executed text-generation-launcher --env inside the pod:

2023-07-12T12:58:48.739266Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.70.0
Commit sha: 31b36cca21fcd0e6b7db477a7545063e1b860156
Docker label: sha-31b36cc
nvidia-smi:
Wed Jul 12 12:58:48 2023       
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
   |-------------------------------+----------------------+----------------------+
   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
   |                               |                      |               MIG M. |
   |===============================+======================+======================|
   |   0  NVIDIA A100 80G...  On   | 00000001:00:00.0 Off |                    0 |
   | N/A   35C    P0    71W / 300W |  46936MiB / 81920MiB |      0%      Default |
   |                               |                      |             Disabled |
   +-------------------------------+----------------------+----------------------+
                                                                                  
   +-----------------------------------------------------------------------------+
   | Processes:                                                                  |
   |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
   |        ID   ID                                                   Usage      |
   |=============================================================================|
   +-----------------------------------------------------------------------------+
2023-07-12T12:58:48.739312Z  INFO text_generation_launcher: Args { model_id: "OpenAssistant/falcon-40b-sft-mix-1226", revision: None, sharded: None, num_shard: None, quantize: Some(Bitsandbytes), dtype: None, trust_remote_code: false, max_concurrent_requests: 512, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: 16000, max_waiting_tokens: 20, hostname: "production-hf-text-generation-inference-6594cb8f5d-z4mdf", port: 80, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, json_output: false, otlp_endpoint: Some("tempo.monitoring:4317"), cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_domain: None, ngrok_username: None, ngrok_password: None, env: true }

Model being used:

{
  "model_id": "OpenAssistant/falcon-40b-sft-mix-1226",
  "model_sha": "9ac6b7846fabe144646213cf1c6ee048b88272a7",
  "model_dtype": "torch.float16",
  "model_device_type": "cuda",
  "model_pipeline_tag": "text-generation",
  "max_concurrent_requests": 512,
  "max_best_of": 2,
  "max_stop_sequences": 4,
  "max_input_length": 1024,
  "max_total_tokens": 2048,
  "waiting_served_ratio": 1.2,
  "max_batch_total_tokens": 16000,
  "max_waiting_tokens": 20,
  "validation_workers": 2,
  "version": "0.9.1",
  "sha": "31b36cca21fcd0e6b7db477a7545063e1b860156",
  "docker_label": "sha-31b36cc"
}

Hardware used (GPUs, how many, on which cloud): see the nvidia-smi output above (one NVIDIA A100 80GB); runs on an Azure Kubernetes Service cluster, VM spec Standard_NC24ads_A100_v4.

Deployment specificities (Kubernetes, EKS, AKS, any particular deployments): runs on AKS and is installed through a Helm chart
The current version being used: 0.9.1

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

I installed the HF TGI server, Prometheus, Grafana, Loki, and Grafana Tempo on Kubernetes. The latter four live in the namespace monitoring, and the HF TGI server in the namespace hf-tgi. HF TGI is created with the environment variable OTLP_ENDPOINT: "tempo.monitoring:4317", i.e. it references the service tempo in the namespace monitoring on port 4317. The service is up and running.
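To rule out a pure connectivity problem, something like the following can be run from inside the TGI pod (the deployment name is derived from the pod hostname above, and the presence of nc in the image is an assumption):

kubectl exec -n hf-tgi deploy/production-hf-text-generation-inference -- \
  sh -c 'nc -zv tempo.monitoring 4317'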

So far this works fine: in Grafana under "Explore", I can select "Tempo", click "Search", and run the query. It then finds a lot of traces, mostly for the target /Health, sometimes /Decode:
[Screenshot: Tempo search results listing /Health and /Decode traces]
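For reference, the Search tab query above corresponds roughly to a TraceQL expression like the following (the span name is taken from the results shown in the screenshot and may need adjusting to the exact names TGI emits):

{ name = "/Health" }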

Now, when I go to "Explore", select "Loki", and query the logs from the HF TGI pod, I can see the same info messages as on the server's stdout. Each message's JSON contains a field called "spans[0].trace_id". When I take the value from that field and search for it under "Explore" -> "Tempo" -> TraceQL, I get an error message that the trace was not found:
failed to get trace with id: XXXX Status: 404 Not Found Body: trace not found

[Screenshot: Tempo TraceQL lookup failing with 404 trace not found]
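To check whether the 404 comes from Tempo itself rather than from the Grafana data source, the trace ID can be pulled out of the Loki log line and queried against Tempo's trace-by-ID HTTP API directly (LOG_LINE is a placeholder for a JSON log line copied out of Loki; the port-forward and the default query port 3200 are assumptions about the Tempo deployment):

# Extract the trace ID from the JSON log line.
TRACE_ID=$(echo "$LOG_LINE" | jq -r '.spans[0].trace_id')

# Query Tempo's trace-by-ID endpoint directly, bypassing Grafana.
kubectl -n monitoring port-forward svc/tempo 3200:3200 &
curl -s "http://localhost:3200/api/traces/${TRACE_ID}"

A 404 here as well would point at ingestion (the span never reached Tempo) rather than at the Grafana data source configuration.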

Expected behavior

My expected behavior would be: a TraceID listed in the info messages on the server should point to a trace that exists in Tempo.

However, I am new to tracing (and to the Prometheus/Grafana/Loki/Tempo stack), so my question is also whether I am misconfiguring something here. I suspect a bug because I can see some traces, yet the TraceID from the info log cannot be found.
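One thing that might be worth checking in this situation is whether Tempo is receiving but refusing or discarding spans (e.g. due to rate or size limits); Tempo's /metrics endpoint exposes counters for this. A sketch reusing the port-forward from above (same port assumption):

curl -s "http://localhost:3200/metrics" \
  | grep -E 'tempo_distributor_spans_received_total|tempo_discarded_spans_total'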
