
Potential problem with dllama-api sometimes not seeing http request body? #153

Open
jkeegan opened this issue Jan 28, 2025 · 3 comments

jkeegan commented Jan 28, 2025

Hi... I've spent an entire day on this, so I figured I'd come mention it here.

If I do the following request of dllama-api with curl, it works fine:

$ curl -X POST http://clusterllm1.local:9990/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "What is 4 * 11?\n"}],
  "temperature": 0.7,
  "stop": ["<|eot_id|>"],
  "max_tokens": 128
}'
{"choices":[{"finish_reason":"","index":-845970800,"message":{"content":"The answer is 44.","role":"assistant"}}],"created":1738050222,"id":"cmpl-j0","model":"Distributed Model","object":"chat.completion","usage":{"completion_tokens":5,"prompt_tokens":44,"total_tokens":49}}%
$

If, however, I run this pared-down, isolated example Java code to do the same request, dllama-api.cpp doesn't see the request body at all, just the headers:

import java.io.*;
import java.net.*;

public class Connect {
  public static void main(String[] args) throws Exception {
    String host = "clusterllm1.local";
    int port = 9990;
    URL endpoint = null;
    try {
      endpoint = new URL("http://" + host + ":" + port + "/v1/chat/completions");
    } catch (MalformedURLException mfe) {
      throw new RuntimeException(mfe);
    }
    System.out.println("About to open the connection...");
    HttpURLConnection connection = (HttpURLConnection) endpoint.openConnection();
    connection.setRequestMethod("POST");
    connection.setRequestProperty("Accept", "application/json");
    connection.setRequestProperty("Content-Type", "application/json; utf-8");
    connection.setDoOutput(true);
    connection.setDoInput(true);
    String requestBody = "{ \"messages\": [{\"role\": \"user\", \"content\": \"What is 4 * 11?\"}], \"temperature\": 0.7, \"stop\": [\"<|eot_id|>\"], \"max_tokens\": 128 }";
    System.out.println("DEBUG: request body is:\n=====\n" + requestBody + "\n=====\n");
    byte[] requestBodyBytes = requestBody.getBytes("UTF-8");
    OutputStream os = connection.getOutputStream();
    os.write(requestBodyBytes);
    os.flush();
    os.close();

    int responseCode = connection.getResponseCode();
    System.out.println("DEBUG: responseCode = " + responseCode);
    BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    StringBuilder sb = new StringBuilder();
    String line;
    while ((line=reader.readLine()) != null) {
      sb.append(line);
    }
    System.out.println("DEBUG: responseBody: " + sb.toString());
    connection.disconnect();
  }
}

dllama-api.cpp reads all of the headers fine; it just doesn't get any request body. I've put debugging statements in dllama-api.cpp's HttpRequest.read() method and its readHttpRequest() method... all that's ever read is the headers, no body.

To my eye those should be the same requests. If I bind to a port with netcat (nc) and connect to that with the same code, I see the body:

$ nc -l -p 9990
POST /v1/chat/completions HTTP/1.1
Accept: application/json
Content-Type: application/json; utf-8
User-Agent: Java/17.0.2
Host: clusterllm1.local:9990
Connection: keep-alive
Content-Length: 127

{ "messages": [{"role": "user", "content": "What is 4 * 11?"}], "temperature": 0.7, "stop": ["<|eot_id|>"], "max_tokens": 128 }

dllama-api.cpp clearly works in some cases, such as that curl request. And I'd love to find out that I've screwed up something in the above code (or the hundred variants of it that I've tried)... but I fear that dllama-api.cpp's HTTP request handling might be a bit fragile, and it would be good to have it as robust as possible.

I'd normally never mention this in an issue; instead I'd usually solve it myself and submit a pull request, but I'm stumped. I wanted to make sure to record it here so the ball doesn't get dropped and we miss a chance to see an existing bug in dllama-api.cpp.

b4rtaz (Owner) commented Jan 28, 2025

Hello @jkeegan,

Maybe this fragment is causing the problem:

        std::vector<char> peekBuffer(bytesRead);
        bytesRead = recv(serverSocket, peekBuffer.data(), bytesRead, 0);
        if (bytesRead <= 0) {

So maybe the Java client sends the header as one chunk and then sends the body as the second chunk. This implementation is naive, so when the data stops coming, it assumes the request has been fully sent.
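For reference, here is a hedged sketch of the more robust pattern (written in Java for brevity; the same logic applies to recv() in dllama-api.cpp, and the class and method names here are hypothetical, not the project's): read up to the blank line that terminates the headers, parse Content-Length, then keep reading until that many body bytes have arrived instead of assuming one read returns everything.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: Content-Length-driven request reading.
public class RobustRead {
    // Reads one HTTP request (headers + body) from the stream, looping
    // until Content-Length bytes of body have arrived rather than
    // trusting a single read to deliver the whole request.
    static String readRequest(InputStream in) throws IOException {
        ByteArrayOutputStream headerBytes = new ByteArrayOutputStream();
        // 1. Read byte-by-byte until the CRLFCRLF that ends the headers.
        int state = 0, c;
        while (state < 4 && (c = in.read()) != -1) {
            headerBytes.write(c);
            boolean match = (state % 2 == 0) ? c == '\r' : c == '\n';
            state = match ? state + 1 : (c == '\r' ? 1 : 0);
        }
        String headers = headerBytes.toString(StandardCharsets.UTF_8.name());
        // 2. Parse Content-Length (treat a missing header as an empty body).
        int contentLength = 0;
        for (String line : headers.split("\r\n")) {
            if (line.toLowerCase().startsWith("content-length:"))
                contentLength = Integer.parseInt(line.substring(15).trim());
        }
        // 3. Loop until the whole body has arrived; one read may return less.
        byte[] body = new byte[contentLength];
        int read = 0;
        while (read < contentLength) {
            int n = in.read(body, read, contentLength - read);
            if (n == -1) throw new EOFException("connection closed mid-body");
            read += n;
        }
        return headers + new String(body, StandardCharsets.UTF_8);
    }
}
```

The key point is step 3: the body read is a loop bounded by Content-Length, so it keeps going across however many TCP segments the client happened to use.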

You can validate this theory by creating a raw socket in Java and sending an HTTP request with full control over the chunks.
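To make that experiment concrete, here's a minimal, self-contained Java sketch (the class and method names are mine, not part of the project): a toy server performs a single read() — mimicking a single recv() — while the client flushes the header block, pauses, then sends the body. On a typical loopback connection the first read sees only the headers.

```java
import java.io.*;
import java.net.*;

// Hypothetical demo: headers and body sent as two flushed writes usually
// arrive as two separate reads on the server side.
public class ChunkDemo {
    // Returns whatever a single read() sees after accepting the connection.
    static String firstRead() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {  // any free port
            Thread client = new Thread(() -> {
                try (Socket s = new Socket("127.0.0.1", server.getLocalPort())) {
                    OutputStream out = s.getOutputStream();
                    String body = "{\"max_tokens\": 128}";
                    out.write(("POST /v1/chat/completions HTTP/1.1\r\n"
                             + "Content-Type: application/json\r\n"
                             + "Content-Length: " + body.length() + "\r\n\r\n")
                             .getBytes("UTF-8"));
                    out.flush();
                    Thread.sleep(300);  // push the body into a later TCP segment
                    out.write(body.getBytes("UTF-8"));
                    out.flush();
                } catch (Exception e) { throw new RuntimeException(e); }
            });
            client.start();
            try (Socket conn = server.accept()) {
                byte[] buf = new byte[4096];
                int n = conn.getInputStream().read(buf);  // single read, like one recv()
                client.join();
                return new String(buf, 0, n, "UTF-8");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String chunk = ChunkDemo.firstRead();
        System.out.println("first read saw the body: " + chunk.contains("max_tokens"));
    }
}
```

The 300 ms pause is only there to force the split deliberately; a real client like HttpURLConnection can produce the same header/body split with no delay at all, which is why a server can't assume one read returns the whole request.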

jkeegan (Author) commented Jan 30, 2025

Damn, I wish I'd seen your comment; I just came to the same conclusion and have a fix working. I'm cleaning it up now and will send a pull request sometime tomorrow.

jkeegan (Author) commented Feb 5, 2025

Fixed it. I've issued a pull request, and all three checks passed:

#155
