
Potential problem with dllama-api sometimes not seeing http request body? #153

Open
jkeegan opened this issue Jan 28, 2025 · 3 comments

jkeegan commented Jan 28, 2025

Hi... I've spent an entire day on this, so I figured I'd come mention it here.

If I do the following request of dllama-api with curl, it works fine:

$ curl -X POST http://clusterllm1.local:9990/v1/chat/completions -H "Content-Type: application/json" -d '{
  "messages": [{"role": "user", "content": "What is 4 * 11?\n"}],
  "temperature": 0.7,
  "stop": ["<|eot_id|>"],
  "max_tokens": 128
}'
{"choices":[{"finish_reason":"","index":-845970800,"message":{"content":"The answer is 44.","role":"assistant"}}],"created":1738050222,"id":"cmpl-j0","model":"Distributed Model","object":"chat.completion","usage":{"completion_tokens":5,"prompt_tokens":44,"total_tokens":49}}%
$

If, however, I run this pared-down, isolated example Java code to do the same request, dllama-api.cpp doesn't see the request body at all, just the headers:

import java.io.*;
import java.net.*;

public class Connect {
  public static void main(String[] args) throws Exception {
    String host = "clusterllm1.local";
    int port = 9990;
    URL endpoint = null;
    try {
      endpoint = new URL("http://" + host + ":" + port + "/v1/chat/completions");
    } catch (MalformedURLException mfe) {
      throw new RuntimeException(mfe);
    }
    System.out.println("About to open the connection...");
    HttpURLConnection connection = (HttpURLConnection) endpoint.openConnection();
    connection.setRequestMethod("POST");
    connection.setRequestProperty("Accept", "application/json");
    connection.setRequestProperty("Content-Type", "application/json; utf-8");
    connection.setDoOutput(true);
    connection.setDoInput(true);
    String requestBody = "{ \"messages\": [{\"role\": \"user\", \"content\": \"What is 4 * 11?\"}], \"temperature\": 0.7, \"stop\": [\"<|eot_id|>\"], \"max_tokens\": 128 }";
    System.out.println("DEBUG: request body is:\n=====\n" + requestBody + "\n=====\n");
    byte[] requestBodyBytes = requestBody.getBytes("UTF-8");
    OutputStream os = connection.getOutputStream();
    os.write(requestBodyBytes);
    os.flush();
    os.close();

    int responseCode = connection.getResponseCode();
    System.out.println("DEBUG: responseCode = " + responseCode);
    BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    StringBuilder sb = new StringBuilder();
    String line;
    while ((line=reader.readLine()) != null) {
      sb.append(line);
    }
    System.out.println("DEBUG: responseBody: " + sb.toString());
    connection.disconnect();
  }
}

dllama-api.cpp reads all of the headers fine; it just doesn't get any request body. I've put debugging statements in dllama-api.cpp's HttpRequest.read() method and its readHttpRequest() method... all that's ever read is the headers, no body.

To my eye those should be the same requests. If I bind to a port with netcat (nc) and connect to that with the same code, I see the body:

$ nc -l -p 9990
POST /v1/chat/completions HTTP/1.1
Accept: application/json
Content-Type: application/json; utf-8
User-Agent: Java/17.0.2
Host: clusterllm1.local:9990
Connection: keep-alive
Content-Length: 127

{ "messages": [{"role": "user", "content": "What is 4 * 11?"}], "temperature": 0.7, "stop": ["<|eot_id|>"], "max_tokens": 128 }

dllama-api.cpp clearly works in some cases, such as that curl request. And I'd love to find out that I've screwed up something in the above code (or the hundred variants of it that I've tried)... but I fear that dllama-api.cpp's HTTP request handling might be a bit fragile, and it would be good to have it as robust as possible.

I'd normally never mention this in an issue; instead I'd usually solve it myself and submit a pull request, but I'm stumped. I wanted to make sure to record it here so the ball doesn't get dropped and we miss a chance to see an existing bug in dllama-api.cpp.

b4rtaz (Owner) commented Jan 28, 2025

Hello @jkeegan,

Maybe this fragment is causing the problem:

        std::vector<char> peekBuffer(bytesRead);
        bytesRead = recv(serverSocket, peekBuffer.data(), bytesRead, 0);
        if (bytesRead <= 0) {

So maybe the Java client sends the header as one chunk and then sends the body as the second chunk. This implementation is naive, so when the data stops coming, it assumes the request has been fully sent.
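For reference, here is a hedged sketch of the more robust pattern (written in Java for brevity; the same logic applies to recv() in dllama-api.cpp, and the class and method names here are hypothetical, not the project's): read up to the blank line that terminates the headers, parse Content-Length, then keep reading until that many body bytes have arrived instead of assuming one read returns everything.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: Content-Length-driven request reading.
public class RobustRead {
    // Reads one HTTP request (headers + body) from the stream, looping
    // until Content-Length bytes of body have arrived rather than
    // trusting a single read to deliver the whole request.
    static String readRequest(InputStream in) throws IOException {
        ByteArrayOutputStream headerBytes = new ByteArrayOutputStream();
        // 1. Read byte-by-byte until the CRLFCRLF that ends the headers.
        int state = 0, c;
        while (state < 4 && (c = in.read()) != -1) {
            headerBytes.write(c);
            boolean match = (state % 2 == 0) ? c == '\r' : c == '\n';
            state = match ? state + 1 : (c == '\r' ? 1 : 0);
        }
        String headers = headerBytes.toString(StandardCharsets.UTF_8.name());
        // 2. Parse Content-Length (treat a missing header as an empty body).
        int contentLength = 0;
        for (String line : headers.split("\r\n")) {
            if (line.toLowerCase().startsWith("content-length:"))
                contentLength = Integer.parseInt(line.substring(15).trim());
        }
        // 3. Loop until the whole body has arrived; one read may return less.
        byte[] body = new byte[contentLength];
        int read = 0;
        while (read < contentLength) {
            int n = in.read(body, read, contentLength - read);
            if (n == -1) throw new EOFException("connection closed mid-body");
            read += n;
        }
        return headers + new String(body, StandardCharsets.UTF_8);
    }
}
```

The key point is step 3: the body read is a loop bounded by Content-Length, so it keeps going across however many TCP segments the client happened to use.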

You can validate this theory by creating a raw socket in Java and sending an HTTP request with full control over the chunks.
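To make that experiment concrete, here's a minimal, self-contained Java sketch (the class and method names are mine, not part of the project): a toy server performs a single read() — mimicking a single recv() — while the client flushes the header block, pauses, then sends the body. On a typical loopback connection the first read sees only the headers.

```java
import java.io.*;
import java.net.*;

// Hypothetical demo: headers and body sent as two flushed writes usually
// arrive as two separate reads on the server side.
public class ChunkDemo {
    // Returns whatever a single read() sees after accepting the connection.
    static String firstRead() throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {  // any free port
            Thread client = new Thread(() -> {
                try (Socket s = new Socket("127.0.0.1", server.getLocalPort())) {
                    OutputStream out = s.getOutputStream();
                    String body = "{\"max_tokens\": 128}";
                    out.write(("POST /v1/chat/completions HTTP/1.1\r\n"
                             + "Content-Type: application/json\r\n"
                             + "Content-Length: " + body.length() + "\r\n\r\n")
                             .getBytes("UTF-8"));
                    out.flush();
                    Thread.sleep(300);  // push the body into a later TCP segment
                    out.write(body.getBytes("UTF-8"));
                    out.flush();
                } catch (Exception e) { throw new RuntimeException(e); }
            });
            client.start();
            try (Socket conn = server.accept()) {
                byte[] buf = new byte[4096];
                int n = conn.getInputStream().read(buf);  // single read, like one recv()
                client.join();
                return new String(buf, 0, n, "UTF-8");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String chunk = ChunkDemo.firstRead();
        System.out.println("first read saw the body: " + chunk.contains("max_tokens"));
    }
}
```

The 300 ms pause is only there to force the split deliberately; a real client like HttpURLConnection can produce the same header/body split with no delay at all, which is why a server can't assume one read returns the whole request.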

jkeegan (Author) commented Jan 30, 2025

Damn, I wish I'd seen your comment; I just came to the same conclusion and have a fix working. I'm cleaning it up now and will send a pull request sometime tomorrow.

jkeegan (Author) commented Feb 5, 2025

Fixed it. I've issued a pull request, and all three checks passed:

#155
