If, however, I run this pared-down, isolated Java example to make the same request, dllama-api.cpp doesn't see the request body at all, just the headers:
import java.io.*;
import java.lang.*;
import java.net.*;
import java.text.*;
import java.util.*;
import java.util.stream.*;

public class Connect {
    public static void main(String[] args) throws Exception {
        String host = "clusterllm1.local";
        int port = 9990;
        URL endpoint = null;
        try {
            endpoint = new URL("http://" + host + ":" + port + "/v1/chat/completions");
        } catch (MalformedURLException mfe) {
            throw new RuntimeException(mfe);
        }
        System.out.println("About to open the connection...");
        HttpURLConnection connection = (HttpURLConnection) endpoint.openConnection();
        connection.setRequestMethod("POST");
        connection.setRequestProperty("Accept", "application/json");
        connection.setRequestProperty("Content-Type", "application/json; utf-8");
        connection.setDoOutput(true);
        connection.setDoInput(true);
        String requestBody = "{ \"messages\": [{\"role\": \"user\", \"content\": \"What is 4 * 11?\"}], \"temperature\": 0.7, \"stop\": [\"<|eot_id|>\"], \"max_tokens\": 128 }";
        System.out.println("DEBUG: request body is:\n=====\n" + requestBody + "\n=====\n");
        byte[] requestBodyBytes = requestBody.getBytes("UTF-8");
        OutputStream os = connection.getOutputStream();
        os.write(requestBodyBytes);
        os.flush();
        os.close();
        int responseCode = connection.getResponseCode();
        System.out.println("DEBUG: responseCode = " + responseCode);
        BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line=reader.readLine()) != null) {
            sb.append(line);
        }
        System.out.println("DEBUG: responseBody: " + sb.toString());
        connection.disconnect();
    }
}
dllama-api.cpp reads all of the headers fine; it just doesn't get any request body. I've put debugging statements in dllama-api.cpp's HttpRequest.read() method and its readHttpRequest() method. All that's ever read is the headers, never the body.
To my eye those should be the same requests. If I bind to a port with netcat (nc) and connect to that with the same code, I see the body:
dllama-api.cpp clearly works in some cases, such as that curl request. I'd love to find out that I've screwed up something in the above code (or in the hundred variants of it that I've tried), but I fear that dllama-api.cpp's HTTP request handling might be a bit fragile, and it would be good to have it as robust as possible.
I'd normally never mention this in an issue; instead I'd usually solve it myself and submit a pull request, but I'm stumped. I wanted to make sure to record it here so the ball doesn't get dropped and we don't miss a chance to catch an existing bug in dllama-api.cpp.
So maybe the Java client sends the headers as one chunk and then sends the body as a second chunk. The current implementation is naive: when the data stops coming, it assumes the request has been fully sent.
You can validate this theory by creating a raw socket in Java and sending an HTTP request with full control over the chunks.
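For anyone who wants to try that, here is a minimal sketch of such a raw-socket test. It writes the headers, flushes, pauses, and only then writes the body. The class name `SplitRequestSender`, its helper methods, and the 200 ms pause are made up for illustration; the host, port, and path are the ones from the report above. Note that separate writes only make it likely that the headers and body arrive as separate reads on the server; TCP does not guarantee it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of dllama-api or of the original report.
class SplitRequestSender {

    // Builds a minimal HTTP/1.1 header block for a JSON POST.
    static String buildHeaders(String host, int port, String path, int contentLength) {
        return "POST " + path + " HTTP/1.1\r\n"
                + "Host: " + host + ":" + port + "\r\n"
                + "Accept: application/json\r\n"
                + "Content-Type: application/json\r\n"
                + "Content-Length: " + contentLength + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    // Sends headers and body as two separate writes, pausing in between,
    // then returns whatever the server sends back until it closes the socket.
    static String sendSplit(String host, int port, String path, String body, long pauseMillis)
            throws IOException, InterruptedException {
        byte[] bodyBytes = body.getBytes(StandardCharsets.UTF_8);
        byte[] headerBytes = buildHeaders(host, port, path, bodyBytes.length)
                .getBytes(StandardCharsets.US_ASCII);
        try (Socket socket = new Socket(host, port)) {
            socket.setTcpNoDelay(true);   // ask the stack not to hold back small writes
            socket.setSoTimeout(30_000);  // don't hang forever if the server never answers
            OutputStream out = socket.getOutputStream();
            out.write(headerBytes);       // first chunk: headers only
            out.flush();
            Thread.sleep(pauseMillis);    // give the server time to read the headers alone
            out.write(bodyBytes);         // second chunk: the JSON body
            out.flush();

            InputStream in = socket.getInputStream();
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                response.write(buf, 0, n);
            }
            return new String(response.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        String body = "{ \"messages\": [{\"role\": \"user\", \"content\": \"What is 4 * 11?\"}], "
                + "\"temperature\": 0.7, \"stop\": [\"<|eot_id|>\"], \"max_tokens\": 128 }";
        System.out.println(sendSplit("clusterllm1.local", 9990, "/v1/chat/completions", body, 200));
    }
}
```

If the server only fails when the pause is present (and succeeds with `pauseMillis` set to 0), that would support the chunking theory.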
Damn, I wish I'd seen your comment sooner. I just came to the same conclusion and have a fix working. I'm cleaning it up now and will send a pull request sometime tomorrow.
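The actual fix is not shown in this thread, so what follows is only a sketch of the general technique such a fix usually relies on: read until the blank line that ends the headers, parse `Content-Length`, and then keep reading until exactly that many body bytes have arrived, regardless of how many chunks they come in. It is written in Java to match the client code above; dllama-api.cpp itself is C++, and the class `ContentLengthReader` and its methods are hypothetical names, not dllama-api code. Chunked transfer encoding is deliberately left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of Content-Length based reading; not dllama-api code.
class ContentLengthReader {
    private static final byte[] HEADER_END = {'\r', '\n', '\r', '\n'};

    // Reads byte by byte until the "\r\n\r\n" that terminates the header block.
    static String readHeaders(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int matched = 0; // how many bytes of HEADER_END have been matched so far
        while (matched < HEADER_END.length) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("connection closed before end of headers");
            }
            buf.write(b);
            if (b == HEADER_END[matched]) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0;
            }
        }
        return new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Returns the Content-Length value, or 0 if the header is absent.
    static int parseContentLength(String headers) {
        for (String line : headers.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                return Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        return 0;
    }

    // Loops until exactly `length` bytes are read; a short read is not treated as the end.
    static byte[] readBody(InputStream in, int length) throws IOException {
        byte[] body = new byte[length];
        int total = 0;
        while (total < length) {
            int n = in.read(body, total, length - total);
            if (n < 0) {
                throw new EOFException("connection closed after " + total + " of " + length + " body bytes");
            }
            total += n;
        }
        return body;
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "{\"content\": \"What is 4 * 11?\"}".getBytes(StandardCharsets.UTF_8);
        byte[] headers = ("POST /v1/chat/completions HTTP/1.1\r\nContent-Length: " + body.length + "\r\n\r\n")
                .getBytes(StandardCharsets.US_ASCII);
        // SequenceInputStream delivers headers and body as separate reads, like two TCP chunks.
        InputStream in = new SequenceInputStream(
                new ByteArrayInputStream(headers), new ByteArrayInputStream(body));
        String head = readHeaders(in);
        byte[] received = readBody(in, parseContentLength(head));
        System.out.println(new String(received, StandardCharsets.UTF_8));
    }
}
```

The key design point is that the end of the request is decided by the byte count the client declared, not by a pause in incoming data, so a client that flushes headers and body separately is handled the same way as one that sends everything at once.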
Hi.. I've spent an entire day on this, so I figured I'd come mention it here.
If I do the following request of dllama-api with curl, it works fine: