Description
Version
1.54.0
Steps to reproduce
Dependencies
pip install flask playwright
playwright install chromium
server.py
from flask import Flask, Response
import time

app = Flask(__name__)


@app.route("/")
def index():
    return """
    <!DOCTYPE html>
    <html>
    <head><meta charset="utf-8"><title>SSE Test</title></head>
    <body>
    <h1>SSE Test Page</h1>
    <button id="btn">Start SSE</button>
    <script>
    document.getElementById('btn').addEventListener('click', function() {
        const evtSource = new EventSource('/sse');
        evtSource.onmessage = function(event) { console.log(event.data); };
        evtSource.onerror = function() { evtSource.close(); };
    });
    </script>
    </body>
    </html>
    """


@app.route("/sse")
def sse():
    def generate():
        messages = ["你好,这是第一条消息", "测试中文:😀🎉"]
        for msg in messages:
            yield f"data: {msg}\n\n".encode('utf-8')
            time.sleep(0.3)

    return Response(generate(), headers={
        "Content-Type": "text/event-stream; charset=utf-8",
        "Cache-Control": "no-cache",
    })


if __name__ == "__main__":
    app.run(port=5000)
client.py
from playwright.sync_api import sync_playwright


def main():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()

        # Method 1: route.fetch() - WORKS CORRECTLY
        def handle_route(route):
            response = route.fetch()
            body = response.body()
            print("\n[route.fetch()] - CORRECT")
            print(f"  Raw bytes: {body!r}")
            print(f"  Decoded:   {body.decode('utf-8')!r}")
            route.fulfill(response=response)

        page.route("**/sse", handle_route)

        # Method 2: response event - BUG
        def on_response(response):
            if "/sse" in response.url:
                body = response.body()
                print("\n[response.body()] - BUG")
                print(f"  Raw bytes: {body!r}")

        page.on("response", on_response)

        page.goto("http://localhost:5000")
        page.click("#btn")
        page.wait_for_timeout(3000)
        browser.close()


if __name__ == "__main__":
    main()
Run
- Start the server: python server.py
- Run the client: python client.py
Expected behavior
response.body() should return the raw UTF-8 bytes as sent by the server:
[route.fetch()] - CORRECT
Raw bytes: b'data: \xe4\xbd\xa0\xe5\xa5\xbd...'
Decoded: 'data: 你好,这是第一条消息\n\ndata: 测试中文:😀🎉\n\n'
[response.body()] - CORRECT
Raw bytes: b'data: \xe4\xbd\xa0\xe5\xa5\xbd...'
Actual behavior
response.body() returns double-encoded (mojibake) bytes:
[route.fetch()] - CORRECT
Raw bytes: b'data: \xe4\xbd\xa0\xe5\xa5\xbd...'
Decoded: 'data: 你好,这是第一条消息\n\ndata: 测试中文:😀🎉\n\n'
[response.body()] - BUG
Raw bytes: b'data: \xc3\xa4\xc2\xbd\xc2\xa0\xc3\xa5\xc2\xa5\xc2\xbd...'
This is the classic mojibake pattern: the UTF-8 bytes are decoded as Latin-1 and the resulting string is then re-encoded as UTF-8.
The double-encoding can be verified:
correct = "你好".encode('utf-8')  # b'\xe4\xbd\xa0\xe5\xa5\xbd'
mojibake = correct.decode('latin-1').encode('utf-8')  # b'\xc3\xa4\xc2\xbd\xc2\xa0\xc3\xa5\xc2\xa5\xc2\xbd'
The response.body() output matches the mojibake pattern exactly.
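If the corruption really is a plain Latin-1 round trip, as the check above suggests, it should be reversible on the client side. The following is a minimal sketch of that idea, not a confirmed fix: `undo_double_encoding` is a hypothetical helper name, and it assumes every byte went through Latin-1 (if Windows-1252 was used instead, bytes in the 0x80-0x9f range would map differently and the call could raise `UnicodeEncodeError`).

```python
def undo_double_encoding(data: bytes) -> bytes:
    """Reverse a UTF-8 -> Latin-1 decode -> UTF-8 encode round trip.

    Hypothetical helper; assumes the wrong decoding step was pure Latin-1.
    """
    # Decoding as UTF-8 yields code points U+0000..U+00FF, one per original byte;
    # encoding those as Latin-1 gives the original bytes back.
    return data.decode("utf-8").encode("latin-1")


original = "你好".encode("utf-8")
mojibake = original.decode("latin-1").encode("utf-8")  # simulate the bug
recovered = undo_double_encoding(mojibake)

print(recovered == original)
print(recovered.decode("utf-8"))
```

This only repairs the bytes after the fact; it does not change what response.body() returns.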
Additional context
- The browser DevTools Network tab shows the correct response
- curl also returns the correct bytes
- Only response.body() and CDP Network.getResponseBody have this issue
- Tested with both Python and JavaScript bindings; the same bug occurs in both
Root cause analysis: Likely a CDP (Chrome DevTools Protocol) issue
I tested calling CDP Network.getResponseBody directly:
| Method | Returns | Result |
|---|---|---|
| route.fetch() | bytes | ✅ Correct \xe4\xbd\xa0 |
| CDP Network.getResponseBody | str | ❌ Mojibake (already decoded incorrectly) |
| response.body() | bytes | ❌ Mojibake (derived from CDP) |
Key finding: CDP Network.getResponseBody returns a string (not bytes), and the string is already mojibake - meaning the incorrect decoding happens at the CDP layer, not in Playwright.
CDP test code (test_cdp.py):
from playwright.sync_api import sync_playwright


def test_cdp():
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        client = page.context.new_cdp_session(page)
        client.send("Network.enable")

        # Maps requestId -> URL for every /sse response seen so far
        responses = {}

        def on_response_received(params):
            if "/sse" in params.get("response", {}).get("url", ""):
                responses[params["requestId"]] = params["response"]["url"]

        def on_loading_finished(params):
            if params["requestId"] in responses:
                result = client.send("Network.getResponseBody", {"requestId": params["requestId"]})
                print("CDP Network.getResponseBody:")
                print(f"  base64Encoded: {result.get('base64Encoded')}")
                print(f"  body type: {type(result.get('body'))}")  # <class 'str'> !!!
                print(f"  body: {result.get('body')[:50]!r}...")  # Already mojibake

        client.on("Network.responseReceived", on_response_received)
        client.on("Network.loadingFinished", on_loading_finished)

        page.goto("http://localhost:5000")
        page.click("#btn")
        page.wait_for_timeout(3000)
        browser.close()


test_cdp()
CDP test output:
CDP Network.getResponseBody:
base64Encoded: False
body type: <class 'str'>
body: 'data: ä½\xa0好,这是第一æ\x9d¡æ¶ˆæ\x81¯...' # Already mojibake!
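To illustrate why a wrongly decoded CDP string would surface as double-encoded bytes, here is a small simulation. It is a sketch under an assumption: that a client converts the CDP result to bytes by base64-decoding when `base64Encoded` is true and UTF-8-encoding the string otherwise. `cdp_body_to_bytes` is a hypothetical helper written for this example, not code taken from Playwright.

```python
import base64


def cdp_body_to_bytes(result: dict) -> bytes:
    """Turn a Network.getResponseBody-style result into bytes.

    Hypothetical helper; the conversion rule is an assumption for illustration.
    """
    body = result["body"]
    if result.get("base64Encoded"):
        return base64.b64decode(body)
    # A text body is re-encoded as UTF-8; if the string was decoded with the
    # wrong charset upstream, the damage is baked in at this point.
    return body.encode("utf-8")


# Simulate a body that was decoded as Latin-1 instead of UTF-8 upstream.
wrongly_decoded = "你好".encode("utf-8").decode("latin-1")
simulated_result = {"body": wrongly_decoded, "base64Encoded": False}

print(cdp_body_to_bytes(simulated_result))
# b'\xc3\xa4\xc2\xbd\xc2\xa0\xc3\xa5\xc2\xa5\xc2\xbd'
```

The printed bytes are the same mojibake sequence reported for response.body(), which is consistent with the incorrect decoding happening before the string reaches the client.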
JavaScript test (client.js):
const { chromium } = require('playwright');

(async () => {
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage();

  await page.route('**/sse', async route => {
    const res = await route.fetch();
    console.log('[route.fetch()]', (await res.body()));
    await route.fulfill({ response: res });
  });

  page.on('response', async res => {
    if (res.url().includes('/sse'))
      console.log('[response.body()]', (await res.body()));
  });

  await page.goto('http://localhost:5000');
  await page.click('#btn');
  await page.waitForTimeout(3000);
  await browser.close();
})();
JavaScript output:
[route.fetch()] <Buffer 64 61 74 61 3a 20 e4 bd a0 e5 a5 bd ...> ✅ Correct
[response.body()] <Buffer 64 61 74 61 3a 20 c3 a4 c2 bd c2 a0 ...> ❌ Mojibake
Why this is a blocking issue (no viable workaround)
While route.fetch() returns correct bytes, it cannot be used as a workaround for real-world SSE streams:
- SSE streams can last for minutes (e.g., LLM streaming responses, real-time data feeds)
- route.fetch() blocks until the entire response is complete
- route.fulfill() can only be called after route.fetch() returns
- This means the browser receives no data until the stream ends (minutes later)
My use case: I'm testing an AI chat application where SSE responses stream for 2-5 minutes. I need to capture the response content for automated testing, but:
- response.body() gives mojibake
- route.fetch() blocks for minutes, making the test useless
The only remaining option is to use page.expose_function() and capture data via JavaScript in the browser, which is a hacky workaround that shouldn't be necessary.
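For reference, the page.expose_function() workaround mentioned above could look roughly like this. It is a sketch under several assumptions: `recordSse`, `INIT_SCRIPT` and `capture_sse_messages` are names invented for this example, the page is assumed to create its streams with `new EventSource(...)`, and only unnamed `message` events are captured. Playwright is imported inside the function so the module can be loaded without it.

```python
# JavaScript injected before any page script runs. It wraps the native
# EventSource so that every message is forwarded to window.recordSse,
# a binding name chosen for this example (not a Playwright API).
INIT_SCRIPT = """
(() => {
  const NativeEventSource = window.EventSource;
  window.EventSource = class extends NativeEventSource {
    constructor(url, config) {
      super(url, config);
      this.addEventListener('message', (event) => {
        window.recordSse(String(url), event.data);
      });
    }
  };
})();
"""


def capture_sse_messages(page_url, trigger_selector, wait_ms=3000):
    """Open page_url, click trigger_selector, return [(stream_url, data), ...]."""
    from playwright.sync_api import sync_playwright  # imported lazily

    captured = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Expose the Python callback to the page as window.recordSse.
        page.expose_function("recordSse", lambda url, data: captured.append((url, data)))
        page.add_init_script(script=INIT_SCRIPT)
        page.goto(page_url)
        page.click(trigger_selector)
        page.wait_for_timeout(wait_ms)  # messages arrive while we wait
        browser.close()
    return captured
```

With the reproduction server running, `capture_sse_messages("http://localhost:5000", "#btn")` would be the call to try. The data arrives as text already decoded by the browser rather than as raw bytes, so this sidesteps response.body() instead of fixing it.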
Environment
- Operating System: Windows 10 Pro (10.0.19045)
- CPU: Intel Core i5-10500 @ 3.10GHz
- Browser: Chrome 143.0.7499.170
- Python Version: 3.10.16
- Node.js Version: 18.13.0
- Other info: Tested with both Python and JavaScript bindings