v1.25.0 introduces range request errors loading file attachments on Observable #1366
Comments
Thanks a lot for the report. I will look into debugging this, since it might be a general problem; if a general fix might cause problems elsewhere, I will introduce a way to opt in to the previous behaviour. That commit does indeed look like it might be at fault: its logic was intended to avoid range queries being degraded to full downloads. Could you check whether, with the current Observable deployment, range queries are actually used, or whether reads of those files eventually fall back to full reads? Either way, it would be useful to understand how to reach the ideal outcome (this working using range reads).
Thanks so much for looking into it! Here’s a current deployed working example on Observable using duckdb-wasm 1.24.0: https://observablehq.com/@observablehq/duckdb I honestly don’t even really know what range queries are! But I do see that for every file successfully loaded, there’s also this failed request with a [screenshot not preserved]. And if I load that in Safari, for every file I see “falling back to full HTTP read” coming from this line of openFile: [screenshot not preserved]. I tried enabling allowFullHttpReads in https://observablehq.com/d/5f93918d02f8c92c, but it doesn’t seem to have any effect. But that might be my problem with how I’m passing it. (Our files are served by S3, but we are using duckdb.DuckDBDataProtocol.HTTP, not duckdb.DuckDBDataProtocol.S3. I don’t know if that’s relevant or if we should try to change that.)
This is an interesting case; the fix is not completely clear to me. There are likely 2 problems:
We also need to add documentation on range requests, even if it is just a collection of useful links plus a statement of duckdb-wasm's expectations. But the problem is that when falling back to NOT using range requests, reads will be network-bound, which will be problematic with bigger files, so ideally range requests should work in most cases with server defaults (whatever they are).
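For readers unfamiliar with range requests, here is a minimal sketch of the two headers involved: the `Range` request header a client sends, and the `Content-Range` header a server returns alongside a `206 Partial Content` status. The helper names (`rangeHeader`, `parseContentRange`, `rangeLength`) are hypothetical and are not part of the duckdb-wasm API; they only illustrate the header formats defined by the HTTP specification.

```typescript
// Hypothetical helpers for illustration only; not part of duckdb-wasm.
interface ContentRange {
  start: number;        // first byte offset, inclusive
  end: number;          // last byte offset, inclusive
  total: number | null; // full resource size, or null when the server sends "*"
}

// Build the value of a Range request header for an inclusive byte interval.
function rangeHeader(start: number, end: number): string {
  return `bytes=${start}-${end}`;
}

// Parse a Content-Range response header such as "bytes 0-10/1234".
// Returns null when the value does not describe a satisfied byte range.
function parseContentRange(value: string): ContentRange | null {
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value.trim());
  if (m === null) return null;
  return {
    start: Number(m[1]),
    end: Number(m[2]),
    total: m[3] === "*" ? null : Number(m[3]),
  };
}

// Number of bytes the body should contain; both offsets are inclusive.
function rangeLength(r: ContentRange): number {
  return r.end - r.start + 1;
}
```

Because both offsets are inclusive, a request for bytes 0 through 10 should yield an 11-byte body, which is a quick sanity check on any server or CDN configuration.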
hey @carlopi – is there any update on this, or a workaround that you might recommend? we are unfortunately still blocked by this on upgrading our version of DuckDB. i'd be happy to screenshare or share more data if that would help you to debug!
Hi @annie, I looked into this a bit longer, and I still think there is some problem in the way your S3 bucket is set up and/or in the way your CloudFront or equivalent CDN is set up. Using curl to ask for the range from byte 0 to byte 10, I get a file that is correctly of length 11.
Doing the same on one of the files that you serve:
Note that when not advertising compression support (which means the server should avoid sending compressed data), the file has the right size.
This mimics what happens under the hood of a web browser request, where the browser advertises its ability to decompress data. I think the settings on the S3 bucket / CDN combination cause you to send back range responses marked as "I am compressed" while sending UNCOMPRESSED data (see the 11 bytes). The browser then sees 11 bytes, sees that it needs to decompress them, tries to do so, and fails (given that the data is not really in gzip format; different browsers probably handle this error situation in different ways while remaining compliant). I think it would be ideal to solve this on your side. We can consider adding a setting to avoid range requests entirely, but then performance on Observable would be worse, and I am not sure the added complexity would have much value.
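The mismatch described above can be checked mechanically. The following is a minimal sketch, assuming you have captured the status code, the `Content-Encoding` header, and the raw (undecoded) body of a range response that starts at byte 0, for example with a command-line client that does not decode the body. The names `RangeProbe` and `diagnoseRangeProbe` are hypothetical and not part of duckdb-wasm. The check relies on the fact that a gzip stream always begins with the bytes `0x1f 0x8b`, so it only applies when the requested range begins at offset 0.

```typescript
// Hypothetical diagnostic helper for illustration; not part of duckdb-wasm.
type RangeDiagnosis = "ok" | "range-ignored" | "encoding-mismatch";

interface RangeProbe {
  status: number;                 // HTTP status code of the response
  contentEncoding: string | null; // Content-Encoding header value, if present
  body: Uint8Array;               // raw bytes as received, before any decoding
}

// A gzip stream always starts with the two magic bytes 0x1f 0x8b.
function startsWithGzipMagic(bytes: Uint8Array): boolean {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

// Assumes the probe requested a range starting at byte 0.
function diagnoseRangeProbe(probe: RangeProbe): RangeDiagnosis {
  // Anything other than 206 means the server did not honor the range.
  if (probe.status !== 206) return "range-ignored";
  const encoding = (probe.contentEncoding ?? "").toLowerCase();
  // Header claims gzip, but the bytes are not the start of a gzip stream.
  if (encoding.includes("gzip") && !startsWithGzipMagic(probe.body)) {
    return "encoding-mismatch";
  }
  return "ok";
}
```

An `"encoding-mismatch"` result corresponds to the situation described in this comment: the response is labeled as compressed while carrying uncompressed bytes, so a client that trusts the header will fail to decode it.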
I just got DuckDB-Wasm running. From my Chrome dev log: HEAD request with range header failed: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'blob:http://localhost:5173/a5dd5d8f-1b74-4907-b4a6-07483af0877f': 'GET' is the only method allowed for 'blob:' URLs. falling back to full HTTP read for: blob:http://localhost:5173/a5dd5d8f-1b74-4907-b4a6-07483af0877f
What happens?
Yesterday we updated duckdb-wasm in the Observable standard library from 1.24.0 to 1.27.0, but then found that it could no longer read some file attachments, especially larger CSVs, in Safari and Firefox; we get range request errors. We have rolled back the upgrade for now, but would love to re-deploy it. (This person, for example, is excited about UNPIVOT!)
This commit 53e4aad, refactoring openFile, caught my eye. In the reproduction notebook below, I tried passing allowFullHttpReads, but it didn’t seem to help. (But I might be doing it wrong!)
Happy to pair on this if that would help.
To Reproduce
This notebook ports the DuckDBClient from stdlib to a notebook for testing, and reproduces the error in Safari and Firefox: https://observablehq.com/d/5f93918d02f8c92c
Browser/Environment:
Safari 16.3, Firefox 116.0.2
Device:
MacBook Pro (M1, 2021) on macOS Monterey (12.6.3)
DuckDB-Wasm Version:
1.25.0 and up
DuckDB-Wasm Deployment:
https://observablehq.com/
Full Name:
Christopher “Toph” Blair Tucker
Affiliation:
Observable, Inc.