
v1.25.0 introduces range request errors loading file attachments on Observable #1366


Open
tophtucker opened this issue Aug 15, 2023 · 6 comments

Comments

@tophtucker

What happens?

Yesterday we updated duckdb-wasm in the Observable standard library from 1.24.0 to 1.27.0, but then found that it could no longer read some file attachments, especially larger CSVs, in Safari and Firefox; we get range request errors. We have rolled back the upgrade for now, but would love to re-deploy it. (This person, for example, is excited about UNPIVOT!)

This commit 53e4aad, refactoring openFile, caught my eye. In the reproduction notebook below, I tried passing allowFullHttpReads, but it didn’t seem to help. (But I might be doing it wrong!)

Happy to pair on this if that would help.

To Reproduce

This notebook ports the DuckDBClient from stdlib to a notebook for testing, and reproduces the error in Safari and Firefox: https://observablehq.com/d/5f93918d02f8c92c

Browser/Environment:

Safari 16.3, Firefox 116.0.2

Device:

MacBook Pro (M1, 2021) on macOS Monterey (12.6.3)

DuckDB-Wasm Version:

1.25.0 and up

DuckDB-Wasm Deployment:

https://observablehq.com/

Full Name:

Christopher “Toph” Blair Tucker

Affiliation:

Observable, Inc.

@carlopi
Collaborator

carlopi commented Aug 15, 2023

Thanks a lot for the report.
I do reproduce the error on Safari and Firefox.

I will look into debugging this, since it might be a general problem. If a general fix could cause problems elsewhere, I will add an option to opt in to the previous behaviour.

That commit does indeed look like it might be at fault; its logic was intended to avoid range queries being degraded to full downloads. Could you check whether, with the current Observable deployment, range queries are actually used, or whether those files eventually fall back to full reads? Either way, it would be useful to understand how to reach the ideal outcome (this working with range reads).

@carlopi carlopi self-assigned this Aug 15, 2023
@tophtucker
Author

Thanks so much for looking into it! Here’s a current deployed working example on Observable using duckdb-wasm 1.24.0: https://observablehq.com/@observablehq/duckdb

I honestly don’t even really know what range queries are! But I do see that for every file successfully loaded, there’s also this failed request with a Range: bytes=0- header:

[Screenshot: browser network panel showing the failed request with the Range: bytes=0- header]

And if I load that in Safari, for every file I see “falling back to full HTTP read” coming from this line of openFile:

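[Screenshot: the "falling back to full HTTP read" console message and the corresponding line of openFile]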

I tried enabling allowFullHttpReads in https://observablehq.com/d/5f93918d02f8c92c, but it doesn’t seem to have any effect. But that might be my problem with how I’m passing it.

(Our files are served by S3, but we are using duckdb.DuckDBDataProtocol.HTTP, not duckdb.DuckDBDataProtocol.S3. I don’t know if that’s relevant or if we should try to change that.)
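For context on the question above: a range request asks for a span of bytes with a `Range` header. A server that honours it replies `206 Partial Content`, while one that ignores it replies `200` with the whole file. An open-ended `bytes=0-`, as in the failed request above, asks for everything from the first byte onward. The TypeScript sketch below builds such header values and classifies a probe's status code. `rangeHeader` and `classifyProbe` are hypothetical helpers, not duckdb-wasm's API, and treating every other status as a failure is a simplifying assumption.

```typescript
// Build an HTTP Range header value. Both bounds are inclusive byte offsets;
// omitting `end` produces an open-ended range such as "bytes=0-".
function rangeHeader(start: number, end?: number): string {
  if (!Number.isInteger(start) || start < 0) {
    throw new RangeError(`invalid start offset: ${start}`);
  }
  if (end === undefined) {
    return `bytes=${start}-`;
  }
  if (!Number.isInteger(end) || end < start) {
    throw new RangeError(`invalid end offset: ${end}`);
  }
  return `bytes=${start}-${end}`;
}

type ProbeResult = "ranges-supported" | "ranges-ignored" | "failed";

// Classify the status of a ranged probe request (hypothetical policy):
// 206 means the Range header was honoured, 200 means it was ignored and the
// whole body was sent, anything else is treated as a failure.
function classifyProbe(status: number): ProbeResult {
  if (status === 206) return "ranges-supported";
  if (status === 200) return "ranges-ignored";
  return "failed";
}

console.log(rangeHeader(0));     // bytes=0-
console.log(rangeHeader(0, 10)); // bytes=0-10
console.log(classifyProbe(206)); // ranges-supported
```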

@carlopi
Collaborator

carlopi commented Aug 16, 2023

This is an interesting case, and the fix is not completely clear to me. There are likely two problems:

  • the fallback should work
  • range requests need to work
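The two goals can be illustrated with a small TypeScript sketch: try a range request first, accept a `206` whose body has the expected length, and only fall back to a full download (sliced locally) when the caller allows it. `readRange`, `Transport`, `SimpleResponse` and `allowFullReads` are hypothetical names chosen for illustration; this is not how duckdb-wasm's `openFile` is implemented. The transport is injected so the logic can run without a network.

```typescript
// Minimal response shape for this sketch (hypothetical, not a real API).
interface SimpleResponse {
  status: number;
  body: Uint8Array;
}

// Injected transport: performs a GET with the given headers.
type Transport = (url: string, headers: Record<string, string>) => Promise<SimpleResponse>;

// Read bytes start..end (inclusive) from `url`, preferring a range request.
async function readRange(
  transport: Transport,
  url: string,
  start: number,
  end: number,
  allowFullReads: boolean,
): Promise<Uint8Array> {
  const expected = end - start + 1;
  const ranged = await transport(url, { Range: `bytes=${start}-${end}` });

  // Ideal case: the server honoured the range and sent exactly what we asked for.
  if (ranged.status === 206 && ranged.body.length === expected) {
    return ranged.body;
  }
  // The server ignored the Range header and sent the whole file anyway.
  if (ranged.status === 200) {
    return ranged.body.slice(start, end + 1);
  }
  // Anything else (error status, or a 206 with the wrong length) is a failed range read.
  if (!allowFullReads) {
    throw new Error(`range read failed: status ${ranged.status}, ${ranged.body.length} bytes`);
  }
  // Fallback: download the whole file and slice locally (slow for large files).
  const full = await transport(url, {});
  if (full.status !== 200) {
    throw new Error(`full read failed: status ${full.status}`);
  }
  return full.body.slice(start, end + 1);
}
```

With a transport whose range responses come back unusable, the call only succeeds when `allowFullReads` is true, and then it pays the cost of downloading the whole file.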

We also need to add documentation on range requests, even if it is just a collection of useful links plus a statement of what duckdb-wasm expects.
Here are some basics: https://developer.mozilla.org/en-US/docs/Web/HTTP/Range_requests.

But the problem is that when falling back to NOT using range requests, everything is bound by network transfer, which becomes problematic with bigger files. So ideally, range requests should work in most cases with servers' default settings (whatever they are).
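One of the basics on the MDN page above is that a `206 Partial Content` reply describes its contents in a `Content-Range` header such as `bytes 0-10/4096`; the total may be `*` when unknown, and an unsatisfiable range is reported as `bytes */4096`. A client can check that header against what it requested before trusting the body. The parser below is a minimal TypeScript sketch of that check. `parseContentRange` and `matchesRequest` are hypothetical helper names, only the `bytes` unit is handled, and requiring an exact match is a strict assumption (a server may legitimately clip the end at the file size).

```typescript
interface ContentRange {
  start: number;
  end: number;          // inclusive
  total: number | null; // null when the server sent "*"
}

// Parse "bytes <start>-<end>/<total|*>". Returns null for anything else,
// including the unsatisfied-range form "bytes */<total>".
function parseContentRange(value: string): ContentRange | null {
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value.trim());
  if (m === null) return null;
  const start = Number(m[1]);
  const end = Number(m[2]);
  const total = m[3] === "*" ? null : Number(m[3]);
  if (end < start || (total !== null && end >= total)) return null;
  return { start, end, total };
}

// Strict check: the header must describe exactly the requested span, and the
// body length must agree with it.
function matchesRequest(
  header: string,
  requestedStart: number,
  requestedEnd: number,
  bodyLength: number,
): boolean {
  const cr = parseContentRange(header);
  return (
    cr !== null &&
    cr.start === requestedStart &&
    cr.end === requestedEnd &&
    bodyLength === cr.end - cr.start + 1
  );
}

console.log(parseContentRange("bytes 0-10/4096"));        // { start: 0, end: 10, total: 4096 }
console.log(matchesRequest("bytes 0-10/4096", 0, 10, 0)); // false (empty body)
```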

@annie

annie commented Oct 18, 2023

hey @carlopi – is there any update on this, or a workaround that you might recommend? we are unfortunately still blocked by this on upgrading our version of DuckDB.

i'd be happy to screenshare or share more data if that would help you to debug!

@carlopi
Collaborator

carlopi commented Oct 18, 2023

Hi @annie, I dug a bit deeper into this, and I still think there is a problem in the way your S3 bucket AND/OR your CloudFront (or equivalent CDN) is set up.

Using curl to ask for the range from the 0th to the 10th byte, I get a file that is correctly 11 bytes long:

curl -r 0-10 --compressed https://media.githubusercontent.com/media/datablist/sample-csv-files/main/files/customers/customers-100.csv --output github_file
ls -la github_file
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    11  100    11    0     0     20      0 --:--:-- --:--:-- --:--:--    20

-rw-r--r--  1 carlo  staff  11 Oct 19 00:01 github_file
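HTTP byte ranges include both endpoints, so a request for bytes $a$ through $b$ should return $b - a + 1$ bytes; for the `-r 0-10` request above:

```latex
\text{length} = b - a + 1 = 10 - 0 + 1 = 11 \ \text{bytes}
```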

Doing the same on one of the files that you serve:

curl -r 0-10 --compressed https://static.observableusercontent.com/files/c56b9e232d72bf1df96ca3eeca37e29e811adb72f49d943659a0006c015e74d2c429186d9dca251060784f364eb2a16fd39584695d523588bdcb87e4d9eac650 --output observable_file
ls -la observable_file
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    11  100    11    0     0     20      0 --:--:-- --:--:-- --:--:--    20

-rw-r--r--  1 carlo  staff  0 Oct 19 00:01 observable_file

Note that without advertising compression support (which means the server should avoid compressing the response), the file has the right size:

curl -r 0-10 https://static.observableusercontent.com/files/c56b9e232d72bf1df96ca3eeca37e29e811adb72f49d943659a0006c015e74d2c429186d9dca251060784f364eb2a16fd39584695d523588bdcb87e4d9eac650 --output observable_file_no_compression
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    11  100    11    0     0     20      0 --:--:-- --:--:-- --:--:--    20

ls -la observable_file_no_compression
-rw-r--r--  1 carlo  staff  11 Oct 19 00:01 observable_file_no_compression

This mimics what happens under the hood of a web browser request, where the browser advertises its ability to decompress data. I think the settings on the S3 bucket / CDN combination make it so that you send back range responses marked as "I am compressed" while sending UNCOMPRESSED data (see the 11 bytes). The browser then sees 11 bytes, sees that it needs to decompress them, tries to do so, and fails, since the data is not really in gzip format. Different browsers, while probably still compliant, react to this error situation in different ways.
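The suspected mismatch described above (a response labelled as compressed whose bytes are not actually compressed) can be checked from the body itself: every gzip member starts with the magic bytes `0x1f 0x8b` (RFC 1952). The TypeScript sketch below flags a body whose `Content-Encoding` claims gzip but which lacks that signature. `isGzipMagic` and `encodingMismatch` are hypothetical helpers, and the sketch assumes access to the raw, not yet decoded, bytes (as in a curl download without `--compressed`); browsers decode transparently, so in practice this is a check for tooling or server-side diagnostics.

```typescript
// First two bytes of every gzip member (RFC 1952).
const GZIP_MAGIC = [0x1f, 0x8b] as const;

function isGzipMagic(body: Uint8Array): boolean {
  return body.length >= 2 && body[0] === GZIP_MAGIC[0] && body[1] === GZIP_MAGIC[1];
}

// True when the Content-Encoding header lists gzip but the raw bytes do not
// start with the gzip signature, which would make a decoder fail.
function encodingMismatch(contentEncoding: string | null, rawBody: Uint8Array): boolean {
  const claimsGzip = (contentEncoding ?? "")
    .split(",")
    .map((token) => token.trim().toLowerCase())
    .some((token) => token === "gzip");
  return claimsGzip && !isGzipMagic(rawBody);
}

// 11 plain-text bytes, standing in for an uncompressed 11-byte range response.
const plain = Uint8Array.from("hello world", (c) => c.charCodeAt(0));
console.log(plain.length);                    // 11
console.log(encodingMismatch("gzip", plain)); // true
console.log(encodingMismatch(null, plain));   // false
```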

I think it would be ideal to solve this on your side. We could consider adding a setting to avoid range requests entirely, but then performance on Observable would be worse, and I am not sure the added complexity would have much value.

@cugarteblair

I just got DuckDB-Wasm running, and I am seeing this too. It seems the fallback works, but the hiccup comes with a performance hit. What is the long-term plan for this?

From my Chrome dev log:

HEAD request with range header failed: NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'blob:http://localhost:5173/a5dd5d8f-1b74-4907-b4a6-07483af0877f': 'GET' is the only method allowed for 'blob:' URLs.

falling back to full HTTP read for: blob:http://localhost:5173/a5dd5d8f-1b74-4907-b4a6-07483af0877f
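The log shows why this particular fallback happens: the probe is a `HEAD` request, and `blob:` URLs only accept `GET`. A client could avoid the failed round trip by choosing the probe method from the URL scheme. This is a minimal TypeScript sketch under that assumption; `probeMethod` is a hypothetical helper, not part of duckdb-wasm.

```typescript
type ProbeMethod = "HEAD" | "GET";

// Pick the method for a size/range probe. Per the browser error above,
// 'GET' is the only method allowed for 'blob:' URLs, so skip HEAD for them.
function probeMethod(url: string): ProbeMethod {
  return url.trim().toLowerCase().startsWith("blob:") ? "GET" : "HEAD";
}

console.log(probeMethod("blob:http://localhost:5173/a5dd5d8f-1b74-4907-b4a6-07483af0877f")); // GET
console.log(probeMethod("https://example.com/data.csv"));                                      // HEAD
```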

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants