
API to determine download size? #79

@domenic


(This issue applies to all AI APIs, including translator, language detector, and prompt, but I'm opening it here for lack of a better place.)

In Chrome, we've had a few cases (see bug report) where developers want to use an AI model if it's a "small download" (~MiBs), but not if it's a "big download" (~GiBs). However, right now availability() only returns "downloadable" vs. "downloading"/"available"; it doesn't give a more fine-grained signal about how large the download would be.
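To illustrate, here's a minimal sketch of the situation today, assuming the Summarizer API shape from this spec; the size check the developer wants is exactly the part that doesn't exist:

```js
// Today's coarse signal: "downloadable" could mean a few KiB or many GiB.
const availability = await Summarizer.availability();

if (availability === "available") {
  const summarizer = await Summarizer.create(); // ready to use immediately
} else if (availability === "downloadable") {
  // No way to tell whether create() would kick off a small or a
  // multi-GiB download, so the developer has to guess.
}
```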

The specification makes this especially acute because of how it allows separate availability for every API and every combination of options. That is, it's a valid implementation to use separate LoRAs or prompts or similar, on top of a large base model, for summarizer vs. writer vs. rewriter, or even for short Japanese headline summaries vs. medium English bullet-point summaries specifically. If a browser lazily downloads these extras, only a small download for the extra might remain, so availability() returns "downloadable" even though the multi-GiB base model is already downloaded.
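For example, here's a hedged sketch of how per-option availability can diverge; the specific option values are illustrative, not significant:

```js
// Each option combination may have its own availability state.
const headlineJa = await Summarizer.availability({
  type: "headline",
  length: "short",
  outputLanguage: "ja",
});

const keyPointsEn = await Summarizer.availability({
  type: "key-points",
  length: "medium",
  outputLanguage: "en",
});

// Either of these can be "downloadable" even when the shared multi-GiB
// base model is already on disk and only a tiny per-option extra is missing.
```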

Chrome's implementation, in fact, currently has this problem: it has small (~KiBs) "model configs" for each API, so that even if the multi-GiB Gemini Nano model is already downloaded because someone previously used the summarizer API, the writer API will still return "downloadable". We plan to fix this in Chrome by just optimistically downloading all model configs, which is better for privacy anyway.

However, this sort of problem might crop up in other cases. And, it might just independently be useful to give this signal to developers. For example, Chrome's language detection model is currently ~MiBs, whereas the writing assistance APIs rely on ~GiBs of models. If developers adopt a uniform strategy of just not bothering to use the API when it's "downloadable", this might be the correct strategy for the writing assistance APIs, but a subpar strategy for language detection.
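That uniform strategy might look like the following sketch, assuming the LanguageDetector API shape (detectOnServer() is a hypothetical fallback helper); for a ~MiB model it's needlessly conservative:

```js
// Uniform "never trigger a download" strategy.
async function detectLanguage(text) {
  const availability = await LanguageDetector.availability();

  if (availability !== "available") {
    // Reasonable caution for a multi-GiB model, but overly conservative
    // when the missing download is only a few MiB.
    return detectOnServer(text); // hypothetical server-side fallback
  }

  const detector = await LanguageDetector.create();
  return (await detector.detect(text))[0]?.detectedLanguage;
}
```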

Another, more speculative example is that currently browsers are using ~10-100 MiB-scale custom models for translation. But in the future, they might want to use ~GiB scale LLMs, especially if they could use the same one to back translation and writing assistance, or if they want to implement some more advanced features like webmachinelearning/translation-api#9. So again, the pseudo-binary nature of the availability signal could lead developers to assume the wrong thing.


My suggestion for how we might solve this is a third static method, alongside create() and availability(), such as approximateDownloadSize(). (approxDownloadSize()?) A sketch of the proposed semantics follows the list below.

  • We still want to censor the exact size of the model, so we'd want to come up with some rounding scheme. Maybe nearest power of ten, nearest power of two, or rounding to specific thresholds like 1 MiB, 128 MiB, 1 GiB, 10 GiB.

  • It would return null if availability() is not "downloadable" or "downloading".

  • If availability() is "downloading", it would return the amount left, instead of the total amount. (Since that's what's important for sites which are deciding whether or not to trigger a download, and this is consistent with the downloadprogress events.)
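To make the shape concrete, here's a hedged sketch of how a page might use it; approximateDownloadSize() is the method proposed above, not a shipped API, and the 50 MiB cutoff is an arbitrary example threshold:

```js
// Hypothetical usage of the proposed approximateDownloadSize().
const availability = await Summarizer.availability();

if (availability === "available") {
  const summarizer = await Summarizer.create();
} else if (availability === "downloadable" || availability === "downloading") {
  // Rounded size in bytes; for "downloading", the amount left to download.
  const size = await Summarizer.approximateDownloadSize();
  if (size !== null && size <= 50 * 1024 * 1024) {
    const summarizer = await Summarizer.create(); // triggers/resumes the download
  }
}
```

On the censoring point, the threshold scheme from the first bullet could look something like this on the implementation side (purely illustrative):

```js
// Round a true size up to coarse buckets so pages only learn an
// order-of-magnitude figure, never the exact byte count.
const MiB = 1024 ** 2;
const GiB = 1024 ** 3;
const THRESHOLDS = [1 * MiB, 128 * MiB, 1 * GiB, 10 * GiB];

function censorDownloadSize(trueBytes) {
  const bucket = THRESHOLDS.find((t) => trueBytes <= t);
  return bucket ?? 10 * GiB; // clamp anything larger to the top bucket
}
```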

Note that this is not intended to help sites show size-accurate progress bars, e.g. X MiB/Y MiB downloaded. I don't think we consider that a goal of this API. If we think signaling download sizes to the user is important, then that should be done by browser UI, not by individual sites. Individual sites can use downloadprogress to get a percentage and a sense of velocity, which they might reflect into their UI, but it's not an API goal to let sites recreate browser download UIs.
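For completeness, that downloadprogress path might look like this sketch, assuming the monitor option from this spec and that e.loaded reports a fraction from 0 to 1 (updateProgressBar() is a hypothetical UI helper):

```js
const summarizer = await Summarizer.create({
  monitor(m) {
    m.addEventListener("downloadprogress", (e) => {
      // Pages get a percentage and a sense of velocity, but no
      // absolute byte counts.
      updateProgressBar(Math.round(e.loaded * 100)); // hypothetical UI helper
    });
  },
});
```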
