Description
When running in a Node.js environment, there appears to be an edge case where an unavailable cache causes an error when loading model files. The problematic code:
```js
// https://github.com/huggingface/transformers.js/blob/10c09fb561857abf817d651a02898f590f5b7954/src/utils/hub.js#L650
const path = await cache.match(cacheKey);
if (path instanceof FileResponse) {
    return path.filePath;
}
```
Specifically, in a Node.js environment, `getModelFile()` attempts to download the respective `model.onnx` (or equivalent) file, store it in a cache, and return the path to it. When caching is disabled or unavailable and a remote model is to be loaded, `cache` stays `undefined` and the `cache.match()` call throws an error.
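For context, `cache` is only assigned when one of the cache backends is enabled. A rough paraphrase of the initialization logic earlier in `getModelFile()` (simplified, not the exact code in `hub.js`):

```js
let cache;
if (env.useBrowserCache) {
    // Browser Cache API backend.
    cache = await caches.open('transformers-cache');
} else if (env.useFSCache) {
    // File-system backend used in Node.js.
    cache = new FileCache(env.cacheDir);
}
// With both flags false, `cache` remains undefined, so the
// cache.match(cacheKey) call above throws.
```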
Simple repro (just setting `useFSCache` to `false`):
```js
import * as transformers from '@huggingface/transformers';

transformers.env.useFSCache = false;

transformers.pipeline('sentiment-analysis').then(async (pipe) => {
    console.log(await pipe("Doesn't work when cache is unavailable!"));
});
```
raises `TypeError: Cannot read properties of undefined (reading 'match')`.
File system availability for caching might be a reasonable assumption to make here, but I feel it should still be checked for. The function does make allowances for a missing cache in most places; it's just this last piece that misses it.
As a possible suggestion, the `response` variable already contains the downloaded model file, so when caching is unavailable it might make sense to just pass the buffer back (see the sketch below). The result seems to be passed directly to `ONNX.InferenceSession.create()`, which accepts both buffers and paths. Happy to raise a PR if that's acceptable.
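A minimal sketch of that idea, assuming `response` exposes the downloaded bytes via `arrayBuffer()` (the exact variable names and response shape in `hub.js` may differ):

```js
// Only consult the cache if one was actually initialized.
if (cache) {
    const path = await cache.match(cacheKey);
    if (path instanceof FileResponse) {
        return path.filePath;
    }
}
// No cache available: return the downloaded bytes directly instead of a
// file path. ONNX.InferenceSession.create() accepts a Uint8Array as well.
return new Uint8Array(await response.arrayBuffer());
```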