Open
Labels
ingestion: Related to manual data ingestion with the prepdocs scripts
Description
Please provide us with the following information:
This issue is for a: (mark with an x)
- [x] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
I added thousands of Markdown (.md) files and images to the data/ folder.
azd up kept failing on every image with small dimensions. Skipping those files would be fine, but currently I have to delete them before azd up can proceed.
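Until skipping is supported, one stopgap is to pre-screen images before they reach Document Intelligence. The sketch below reads the width and height directly from a PNG header using only the standard library. The 50 to 10,000 pixel range is an assumption based on the service's published input limits (check the docs for your model and API version), and `png_dimensions` / `is_supported_png` are hypothetical helpers, not part of prepdocs.

```python
import struct
from typing import Optional, Tuple

# Assumed limits: Document Intelligence documents supported image sizes of
# roughly 50 x 50 up to 10,000 x 10,000 pixels. Verify for your model/API version.
MIN_SIDE = 50
MAX_SIDE = 10_000

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (width, height) from a PNG's IHDR chunk, or None if the data is not a PNG."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def is_supported_png(data: bytes) -> bool:
    """True if a PNG falls inside the assumed limits.

    Non-PNG data returns True: it is not checked here and is left for the service to judge.
    """
    dims = png_dimensions(data)
    if dims is None:
        return True
    width, height = dims
    return MIN_SIDE <= width <= MAX_SIDE and MIN_SIDE <= height <= MAX_SIDE
```

Reading only the first 24 bytes avoids an image library dependency; other formats (JPEG, TIFF, BMP) would need their own header parsing or a library such as Pillow.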
Any log messages given by the failure
For images with very small (unsupported) dimensions, azd up kept failing, and I had to fix each of those files before it could proceed.
Ingesting 'efficiency.png'
Extracting text from './data/efficiency.png' using Azure Document Intelligence
Traceback (most recent call last):
File "/workspaces/azure-search-openai-demo/./app/backend/prepdocs.py", line 470, in <module>
loop.run_until_complete(main(ingestion_strategy, setup_index=not args.remove and not args.removeall))
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/workspaces/azure-search-openai-demo/./app/backend/prepdocs.py", line 213, in main
await strategy.run()
File "/workspaces/azure-search-openai-demo/app/backend/prepdocslib/filestrategy.py", line 84, in run
sections = await parse_file(file, self.file_processors, self.category, self.image_embeddings)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/azure-search-openai-demo/app/backend/prepdocslib/filestrategy.py", line 26, in parse_file
pages = [page async for page in processor.parser.parse(content=file.content)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/azure-search-openai-demo/app/backend/prepdocslib/filestrategy.py", line 26, in <listcomp>
pages = [page async for page in processor.parser.parse(content=file.content)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/azure-search-openai-demo/app/backend/prepdocslib/pdfparser.py", line 54, in parse
poller = await document_intelligence_client.begin_analyze_document(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/azure-search-openai-demo/.venv/lib/python3.11/site-packages/azure/core/tracing/decorator_async.py", line 94, in wrapper_use_tracer
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/azure-search-openai-demo/.venv/lib/python3.11/site-packages/azure/ai/documentintelligence/aio/_operations/_operations.py", line 3241, in begin_analyze_document
raw_result = await self._analyze_document_initial( # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/azure-search-openai-demo/.venv/lib/python3.11/site-packages/azure/ai/documentintelligence/aio/_operations/_operations.py", line 132, in _analyze_document_initial
raise HttpResponseError(response=response, model=error)
azure.core.exceptions.HttpResponseError: (InvalidRequest) Invalid request.
Code: InvalidRequest
Message: Invalid request.
Inner error: {
"code": "InvalidContentDimensions",
"message": "The input image dimensions are out of range. Refer to documentation for supported image dimensions."
}
ERROR: error executing step command 'provision': failed running post hooks: 'postprovision' hook failed with exit code: '1', Path: '/tmp/azd-postprovision-1671022109.sh'. : exit code: 1
Expected/desired behavior
Skip files that fail with InvalidContentDimensions and print a summary of the skipped files at the end.
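The requested behavior could look roughly like the sketch below: wrap the per-file ingestion call, treat InvalidContentDimensions as a skip instead of a fatal error, and print a summary once all files are processed. `ingest_all`, `inner_error_code`, and `SKIPPABLE_CODES` are hypothetical names. Reading `exc.error.innererror` assumes the azure-core `HttpResponseError` shape suggested by the "Inner error" block in the log, with a plain-text fallback; this is not how prepdocs' `FileStrategy.run` is actually written.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Optional, Tuple

# Error codes that should skip a file rather than abort the whole run (assumed set).
SKIPPABLE_CODES = {"InvalidContentDimensions"}


def inner_error_code(exc: BaseException) -> Optional[str]:
    """Best-effort lookup of the service's inner error code.

    Assumes an azure-core HttpResponseError-like object exposing
    exc.error.innererror as a dict; otherwise scans the exception text.
    """
    error = getattr(exc, "error", None)
    inner = getattr(error, "innererror", None)
    if isinstance(inner, dict) and inner.get("code"):
        return inner["code"]
    for code in SKIPPABLE_CODES:
        if code in str(exc):
            return code
    return None


async def ingest_all(
    files: Iterable[str],
    ingest_one: Callable[[str], Awaitable[None]],
) -> List[Tuple[str, str]]:
    """Ingest each file, skipping known-bad inputs; return (name, code) for every skipped file."""
    skipped: List[Tuple[str, str]] = []
    for name in files:
        try:
            await ingest_one(name)
        except Exception as exc:
            code = inner_error_code(exc)
            if code not in SKIPPABLE_CODES:
                raise  # unexpected errors still stop the run
            skipped.append((name, code))
            print(f"Skipping '{name}': {code}")
    if skipped:
        # Summary of everything that was skipped, printed after all files are processed.
        print(f"Skipped {len(skipped)} file(s):")
        for name, code in skipped:
            print(f"  {name} ({code})")
    return skipped
```

In prepdocs, the natural place for this would presumably be around the `parse_file` call in the `run` loop of filestrategy.py (line 84 in the traceback), but that wiring is an assumption.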
OS and Version?
GitHub Codespaces
uname -ra
Linux codespaces-50920e 6.5.0-1021-azure #22~22.04.1-Ubuntu SMP Tue Apr 30 16:08:18 UTC 2024 x86_64 GNU/Linux
azd version?
Run azd version and copy-paste the output here.
azd version 1.9.3 (commit e1624330dcc7dde440ecc1eda06aac40e68aa0a3)
Versions
Mention any other details that might be useful
Thanks! We'll be in touch soon.