Azure Document Intelligence Error: InvalidContentDimensions #1735

@anoobbacker

Description

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

I added thousands of Markdown files and images to the data/ folder. azd up fails on every image whose dimensions are too small for Azure Document Intelligence. It would be fine to skip those files, but currently I have to delete each one before azd up can proceed.

Any log messages given by the failure

For files with very small (unsupported) dimensions, azd up kept failing, and I had to fix each file before it would proceed.

Ingesting 'efficiency.png'
Extracting text from './data/efficiency.png' using Azure Document Intelligence
Traceback (most recent call last):
  File "/workspaces/azure-search-openai-demo/./app/backend/prepdocs.py", line 470, in <module>
    loop.run_until_complete(main(ingestion_strategy, setup_index=not args.remove and not args.removeall))
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/workspaces/azure-search-openai-demo/./app/backend/prepdocs.py", line 213, in main
    await strategy.run()
  File "/workspaces/azure-search-openai-demo/app/backend/prepdocslib/filestrategy.py", line 84, in run
    sections = await parse_file(file, self.file_processors, self.category, self.image_embeddings)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/azure-search-openai-demo/app/backend/prepdocslib/filestrategy.py", line 26, in parse_file
    pages = [page async for page in processor.parser.parse(content=file.content)]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/azure-search-openai-demo/app/backend/prepdocslib/filestrategy.py", line 26, in <listcomp>
    pages = [page async for page in processor.parser.parse(content=file.content)]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/azure-search-openai-demo/app/backend/prepdocslib/pdfparser.py", line 54, in parse
    poller = await document_intelligence_client.begin_analyze_document(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/azure-search-openai-demo/.venv/lib/python3.11/site-packages/azure/core/tracing/decorator_async.py", line 94, in wrapper_use_tracer
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/azure-search-openai-demo/.venv/lib/python3.11/site-packages/azure/ai/documentintelligence/aio/_operations/_operations.py", line 3241, in begin_analyze_document
    raw_result = await self._analyze_document_initial(  # type: ignore
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspaces/azure-search-openai-demo/.venv/lib/python3.11/site-packages/azure/ai/documentintelligence/aio/_operations/_operations.py", line 132, in _analyze_document_initial
    raise HttpResponseError(response=response, model=error)
azure.core.exceptions.HttpResponseError: (InvalidRequest) Invalid request.
Code: InvalidRequest
Message: Invalid request.
Inner error: {
    "code": "InvalidContentDimensions",
    "message": "The input image dimensions are out of range. Refer to documentation for supported image dimensions."
}

ERROR: error executing step command 'provision': failed running post hooks: 'postprovision' hook failed with exit code: '1', Path: '/tmp/azd-postprovision-1671022109.sh'. : exit code: 1
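
As a possible workaround until the ingestion script handles this, undersized images could be detected locally before they are sent to Document Intelligence. The sketch below reads the width and height from a PNG header using only the standard library. The helper names `png_dimensions` and `within_limits` are hypothetical, and the default limits of 50 and 10,000 pixels per side are an assumption that should be checked against the current Document Intelligence input requirements.

```python
import struct

# Every PNG file starts with this 8-byte signature, followed by the IHDR chunk.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG byte string."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are two big-endian 32-bit integers at bytes 16..23.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def within_limits(width: int, height: int,
                  min_side: int = 50, max_side: int = 10_000) -> bool:
    """Check both sides against the limits (defaults are an assumption)."""
    return (min_side <= width <= max_side) and (min_side <= height <= max_side)
```

For a file such as ./data/efficiency.png, this would be called as `within_limits(*png_dimensions(open(path, "rb").read()))`. Other image formats would need their own header parsing or an imaging library.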

Expected/desired behavior

Skip files that fail with InvalidContentDimensions, and print a summary of the skipped files at the end of the run.
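
A minimal sketch of the requested behavior, not the actual prepdocs code: the loop catches the error for each file, skips only those whose message contains `InvalidContentDimensions` (the inner error code shown in the log above), and collects the skipped names for a summary. `ingest_all`, `fake_parse`, and `FakeServiceError` are hypothetical names; `FakeServiceError` stands in for `azure.core.exceptions.HttpResponseError` so that the example runs without the Azure SDK.

```python
import asyncio

# Inner error code taken from the log message in this issue.
SKIPPABLE_CODE = "InvalidContentDimensions"


class FakeServiceError(Exception):
    """Stand-in for azure.core.exceptions.HttpResponseError (assumption)."""


async def fake_parse(name: str) -> list[str]:
    # Hypothetical parser: pretends the service rejects files named "tiny*".
    if name.startswith("tiny"):
        raise FakeServiceError(
            '(InvalidRequest) Invalid request.\n'
            'Inner error: {"code": "InvalidContentDimensions"}'
        )
    return [f"text of {name}"]


async def ingest_all(names, parse):
    """Parse each file; skip dimension errors and re-raise everything else."""
    ingested, skipped = {}, []
    for name in names:
        try:
            ingested[name] = await parse(name)
        except Exception as exc:
            if SKIPPABLE_CODE not in str(exc):
                raise  # unrelated failures should still stop the run
            skipped.append(name)
    return ingested, skipped


if __name__ == "__main__":
    done, skipped = asyncio.run(
        ingest_all(["notes.md", "tiny.png", "chart.png"], fake_parse)
    )
    print(f"Ingested {len(done)} file(s), skipped {len(skipped)}: {skipped}")
```

Matching on the message text keeps the sketch independent of the SDK's error object layout. A real fix would more likely inspect the structured error on the exception, and it should still fail on any other error code.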

OS and Version?

GitHub Codespaces

uname -ra
Linux codespaces-50920e 6.5.0-1021-azure #22~22.04.1-Ubuntu SMP Tue Apr 30 16:08:18 UTC 2024 x86_64 GNU/Linux

azd version?

run azd version and copy paste here.

azd version 1.9.3 (commit e1624330dcc7dde440ecc1eda06aac40e68aa0a3)

Versions

Mention any other details that might be useful


Thanks! We'll be in touch soon.

Labels: ingestion (Related to manual data ingestion with the prepdocs scripts)
