@dependabot
Contributor

@dependabot dependabot bot commented on behalf of github Aug 14, 2025

Bumps unstructured from 0.10.27 to 0.18.13.

Release notes

Sourced from unstructured's releases.

0.18.13

Fixes

Parse a wider variety of date formats in email headers. The partition_email function is now more robust to non-standard date formats, including ISO-8601 dates with "Z" suffixes. This prevents ValueError exceptions when partitioning emails with these date formats.
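As a rough illustration of this kind of fix (not unstructured's actual code), the sketch below uses only the Python standard library to accept both RFC 2822 Date headers and ISO-8601 timestamps with a "Z" suffix, returning None instead of raising ValueError. The helper name parse_header_date is hypothetical.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_header_date(value: str) -> Optional[datetime]:
    """Parse an email Date header value (hypothetical helper).

    Tries RFC 2822 first, then ISO-8601 (including a trailing "Z"),
    and returns None instead of raising when neither format matches.
    """
    value = value.strip()
    # Well-formed Date headers use the RFC 2822 format.
    try:
        return parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Python 3.9 and older raise TypeError here; 3.10+ raise ValueError.
        pass
    # datetime.fromisoformat() only accepts a "Z" suffix from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    iso = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        return datetime.fromisoformat(iso)
    except ValueError:
        return None
```

Trying the strict RFC 2822 parser first keeps well-formed headers on the standard path; the ISO-8601 branch only handles what it rejects.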

0.18.12

What's Changed

  • Prevent large file content in encoding exceptions. Replace UnicodeDecodeError with UnprocessableEntityError in encoding detection to avoid storing the entire file content in exception objects, which can cause issues in logging and error-reporting systems when processing large files.
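The motivation is that a UnicodeDecodeError keeps a reference to the whole input in its object attribute. A minimal sketch of the alternative, assuming a hypothetical EncodingDetectionError class and decode_or_raise helper (neither is part of unstructured), raises an error that quotes only a bounded preview of the bytes:

```python
from typing import Sequence

PREVIEW_BYTES = 64  # hypothetical cap on how much raw input an error may quote


class EncodingDetectionError(Exception):
    """Hypothetical error type that records only a short preview of the input."""


def decode_or_raise(data: bytes, encodings: Sequence[str] = ("utf-8",)) -> str:
    """Try each candidate encoding in turn (hypothetical helper)."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            # UnicodeDecodeError.object holds the entire input, so it is not
            # re-raised; the loop simply moves on to the next encoding.
            continue
    # Raised outside any except block, so no UnicodeDecodeError is attached
    # as __context__ and only the bounded preview below is kept.
    raise EncodingDetectionError(
        f"could not decode {len(data)} bytes with {tuple(encodings)}; "
        f"preview={data[:PREVIEW_BYTES]!r}"
    )
```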

Full Changelog: Unstructured-IO/unstructured@0.18.11...0.18.12

0.18.11

What's Changed

Full Changelog: Unstructured-IO/unstructured@0.18.10...0.18.11

0.18.10

Enhancements

Features

  • Add OCR_AGENT_CACHE_SIZE environment variable. Added a configurable cache size for OCR agents to control memory usage.
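A cache of this kind can be sketched with functools.lru_cache, sized from the environment. Only the variable name comes from the release note; the default of 4, the OCRAgent class and get_ocr_agent are hypothetical placeholders, and unstructured may wire this up differently.

```python
import functools
import os

# Hypothetical default; the release note does not say what unstructured uses.
DEFAULT_CACHE_SIZE = 4


def cache_size_from_env(var: str = "OCR_AGENT_CACHE_SIZE") -> int:
    """Read a non-negative cache size from the environment, else the default."""
    try:
        size = int(os.environ.get(var, ""))
    except ValueError:
        return DEFAULT_CACHE_SIZE
    return size if size >= 0 else DEFAULT_CACHE_SIZE


class OCRAgent:
    """Hypothetical stand-in for an expensive-to-load OCR engine."""

    def __init__(self, name: str) -> None:
        self.name = name


# maxsize is read once, when this module is imported; 0 disables caching.
@functools.lru_cache(maxsize=cache_size_from_env())
def get_ocr_agent(name: str) -> OCRAgent:
    return OCRAgent(name)
```

Because maxsize is fixed at import time, the variable has to be set before the module is loaded.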

0.18.9

Enhancements

Features

  • Convert elements to markdown for output. Added a function to convert elements to markdown format for easy viewing.
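To make this entry concrete, here is a toy renderer. It does not use unstructured's element classes or its actual conversion function; it assumes elements arrive as hypothetical (category, text) pairs using category names such as Title, NarrativeText and ListItem, and maps titles to headings and list items to bullets.

```python
from typing import Iterable, Tuple


def elements_to_markdown(elements: Iterable[Tuple[str, str]]) -> str:
    """Render (category, text) pairs as markdown (hypothetical helper)."""
    blocks = []
    for category, text in elements:
        if category == "Title":
            blocks.append(f"# {text}")
        elif category == "ListItem":
            blocks.append(f"- {text}")
        else:
            # NarrativeText and any other category become plain paragraphs.
            blocks.append(text)
    # Blank lines keep each element a separate markdown block.
    return "\n\n".join(blocks)
```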

Fixes

  • Language detection nit. Handle empty text.
  • Properly handle password-protected XLSX. Detect password protection on XLSX files and raise an appropriate error.
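One common way to detect this, sketched here as a heuristic rather than unstructured's actual check: an ordinary .xlsx file is a ZIP archive, while an encrypted workbook is wrapped in an OLE2 compound file, so the first eight bytes usually tell them apart. Legacy .xls files also use OLE2, so the check only makes sense for files already expected to be .xlsx. PasswordProtectedFileError, classify_xlsx_header and check_xlsx are hypothetical names.

```python
ZIP_MAGIC = b"PK\x03\x04"  # an ordinary .xlsx is a ZIP archive
CFB_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file header


class PasswordProtectedFileError(Exception):
    """Hypothetical error raised for encrypted workbooks."""


def classify_xlsx_header(header: bytes) -> str:
    """Classify the first 8 bytes of a file that is expected to be .xlsx."""
    if header.startswith(CFB_MAGIC):
        # Encrypted OOXML files are stored in an OLE2 container instead of ZIP.
        return "encrypted"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def check_xlsx(path: str) -> None:
    """Raise early instead of failing deep inside a spreadsheet parser."""
    with open(path, "rb") as f:
        kind = classify_xlsx_header(f.read(8))
    if kind == "encrypted":
        raise PasswordProtectedFileError(f"{path} appears to be password protected")
    if kind == "unknown":
        raise ValueError(f"{path} does not look like an .xlsx file")
```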

0.18.7

Enhancements

Features

  • Add language detection for PDFs. Add document-level and element-level language detection to PDFs.

Fixes

0.18.6

... (truncated)

Changelog

Sourced from unstructured's changelog.

0.18.13

Enhancements

Features

Fixes

  • Parse a wider variety of date formats in email headers. The partition_email function is now more robust to non-standard date formats, including ISO-8601 dates with "Z" suffixes. This prevents ValueError exceptions when partitioning emails with these date formats.

0.18.12

Enhancements

Features

Fixes

  • Prevent large file content in encoding exceptions. Replace UnicodeDecodeError with UnprocessableEntityError in encoding detection to avoid storing the entire file content in exception objects, which can cause issues in logging and error-reporting systems when processing large files.

0.18.11

Enhancements

  • Standardized on the charset-normalizer library for encoding detection. Previously we had both chardet and charset-normalizer as dependencies; we are dropping chardet and using only charset-normalizer.

Features

  • Type-aware <input> mapping in HTML transformations. Bare <input> elements are now classified by their type attribute (checkbox → Checkbox, radio → RadioButton, others → FormFieldValue).
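The mapping rule itself is small enough to sketch. The example below applies it to <input> tags found with the standard library's html.parser; it is an illustration only, not unstructured's HTML transformation code, and classify_input and InputCollector are hypothetical names. The category names are returned as plain strings.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


def classify_input(type_attr: Optional[str]) -> str:
    """Apply the changelog's rule: checkbox, radio, everything else."""
    kind = (type_attr or "").strip().lower()
    if kind == "checkbox":
        return "Checkbox"
    if kind == "radio":
        return "RadioButton"
    return "FormFieldValue"


class InputCollector(HTMLParser):
    """Collect a category for every <input> tag in an HTML snippet."""

    def __init__(self) -> None:
        super().__init__()
        self.categories: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser lowercases tag and attribute names; self-closing <input/>
        # tags also reach this method through handle_startendtag.
        if tag == "input":
            self.categories.append(classify_input(dict(attrs).get("type")))
```

An <input> with no type attribute falls into the "others" bucket, which matches HTML's default of type="text".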

Fixes

  • Recognize '|' as a delimiter. The CSV parser now recognizes '|' as a delimiter in addition to ',' and ';'.
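One way to get this behavior with the standard library is csv.Sniffer restricted to a set of candidate delimiters. This is a sketch, not the parser unstructured uses; sniff_delimiter and read_rows are hypothetical helpers, and falling back to ',' when detection fails is an assumed policy.

```python
import csv
import io
from typing import List


def sniff_delimiter(sample: str, candidates: str = ",;|") -> str:
    """Guess a sample's delimiter among the candidates, defaulting to ','."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer raises csv.Error when no candidate is used consistently.
        return ","


def read_rows(text: str) -> List[List[str]]:
    """Detect the delimiter from the first 4 KiB, then parse the whole text."""
    delimiter = sniff_delimiter(text[:4096])
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))
```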

0.18.10

Enhancements

  • Updated CodeQL. Updated the CodeQL GHA from the deprecated v2 to v3.

Features

  • Add OCR_AGENT_CACHE_SIZE environment variable. Added a configurable cache size for OCR agents to control memory usage.

Fixes

0.18.9

Enhancements

Features

  • Convert elements to markdown for output. Added a function to convert elements to markdown format for easy viewing.

Fixes

  • Language detection nit. Handle empty text.

... (truncated)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually.
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [unstructured](https://github.com/Unstructured-IO/unstructured) from 0.10.27 to 0.18.13.
- [Release notes](https://github.com/Unstructured-IO/unstructured/releases)
- [Changelog](https://github.com/Unstructured-IO/unstructured/blob/main/CHANGELOG.md)
- [Commits](Unstructured-IO/unstructured@0.10.27...0.18.13)

---
updated-dependencies:
- dependency-name: unstructured
  dependency-version: 0.18.13
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
@dependabot dependabot bot added the chore label Aug 14, 2025
@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Aug 14, 2025
@github-actions

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@dependabot/pip/unstructured-0.18.13#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch dependabot/pip/unstructured-0.18.13

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment


@dependabot @github
Contributor Author

dependabot bot commented on behalf of github Aug 27, 2025

Superseded by #728.

@dependabot dependabot bot closed this Aug 27, 2025
@dependabot dependabot bot deleted the dependabot/pip/unstructured-0.18.13 branch August 27, 2025 05:13