[Bug] pdf2parquet fails on Windows due to fcntl #969

Open
1 of 2 tasks
touma-I opened this issue Jan 24, 2025 · 1 comment
Assignees
Labels
bug Something isn't working

touma-I commented Jan 24, 2025

Search before asking

  • I searched the issues and found no similar issues.

Component

Transforms/Other

What happened + What you expected to happen

When I import the pdf2parquet transform on Windows, the import fails with the following error:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[3], line 1
----> 1 from dpk_pdf2parquet.transform_python import Pdf2Parquet
      2 from dpk_pdf2parquet.transform import pdf2parquet_contents_types

File ~\AppData\Roaming\Python\Python312\site-packages\dpk_pdf2parquet\__init__.py:1
----> 1 from .transform import *

File ~\AppData\Roaming\Python\Python312\site-packages\dpk_pdf2parquet\transform.py:35
     33 from data_processing.utils import TransformUtils, get_logger, str2bool
     34 from data_processing.utils.cli_utils import CLIArgumentProvider
---> 35 from data_processing.utils.multilock import MultiLock
     36 from docling.backend.docling_parse_backend import DoclingParseDocumentBackend
     37 from docling.backend.docling_parse_v2_backend import DoclingParseV2DocumentBackend

File ~\AppData\Roaming\Python\Python312\site-packages\data_processing\utils\multilock.py:15
     13 import abc
     14 import datetime
---> 15 import fcntl
     16 import os
     17 import tempfile

ModuleNotFoundError: No module named 'fcntl'

Reproduction script

  1. Clone the repo.
  2. Use Anaconda Navigator to start a notebook.
  3. Navigate to the folder with the following notebook: /data-prep-kit/examples/notebooks/Run_your_first_transform_colab.ipynb
  4. Run the notebook.
  5. The error occurs when the notebook tries to import the transform.
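
A quicker check that does not need the notebook is to ask Python whether fcntl can be found at all in the affected environment. This is only a sketch: the helper name module_available is made up for illustration and is not part of data-prep-kit.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located (hypothetical helper)."""
    # find_spec returns None when no finder can locate the module.
    return importlib.util.find_spec(name) is not None


# fcntl ships with CPython on Unix only, so this should report False
# on a native Windows interpreter.
print("fcntl available:", module_available("fcntl"))
```

If this reports False, any module that does an unconditional `import fcntl` will fail at import time, as in the traceback above.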

Anything else

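This error was introduced when a locking mechanism (MultiLock in data_processing/utils/multilock.py) was added to handle the case where multiple Ray workers try to create and delete the same file. That module imports fcntl unconditionally, and fcntl is only available on Unix platforms.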

OS

Windows WSL

Python

3.11.x

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

touma-I added the bug label Jan 24, 2025

touma-I commented Jan 24, 2025

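Consider importing fcntl, and enabling the locking that depends on it, only for non-Windows users:

import platform

# fcntl is Unix-only, so skip the import on Windows
if platform.system() != 'Windows':
    import fcntl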
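
Guarding the import fixes the ImportError, but any code path that still calls fcntl would presumably fail later on Windows unless it is guarded too. One possible direction, sketched below, is a lock that uses fcntl.flock on Unix and msvcrt.locking on Windows. The class name PortableFileLock and the lock-file naming are assumptions made for illustration; this is not the existing MultiLock API, and it has not been tested against Ray workers.

```python
import os
import platform
import tempfile

if platform.system() == "Windows":
    import msvcrt  # Windows-only standard library module
    fcntl = None
else:
    import fcntl  # Unix-only standard library module
    msvcrt = None


class PortableFileLock:
    """Hypothetical cross-platform file lock (not the real MultiLock)."""

    def __init__(self, name: str):
        # Assumption: one lock file per name in the system temp directory.
        self.path = os.path.join(tempfile.gettempdir(), f"{name}.lock")
        self._fd = None

    def acquire(self) -> None:
        self._fd = os.open(self.path, os.O_RDWR | os.O_CREAT)
        if fcntl is not None:
            # Blocks until the exclusive lock is granted.
            fcntl.flock(self._fd, fcntl.LOCK_EX)
        else:
            # Locks 1 byte at the current position; LK_LOCK retries for
            # about 10 seconds and then raises OSError.
            msvcrt.locking(self._fd, msvcrt.LK_LOCK, 1)

    def release(self) -> None:
        if self._fd is None:
            return
        if fcntl is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
        else:
            # Unlock the same byte that acquire() locked.
            os.lseek(self._fd, 0, os.SEEK_SET)
            msvcrt.locking(self._fd, msvcrt.LK_UNLCK, 1)
        os.close(self._fd)
        self._fd = None

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()


# Usage: serialize a critical section across processes.
with PortableFileLock("dpk_example"):
    pass  # create or delete the shared file here
```

Note that the two branches do not behave identically: flock waits indefinitely, while msvcrt.locking with LK_LOCK gives up after its retries, so callers on Windows would need to handle OSError.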
