Issue with Tokenization and Classification of Images and Tables #7

Open
Harsss opened this issue Apr 27, 2023 · 0 comments

Harsss commented Apr 27, 2023

I am fine-tuning the LiLT model on my own dataset, which has labels for headings, subheadings, text, tables, table headings, images, and captions. During tokenization I ran into problems with images and tables, since they have no text of their own to tokenize. As a workaround, I assigned an arbitrary placeholder word to every table and image so that they could be tokenized. After training, however, the model does not classify anything as a table or an image.
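The placeholder workaround described above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual preprocessing code: the region dictionaries, the `PLACEHOLDER` string, and the helper names `normalize_box` and `build_example` are all hypothetical, and the 0 to 1000 box scaling follows the convention used by LayoutLM-style models. It uses one fixed placeholder rather than a different word per region, so the stand-in is at least consistent across examples.

```python
from typing import Dict, List, Tuple

# Hypothetical fixed stand-in for regions that have no text (images, tables).
PLACEHOLDER = "[REGION]"


def normalize_box(box: Tuple[float, float, float, float],
                  width: float, height: float) -> List[int]:
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def build_example(regions: List[Dict], width: float, height: float):
    """Turn labeled page regions into parallel word / box / label lists.

    Each region is assumed to be a dict with a "label", a pixel "box",
    and an optional "text". Regions without text get the placeholder word.
    """
    words, boxes, labels = [], [], []
    for region in regions:
        text = (region.get("text") or "").strip()
        region_words = text.split() or [PLACEHOLDER]
        box = normalize_box(region["box"], width, height)
        for word in region_words:
            # Every word in a region shares that region's box and label.
            words.append(word)
            boxes.append(box)
            labels.append(region["label"])
    return words, boxes, labels


regions = [
    {"label": "heading", "text": "Annual Report", "box": (100, 50, 500, 100)},
    {"label": "image", "box": (0, 200, 1000, 700)},
]
words, boxes, labels = build_example(regions, width=1000, height=2000)
```

The resulting lists would then be passed to whichever tokenizer is in use, with the word-level labels aligned to the subword tokens it produces.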

I am unsure whether I should switch from the LayoutLMv3 tokenizer to a different one, or whether there are other steps I can take to address this. I would also like to know whether any other tokenizers would be suitable for my dataset.
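One check that might help narrow this down, independent of the tokenizer choice, is to look at how many word-level labels each class contributes. If every table or image becomes a single placeholder word while text regions contribute many words, those classes may be a very small share of the training labels. The sketch below assumes the labels are available as a flat list of strings; `label_share` is a hypothetical helper name and the label list is made-up illustration data.

```python
from collections import Counter
from typing import Dict, List


def label_share(labels: List[str]) -> Dict[str, float]:
    """Return the fraction of word-level labels belonging to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Made-up example: eight text words, one table placeholder, one image placeholder.
example_labels = ["text"] * 8 + ["table", "image"]
shares = label_share(example_labels)
for label, share in sorted(shares.items(), key=lambda item: item[1]):
    print(f"{label}: {share:.1%}")
```

A strongly skewed distribution would not prove that imbalance is the cause, but it would be worth ruling out before changing tokenizers.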
