Issue with Tokenization and Classification of Images and Tables #7

Harsss · 2023-04-27T06:40:07Z

I am fine-tuning the LiLT model on my own dataset, which has labels for headings, subheadings, text, tables, table headings, images, and captions. During tokenization I ran into problems with images and tables. As a workaround, I assigned a random word to every table and image so that they could be tokenized. After training, however, the model never classifies anything as a table or an image.

I am not sure whether I should switch to a tokenizer other than the LayoutLMv3 one, or whether there are other steps I can take to fix this. I would also like to know if any other tokenizers would be suitable for my dataset.
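To make the placeholder workaround described above concrete, here is a minimal sketch of how such inputs might be built. It is not the exact code used for this dataset: the region dictionary layout, the `PLACEHOLDERS` strings, and the helper names `normalize_box` and `build_example` are all hypothetical. It uses one fixed placeholder word per region type instead of a random word, and it assumes the 0-1000 box normalization that LayoutLM-style models expect.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]

# Hypothetical placeholder words, one fixed string per non-text region type.
PLACEHOLDERS: Dict[str, str] = {"table": "[TABLE]", "image": "[IMAGE]"}


def normalize_box(box: Box, width: int, height: int) -> List[int]:
    """Scale pixel coordinates (x0, y0, x1, y1) to the 0-1000 range."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def build_example(regions: List[dict], width: int, height: int):
    """Turn labeled page regions into parallel word / box / label lists."""
    words: List[str] = []
    boxes: List[List[int]] = []
    labels: List[str] = []
    for region in regions:
        label = region["label"]
        if label in PLACEHOLDERS:
            # Tables and images have no text, so emit one placeholder word.
            region_words = [PLACEHOLDERS[label]]
        else:
            region_words = region["text"].split()
        box = normalize_box(region["box"], width, height)
        # Every word in a region shares the region's box and label.
        for word in region_words:
            words.append(word)
            boxes.append(box)
            labels.append(label)
    return words, boxes, labels


# Hypothetical page with one heading and one table.
regions = [
    {"label": "heading", "text": "Quarterly Report", "box": (100, 50, 500, 100)},
    {"label": "table", "text": "", "box": (100, 200, 900, 600)},
]
words, boxes, labels = build_example(regions, width=2000, height=1000)
```

In this setup each table or image contributes exactly one word, while a text region contributes one per word, so the table and image classes end up with far fewer training tokens than the text classes. That imbalance may be part of why the model never predicts them, though this is only a guess.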
