Infinite mystery rows parsed in Excel parsing #5294

Open
lukasmueller opened this issue Feb 3, 2025 · 2 comments
Assignees
Labels
Priority: High Issue/PR significantly impacts users. Type: Bug Issue describes a bug.

Comments

@lukasmueller
Member

This has been observed for the accession upload but may also be happening in other uploads. Sometimes the Excel parser cannot determine the end of the "data zone" in the Excel file and continues parsing to the end of the Excel file, reading hundreds of thousands of rows and timing out in the process. We need to check for the end of the data explicitly, for example by checking for empty values.

It is not clear what causes this. I think it happens when columns are copied around, which may set the "data zone" to the entire column length.
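
To illustrate the "check for empty values" idea, here is a minimal sketch in Python. It is not the project's parser code: `is_blank`, `last_data_row`, and the sample sheet are hypothetical, and it assumes the sheet has already been read into a list of rows. It finds the real end of the data by ignoring trailing rows that hold no values.

```python
from typing import Any, Sequence


def is_blank(cell: Any) -> bool:
    """A cell counts as blank if it is None or a whitespace-only string."""
    return cell is None or (isinstance(cell, str) and cell.strip() == "")


def last_data_row(rows: Sequence[Sequence[Any]]) -> int:
    """Return the 1-based index of the last row holding any value, or 0 if none."""
    # Walk backwards from the reported end of the sheet to the first non-blank row.
    for index in range(len(rows), 0, -1):
        if not all(is_blank(cell) for cell in rows[index - 1]):
            return index
    return 0


# Hypothetical sheet: two real rows followed by formatting-only rows.
sheet = [
    ["accession_name", "species"],
    ["A1", "Manihot esculenta"],
    [None, ""],
    [None, None],
]
data_end = last_data_row(sheet)
```

With this in place, the parser would only need to process `sheet[:data_end]` instead of everything up to the reported column length.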

@afpowell
Contributor

afpowell commented Feb 3, 2025

@lukasmueller - Do you have a file that causes this?

@lukasmueller lukasmueller added Type: Bug Issue describes a bug. Priority: High Issue/PR significantly impacts users. labels Feb 10, 2025
@dwaring87
Member

I checked the example file from Lukas, and it does have a lot of "blank" rows (rows that contain no data but are still included in the Excel file, likely because they carry formatting information). I don't think the file parser is causing the slow upload: it stops parsing the file after 5 empty rows. The slow part seems to be the Fuzzy Search when uploading accessions.
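
For readers unfamiliar with the stop-after-empty-rows behavior described above, a minimal sketch in Python follows. It is an illustration only and not the project's actual parser: the function name `rows_until_blank_run` and the sample data are hypothetical, the threshold of 5 is taken from the comment above, and the choice to skip isolated blank rows is an assumption.

```python
from typing import Any, Iterable, Iterator, List


def rows_until_blank_run(
    rows: Iterable[Iterable[Any]], max_blank: int = 5
) -> Iterator[List[Any]]:
    """Yield non-blank rows, stopping after `max_blank` consecutive blank rows."""
    blank_run = 0
    for row in rows:
        cells = list(row)
        if all(c is None or str(c).strip() == "" for c in cells):
            blank_run += 1
            if blank_run >= max_blank:
                return  # assume the data zone has ended
            continue  # assumption: isolated blank rows are skipped
        blank_run = 0
        yield cells


def sample_rows(counter: List[int]) -> Iterator[List[Any]]:
    """Hypothetical sheet: 3 data rows, 1 stray blank, then 100,000 blank rows."""
    data = [["accession_name"], ["A1"], [None], ["A2"]] + [[None]] * 100_000
    for row in data:
        counter[0] += 1  # count how many rows were actually read
        yield row


rows_read = [0]
parsed = list(rows_until_blank_run(sample_rows(rows_read)))
```

Because the input is consumed lazily, only the rows up to the fifth consecutive blank are ever read, so trailing formatting-only rows add almost no cost.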
