Skip to content

MLE-18060 : Adding the new commons-csv library and fixing the CSVparser initialization #517

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

abika5
Copy link
Contributor

@abika5 abika5 commented May 30, 2025

No description provided.

@abika5 abika5 marked this pull request as draft May 30, 2025 01:40
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR integrates the newer Apache Commons CSV library and updates how CSVParser is initialized and used to correctly handle byte offsets and use the builder API.

  • Switched from deprecated CSVParser constructor to CSVParser.builder() and enabled byte tracking.
  • Updated record position checks from getCharacterByte() to getBytePosition().
  • Added Commons CSV to the build and distribution configurations; bumped Commons IO version.

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

File Description
src/main/java/com/marklogic/contentpump/SplitDelimitedTextReader.java Replaced deprecated parser instantiation and position API; updated exception message
src/main/java/com/marklogic/contentpump/DelimitedTextReader.java Capitalized exception message literal for consistency
src/assemble/bindist.xml Included org.apache.commons:commons-csv in the distribution bundle
pom.xml Removed custom version property, added direct commons-csv dependency, bumped commons-io version
Comments suppressed due to low confidence (2)

pom.xml:24

  • [nitpick] Hardcoding the Commons CSV version later in the POM duplicates version information. It may be clearer to reintroduce a <commonsCsvVersion> property to centralize version management.
<!-- <commonsCsvVersion>1.5.2-marklogic</commonsCsvVersion> removed -->

src/main/java/com/marklogic/contentpump/SplitDelimitedTextReader.java:195

  • New parser builder logic with trackBytes(true) introduces byte-offset behavior; consider adding or updating unit tests to verify correct split boundary handling and byte tracking.
        parser = CSVParser.builder()

@abika5 abika5 marked this pull request as ready for review May 30, 2025 02:55
@abika5
Copy link
Contributor Author

abika5 commented May 30, 2025

I have run 06mlcp in the local VM and it works fine. ( except 6 expected local failures)

@marklogic marklogic deleted a comment from Copilot AI May 30, 2025
@abika5 abika5 requested a review from yunzvanessa May 30, 2025 22:41
@abika5 abika5 requested a review from yunzvanessa June 2, 2025 07:21
@abika5 abika5 requested a review from yunzvanessa June 2, 2025 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants