Skip to content

Deephaven csv as default #1057

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Feb 17, 2025
Merged

Deephaven csv as default #1057

merged 21 commits into from
Feb 17, 2025

Conversation

Jolanrensen
Copy link
Collaborator

@Jolanrensen Jolanrensen commented Feb 11, 2025

Fixes #1047

Works to finish #827

  • Includes "dataframe-csv" by default in all modules and jupyter integration
  • Uses new csv reader by default in DataFrame.read() functions
  • Uses new csv reader by default in Gradle- and KSP plugin (this required a bootstrap bump)
  • Deprecates the old csv functions with level WARNING
    • Deprecated readDelim with level HIDDEN because of the name clash
    • Removed readDelim compiler plugin test. @Import is gonna be experimental anyway. The only reason the old csv integration still has compiler plugin support is to use it as example for the new csv integration later.
  • Replaced tests/example usage of the old functions with the new ones
  • Added testImplementation(project(":dataframe-csv")) to :core until we split off jupyter and merge sample.api tests
  • Updated KDocs
  • Updated docs for the website, examples and Deephaven specific stuff
  • Fixed Transformation into data class corrupts TimeZones of Instant #1047 by disabling the datetime parser of Deephaven by default. Added guide on the website for users who need it.
  • Moved Compression to :core so it can be reused by other io modules when needed

@Jolanrensen Jolanrensen force-pushed the deephaven-csv-default branch from 50ce472 to 4004b65 Compare February 12, 2025 12:07
@Jolanrensen Jolanrensen force-pushed the deephaven-csv-default branch 7 times, most recently from 3496cb7 to d6fe324 Compare February 13, 2025 12:20
@Jolanrensen Jolanrensen force-pushed the deephaven-csv-default branch from 8b8bb80 to 9a87d0e Compare February 13, 2025 14:11
@Jolanrensen Jolanrensen force-pushed the deephaven-csv-default branch from 79d771d to b24db62 Compare February 14, 2025 11:35
@Jolanrensen Jolanrensen marked this pull request as ready for review February 14, 2025 11:52
@Jolanrensen Jolanrensen requested a review from zaleslaw February 14, 2025 11:52
@Jolanrensen Jolanrensen force-pushed the deephaven-csv-default branch from 43f191d to 6055995 Compare February 14, 2025 15:13

override fun acceptsExtension(ext: String): Boolean = ext == "csv"

override fun acceptsSample(sample: SupportedFormatSample): Boolean = true // Extension is enough

// if the user adds the dataframe-csv module, this will override old CSV reading method in DataFrame.read()
override val testOrder: Int = CSV().testOrder - 1
override val testOrder: Int = 20_000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the test order for guess.kt, so the order in which DataFrame.read() tries the supported formats. CSV used to have 20,000, so now the new one gets the same order number.

* They'll hopefully be faster and better._
*
* @param file The file to read. Can also be compressed as `.gz` or `.zip`, see [Compression][org.jetbrains.kotlinx.dataframe.io.Compression].
* @param file The file to read. Can also be compressed as `.gz` or `.zip`, see [Compression].
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it navigable correctly for now?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may not be, good catch! KoDEx is not good at redirecting references/paths outside the current module.

).convert("col2").toStr()

val str = StringWriter()
df.writeCSV(str)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, where all these tests (with this logic) was moved or adopted for the Deephaven csv?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exactly, it's in DelimCsvTsvTests in dataframe-csv


internal const val WRITE_CSV =
"The writeCSV() functions are deprecated in favor of writeCsv() in dataframe-csv. $MESSAGE_0_17"
internal const val WRITE_CSV_IMPORT = "org.jetbrains.kotlinx.dataframe.io.writeCsv"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great job!

charset,
)
}
): DataFrame<*> = error(DF_READ_NO_CSV)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it true, that it will be not possible to read from remote endpoint the csv with new parser? Or am I wrong?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No you still can, no problem. By URL, or String. It's tested in the dataframe-csv module.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is DataFrame.read() that weird overload that we deprecated already some time ago.

@Jolanrensen Jolanrensen force-pushed the deephaven-csv-default branch 2 times, most recently from 162f248 to d32cbb9 Compare February 17, 2025 15:58
@Jolanrensen Jolanrensen force-pushed the deephaven-csv-default branch from d32cbb9 to 1e2bb06 Compare February 17, 2025 16:45
@Jolanrensen Jolanrensen merged commit 9c55263 into master Feb 17, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Transformation into data class corrupts TimeZones of Instant
2 participants