CsvSource type conversion with custom schema #370

@tszolar

From the CSV source section of the project README, I understood that values loaded from a CSV should be converted to the types given in the specified schema.

However, if I define a custom schema for a CsvSource whose columns have types other than String (Int, for example), the values in those columns are still returned as String.

Is this intended behaviour, a bug, or has it just not been implemented yet?

Runnable example:

import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import io.eels.component.csv.CsvSource
import io.eels.schema._

object CsvSourceTypeConversionTest extends App {

  val exampleCsvString =
    """A,B,C,D
      |1,2.2,3,foo
      |4,5.5,6,bar
    """.stripMargin

  val stream = new ByteArrayInputStream(exampleCsvString.getBytes(StandardCharsets.UTF_8))
  val schema = new StructType(Vector(
    Field("A", IntType.Signed),
    Field("B", DoubleType),
    Field("C", IntType.Signed),
    Field("D", StringType)
  ))
  val ds = new CsvSource(stream _, Some(schema)).toDataStream()
  val firstRow = ds.iterator.toIterable.head
  val firstRowA = firstRow.get("A")
  println(firstRowA) // prints 1 as expected
  println(firstRowA.getClass.getTypeName) // prints java.lang.String
  assert(firstRowA == 1) // this assertion will fail because firstRowA is not an Int
}
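
For reference, this is roughly the conversion I expected the schema to trigger. It is only a minimal sketch of a manual workaround: `ColType`, `IntCol`, `DoubleCol`, `StringCol` and `CoerceSketch` are hypothetical names made up for illustration and are not part of eel's API, and the cells are assumed to be well-formed numbers (a malformed cell would throw `NumberFormatException`).

```scala
// Hypothetical stand-ins for column types; these are NOT eel's schema classes.
sealed trait ColType
case object IntCol extends ColType
case object DoubleCol extends ColType
case object StringCol extends ColType

object CoerceSketch {
  // Convert one raw CSV cell to the JVM type its column declares.
  def coerce(raw: String, tpe: ColType): Any = tpe match {
    case IntCol    => raw.trim.toInt
    case DoubleCol => raw.trim.toDouble
    case StringCol => raw
  }

  // Convert a whole row, pairing each cell with its column type by position.
  def coerceRow(cells: Seq[String], types: Seq[ColType]): Seq[Any] = {
    require(cells.length == types.length, "row width must match schema width")
    cells.zip(types).map { case (cell, tpe) => coerce(cell, tpe) }
  }
}
```

Applied to the first data row above, `CoerceSketch.coerceRow(Seq("1", "2.2", "3", "foo"), Seq(IntCol, DoubleCol, IntCol, StringCol))` yields an `Int`, a `Double`, an `Int` and a `String`, which is what I expected `firstRow.get("A")` and the other columns to contain.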
