CSV: Support for Map and Nested fields #211

Open
DALDEI opened this issue Aug 2, 2020 · 0 comments
Labels
csv to-evaluate Issue that has been received but not yet evaluated

DALDEI commented Aug 2, 2020

Use Case: reading/writing 'complex' CSV files interoperable with Athena / Presto / Hive

Example: a CSV 'serde' from Hive (or Athena or Presto) can support the following attributes:

row format delimited
fields terminated by ','
collection items terminated by '|'
map keys terminated by '#'

In addition to the above, there is built-in (not customizable) support for nested collections, which are delimited by 'level' with a different delimiter per level. This makes it possible to represent structures such as [ 1, [ 2, 3, { "a": [ 4 ] } ] ]

Currently the array element separator provides part of this.
A useful addition would be a 'map' separator that works like the array separator but creates JSON Objects.

example:

FIELDS=,
ARRAY=|
MAP=#

1,2|3|4,key#value|key2#value2|key3#value3,four

could map to the Java classes

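class Complex {
    int col1;         // plain field: 1
    int[] col2;       // ARRAY-separated values: 2|3|4
    innerclass col3;  // MAP-separated entries: key#value|key2#value2|key3#value3
    String col4;      // plain field: four
}

class innerclass {
    String key;
    String key2;
    String key3;
}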
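To make the proposed mapping concrete, here is a minimal sketch in plain Java of how the example row could be split into nested values. It does not use the Jackson CSV API; `DelimitedRowSketch` and `parseRow` are hypothetical names, and guessing each column's type from the separators it contains is an assumption made only for illustration. A real implementation would more likely be driven by the schema or the target class, and would need to handle quoting and escaping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jackson-dataformat-csv.
final class DelimitedRowSketch {
    private DelimitedRowSketch() {}

    // Splits on a literal delimiter character, keeping trailing empty tokens.
    static String[] split(String text, char delimiter) {
        return text.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }

    // Parses one row. A column containing the map separator becomes a Map,
    // a column containing only the array separator becomes a List, and
    // anything else stays a String. Quoting and escaping are not handled.
    static List<Object> parseRow(String line, char field, char array, char map) {
        List<Object> row = new ArrayList<>();
        for (String column : split(line, field)) {
            if (column.indexOf(map) >= 0) {
                Map<String, String> object = new LinkedHashMap<>();
                for (String entry : split(column, array)) {
                    int at = entry.indexOf(map);
                    if (at < 0) {
                        object.put(entry, "");  // entry without a value
                    } else {
                        object.put(entry.substring(0, at), entry.substring(at + 1));
                    }
                }
                row.add(object);
            } else if (column.indexOf(array) >= 0) {
                row.add(new ArrayList<>(Arrays.asList(split(column, array))));
            } else {
                row.add(column);
            }
        }
        return row;
    }
}
```

Calling `DelimitedRowSketch.parseRow("1,2|3|4,key#value|key2#value2|key3#value3,four", ',', '|', '#')` yields a list whose third element is a map with the keys `key`, `key2` and `key3`, which is the shape a data binder would need to populate `innerclass`.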



Nested structures require 'levels' of both array and map separators.
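As a rough illustration of what 'levels' could mean, the sketch below parses nested arrays by using one delimiter per nesting level. The names `NestedLevelsSketch` and `parseNested`, the `;` second-level delimiter, and the explicit `depth` argument (standing in for what a schema would declare) are all assumptions for the example; map levels are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jackson-dataformat-csv.
final class NestedLevelsSketch {
    private NestedLevelsSketch() {}

    // Parses text as an array nested 'depth' levels deep.
    // delimiters[level] separates the items at the current level;
    // depth == 0 means the text is a scalar leaf.
    static Object parseNested(String text, char[] delimiters, int level, int depth) {
        if (depth == 0) {
            return text;
        }
        if (level >= delimiters.length) {
            throw new IllegalArgumentException("no delimiter configured for level " + level);
        }
        String regex = Pattern.quote(String.valueOf(delimiters[level]));
        List<Object> items = new ArrayList<>();
        for (String part : text.split(regex, -1)) {
            items.add(parseNested(part, delimiters, level + 1, depth - 1));
        }
        return items;
    }
}
```

For example, `NestedLevelsSketch.parseNested("1;2|3;4|5", new char[] {'|', ';'}, 0, 2)` splits on `|` first and then on `;`, giving a list of three inner lists. Passing the depth explicitly avoids guessing whether a value with no separator is a scalar or a one-element array.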

Hive Reference
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableCreate/Drop/TruncateTable

@DALDEI DALDEI changed the title Support for Map and Nested fields CSV: Support for Map and Nested fields Aug 2, 2020
@cowtowncoder cowtowncoder added csv to-evaluate Issue that has been received but not yet evaluated labels Aug 19, 2020