-
Notifications
You must be signed in to change notification settings - Fork 0
data utils #127
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
data utils #127
Conversation
97ee0a8
to
f7491f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 49 out of 53 changed files in this pull request and generated no comments.
Files not reviewed (4)
- data-utils/.gitignore: Language not supported
- data-utils/Makefile: Language not supported
- data-utils/tests/data/csv/custom_null_test.csv: Language not supported
- data-utils/tests/data/csv/decimal_test.csv: Language not supported
643d3c3
to
9738195
Compare
1403af3
to
9738195
Compare
6f98011
to
63671aa
Compare
6de7954
to
d6a7d64
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just amazing; I'm impressed, to be honest. I have just a few suggestions and comments!
data-utils/Makefile
Outdated
|
||
.PHONY: wasm | ||
wasm: | ||
RUSTFLAGS="${RUSTFLAGS}" wasm-pack build --target web --out-name index ${BUILDFLAGS} --features default-wasm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use weak-ref
here : https://rustwasm.github.io/docs/wasm-bindgen/reference/weak-references.html ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh, i didn't know about the finalization registry. yeah, definitely something we probably want to use for production builds. i'd like to use some zero-copy stuff for doing the actual blob download when doing parquet conversions, which hopefully won't be a problem with de-allocation because the JS keeps a reference to the internal wasm memory, but i'll get to that when i get to that
a765a12
to
abdbca6
Compare
f5f7ede
to
12ff923
Compare
Added an example repo https://github.com/aqora-io/data-utils-playground |
207ca58
to
0c93130
Compare
Hey @jpopesculian 👋 I haven’t reviewed yet sorry, but what are the next steps for this PR considering the dataset feature over at the platform? |
For this PR, its just about integrating it into the frontend and using it to convert user files into parquet before uploading |
I have test failures with
|
Should the commands I was under the impression the user journey would be like this:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read it a first time, and so far, the code looks very good 🎉 ! I just ran into 2 issues:
- I got the same result as @volgar1x when running tests
- I cannot compile with my mac, when I tried, I ran into this error: rust-lang/rust#57349 for the arrow-0.2.3 crate
.PHONY: wasm-build | ||
wasm-build: | ||
RUSTFLAGS="${RUSTFLAGS}" wasm-pack build --target web --out-name index --weak-refs --reference-types \ | ||
${BUILDFLAGS} --no-default-features --features default-wasm | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you think about using this in
Line 188 in 0c93130
wasm-pack build --target web --out-name index --release --weak-refs --reference-types --no-default-features --features default-wasm |
|
||
pub use ron::{Map, Value}; | ||
|
||
const NAIVE_DATE_TIME_FMT: StrftimeItems<'static> = StrftimeItems::new("%Y-%m-%dT%H:%M:%S"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the difference between this and: const NAIVE_DATE_TIME_FMT: &str "%Y-%m-%dT%H:%M:%S";
and later : dt.format(NAIVE_DATE_TIME_FMT)
?
if rem != 0 { | ||
index -= 1; | ||
out[index].write(vec.split_off(vec.len() - rem).into_boxed_slice()); | ||
vec.shrink_to_fit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this function shrink_to_fit
do?
.map(|buf| vec_to_blob(buf.into_inner(), 65_536, &options)) | ||
.collect() | ||
} | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh a little piece of an old story 😆
0c93130
to
d302d3b
Compare
implements utilities for inferring schemas and converting CSVs and JSONs to parquet files
d302d3b
to
db9cc76
Compare
implements utilities for inferring schemas and converting CSVs and JSONs to parquet files