Identifying features to include in initial development release. · JuliaHealth IPUMS.jl · Discussion #7

I went through the ipumsr R package tutorials to understand what functionality it includes. I created a simple list below, so we can now identify which features to include in the initial release and which to leave for subsequent releases.

Reading IPUMS data
IPUMS data generally comes in multiple files, such as a DDI file (the metadata) and a separate data file.
With NHGIS extracts, a single file contains both the metadata and the data, though
there can be discrepancies between the metadata in the NHGIS file and the corresponding
DDI file. Further, users often download multiple data files and would like to
process them as a batch.
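
To make the reading and batch-processing idea concrete, here is a minimal sketch in Base Julia. It assumes a fixed-width data file whose column positions would normally come from the DDI file. The names `ColumnSpec`, `parse_record`, `read_records`, and `read_batch` are hypothetical and not part of IPUMS.jl, and the sample records are made up for illustration.

```julia
# Hypothetical column description, standing in for what a DDI file would provide.
struct ColumnSpec
    name::Symbol
    start::Int   # first character position (1-based, inclusive)
    stop::Int    # last character position (inclusive)
end

# Parse one fixed-width record into a NamedTuple of integers.
function parse_record(line::AbstractString, specs::Vector{ColumnSpec})
    colnames = Tuple(s.name for s in specs)
    vals = Tuple(parse(Int, strip(line[s.start:s.stop])) for s in specs)
    return NamedTuple{colnames}(vals)
end

# Read every non-empty record from one IO stream.
read_records(io::IO, specs) =
    [parse_record(line, specs) for line in eachline(io) if !isempty(line)]

# Batch case: apply the same column specs to several sources and concatenate.
read_batch(ios, specs) = reduce(vcat, [read_records(io, specs) for io in ios])

# Made-up sample data: two in-memory "files" sharing one layout.
specs = [ColumnSpec(:year, 1, 4), ColumnSpec(:age, 5, 7), ColumnSpec(:health, 8, 8)]
file_a = IOBuffer("20200341\n20210582\n")
file_b = IOBuffer("20220273\n")
records = read_batch([file_a, file_b], specs)
```

Keeping the column layout separate from the parsing step means the same reader could be driven by either the DDI metadata or the metadata embedded in an NHGIS file, which is one possible way to handle discrepancies between the two.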

IPUMS value labels (categorical variables encoded as numbers), based on the haven R package
Note that IPUMS data columns often have variable labels, which are human-readable variable names
as opposed to esoteric column names, like household_income versus HV001_a. Data extracts may also
contain variable descriptions, which are text descriptions of the contents of a variable.
Finally, extracts may contain value labels, which are categorical encodings like R factors, e.g.
1 = Excellent, 2 = Very good. The ipumsr package uses the labelled class from the
haven package. The data type for a column would be, say, <int+lbl> to indicate that the column
holds integer codes together with their labels. The Julia design does not need to be identical, but there should be
a way to identify when columns have labeled/categorical data, and to identify the values of those
labels.
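
One possible shape for this in Julia is sketched below. It is only an illustration under my own assumptions, not a proposed API: the `LabelledColumn` type and the `haslabels` and `labelvalues` functions are hypothetical names, and the sample values are invented.

```julia
# Hypothetical container: raw codes plus the metadata that describes them.
struct LabelledColumn
    codes::Vector{Int}           # numeric values as stored in the extract
    labels::Dict{Int,String}     # value labels, e.g. 1 => "Excellent"
    description::String          # variable description text
end

# True when the column carries at least one value label.
haslabels(col::LabelledColumn) = !isempty(col.labels)

# Translate each code to its label; codes without a label become `missing`.
labelvalues(col::LabelledColumn) = [get(col.labels, c, missing) for c in col.codes]

# Invented sample column.
health = LabelledColumn(
    [1, 2, 1, 9],
    Dict(1 => "Excellent", 2 => "Very good"),
    "Self-reported health status",
)
health_labels = labelvalues(health)
```

Storing the codes and the labels side by side keeps both forms of the data available, similar in spirit to <int+lbl>, while `haslabels` gives a simple way to check whether a column is labeled.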

IPUMS for big data

working with chunked data, similar to map-reduce
working with external databases
implementing lazy data structures for big data
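
The chunked, map-reduce style of processing could look roughly like the sketch below, which uses only Base Julia. The function name `chunked_sum` and the sample data are hypothetical. The idea is that only one chunk of lines is held in memory at a time.

```julia
# Sum a value over a stream of lines, one chunk at a time.
# `parse_value` turns a single line into a number.
function chunked_sum(io::IO, chunk_size::Int, parse_value)
    # Lazily group the lines into vectors of at most `chunk_size` lines.
    chunks = Iterators.partition(eachline(io), chunk_size)
    # Map step: sum within each chunk. Reduce step: add the partial sums.
    return mapreduce(chunk -> sum(parse_value, chunk), +, chunks; init = 0)
end

# Invented sample data: five one-value records, processed two lines at a time.
data = IOBuffer("10\n20\n30\n40\n50\n")
total = chunked_sum(data, 2, line -> parse(Int, line))
```

The same pattern would apply to other reductions, such as counts or frequency tables, as long as the per-chunk results can be combined.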

00krishna Mar 30, 2024 Maintainer

Replies: 0 comments
