This repository has been archived by the owner on May 19, 2021. It is now read-only.
Disparate or weakly linked data makes up the majority of the world's data, but we focus mainly on single-source datasets or on combining datasets with definite primary and foreign keys. A number of tidyverse-compliant packages exist for data cleansing and transformation, but none exist for deduplication or record linkage. The problem of record linkage is complex and well studied, but no tool or framework fits nicely into a modern R workflow.
The RecordLinkage package is a brilliant package that does solve this problem, but its API is inconsistent and its data structures are awkward. A tidy record linkage package could build on the lessons learned from RecordLinkage while adhering to the "tidy way of life" and integrating nicely with other tidy tools. I think a package like this could open up a lot of possibilities for researchers and practitioners to work with and combine data in ways they never could before.