This repository has been archived by the owner on May 19, 2021. It is now read-only.

Tidy record linkage package #98

Open
1danjordan opened this issue Aug 9, 2017 · 1 comment

Comments

@1danjordan

Disparate or weakly linked data makes up the majority of the world's data, yet we mainly focus on single-source datasets or on combining datasets that share definite primary and foreign keys. A number of tidyverse-compliant packages exist for data cleansing and transformation, but none for deduplication or record linkage. Record linkage is a complex and well-studied problem, but there is no tool or framework for it that fits nicely into a modern R workflow.

The RecordLinkage package is a brilliant package that does solve this problem, but its API is inconsistent and its data structures are awkward. A tidy record linkage package could build on the lessons learned from RecordLinkage while adhering to the "tidy way of life" and integrating nicely with other tidy tools. I think a package like this could open up a lot of possibilities for researchers and practitioners to work with and combine data in ways they never could before.

@ck37

ck37 commented Sep 11, 2017

How do you feel about the fastLink package?
