sourceId is not the hostname #2

Open
BrunoBonacci opened this issue Aug 10, 2015 · 0 comments

@BrunoBonacci
Member

https://github.com/samsara/hydrant/blob/master/src/hydrant/flows/samsara.clj#L8

The sourceId is the partition key, not the hostname of the machine sending the data.

Would you want your Twitter feed to be partitioned by the host machine that is sending the data? I don't think so. That would break the "same partition for the same source" rule.

The sourceId always depends on the dataset. For a Twitter feed, a good sourceId would be the Twitter handle of the user sending the tweet, so that all tweets from a given user are processed by the same processor.

For Wikipedia edits you might want to use either the page edited or the author, etc.
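To make the rule concrete, here is a minimal sketch of how keying by a dataset-specific sourceId keeps one source on one partition. It is written in Python purely for illustration (the linked file is Clojure), and everything in it is an assumption: the `partition_for` helper, the CRC32 hash, the partition count, and the event field names are hypothetical and are not taken from hydrant or Samsara.

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical partition count, for illustration only


def partition_for(source_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a sourceId to a partition using a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(source_id.encode("utf-8")) % num_partitions


# Hypothetical tweet events; the field names are illustrative only.
tweets = [
    {"handle": "alice", "text": "first"},
    {"handle": "bob", "text": "hello"},
    {"handle": "alice", "text": "second"},
]

# Keyed by Twitter handle: collect the partitions each user's tweets land on.
partitions_by_handle = {}
for tweet in tweets:
    handle = tweet["handle"]
    partitions_by_handle.setdefault(handle, set()).add(partition_for(handle))

# Every handle maps to exactly one partition, so one processor sees all of
# that user's tweets regardless of which machine forwarded them.
single_partition_per_user = all(len(p) == 1 for p in partitions_by_handle.values())
```

If the hostname were used as the key instead, the partition would follow whichever machine happened to forward the event rather than the user who produced it.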

The code should be changed in such a way that the source advertises its partition key.
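One possible shape for that change, sketched in Python for illustration only (the linked file is Clojure): each source declares a function that extracts the partition key from its own events, and the flow calls that function instead of reading the hostname. The `Source` class, the `partition_key` field, and the `to_samsara_events` function are hypothetical names introduced for this sketch; they are not part of hydrant's current API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator

Event = Dict[str, Any]


@dataclass(frozen=True)
class Source:
    """A data source that advertises how to derive its own partition key."""
    name: str
    partition_key: Callable[[Event], str]  # hypothetical: event -> sourceId


def to_samsara_events(source: Source, raw_events: Iterable[Event]) -> Iterator[Event]:
    """Attach a sourceId chosen by the source itself, not by the sending host."""
    for raw in raw_events:
        yield {**raw, "sourceId": source.partition_key(raw)}


# Each dataset picks the key that suits it (field names are assumptions).
twitter = Source(name="twitter", partition_key=lambda e: e["handle"])
wikipedia = Source(name="wikipedia", partition_key=lambda e: e["page"])

tweet_events = list(to_samsara_events(twitter, [{"handle": "alice", "text": "hi"}]))
wiki_events = list(to_samsara_events(wikipedia, [{"page": "Clojure", "author": "bob"}]))
```

With this arrangement the flow code stays generic, and switching Wikipedia from partitioning by page to partitioning by author is a one-line change in the source definition.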
