Duplicates in view shipping.metadata_for_augur_build_v3 #47

kairstenfay · 2020-02-20T17:57:36Z

The count of rows in metadata_for_augur_build_v3 is greater than the count returned by select count(*) from warehouse.sample where identifier is not null. As of now, they're 34557 and 34468, respectively.

The two counts match if you count distinct strains in the view instead:

select count(distinct(strain)) from shipping.metadata_for_augur_build_v3
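
To see which strains are repeated, a query along these lines might help. It is only a sketch and assumes nothing beyond the strain column used above:

```sql
-- List strains that appear more than once in the view, most duplicated first.
select strain, count(*) as row_count
from shipping.metadata_for_augur_build_v3
group by strain
having count(*) > 1
order by row_count desc, strain;
```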

There should probably not be duplicates for this view. The following join likely introduces duplicates:

left join shipping.incidence_model_observation_v2 on sample.identifier = incidence_model_observation_v2.sample
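
One way to check that suspicion is to look for samples with more than one row on the right-hand side of the join. A rough sketch, assuming sample is the only join key as in the clause above:

```sql
-- Any sample returned here fans out into multiple rows when left-joined
-- on sample.identifier = incidence_model_observation_v2.sample.
select sample, count(*) as row_count
from shipping.incidence_model_observation_v2
group by sample
having count(*) > 1
order by row_count desc;
```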

(See original Slack conversation for context)


joverlee521 · 2020-02-20T19:02:51Z

Did some more digging into this.
The incidence model observation views seem to have duplicates for encounters that are linked to multiple locations (i.e. both residence and lodging locations) due to this bit:

select encounter_id, hierarchy->'tract' as residence_census_tract
from warehouse.encounter_location
left join warehouse.location using (location_id)
where relation = 'residence'
or relation = 'lodging'
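
The affected encounters could be listed with something like the following. This is a sketch that assumes relation is a column on warehouse.encounter_location; adjust if it lives elsewhere:

```sql
-- Encounters linked to more than one residence/lodging location.
-- Each of these yields multiple rows from the snippet above.
select encounter_id,
       count(*) as location_count,
       array_agg(relation order by relation) as relations
from warehouse.encounter_location
where relation in ('residence', 'lodging')
group by encounter_id
having count(*) > 1
order by location_count desc;
```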

tsibley · 2020-02-25T00:45:33Z

Ah, nice digging. I think the appropriate thing is to prefer residences but fall back to lodging, so conceptually a coalesce on it (but it could be a reducing aggregation in practice).
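
A minimal sketch of one possible reduction, using PostgreSQL's distinct on rather than a literal coalesce. It is not a tested patch, and it assumes the same tables and columns as the snippet quoted above:

```sql
-- Keep one row per encounter, preferring the residence location and
-- falling back to lodging when no residence is linked.
select distinct on (encounter_id)
       encounter_id,
       hierarchy->'tract' as residence_census_tract
from warehouse.encounter_location
left join warehouse.location using (location_id)
where relation in ('residence', 'lodging')
order by encounter_id,
         case relation when 'residence' then 0 else 1 end;
```

Two caveats: if an encounter has several residence rows, the one kept is arbitrary unless another tiebreaker is added to the order by, and a residence whose tract is null still wins over a lodging with a tract, which a true coalesce would not do.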
