Currently tables.sort() does not always sort mutations into the required order #3253

@hyanwong

Taken from #3212 (comment)

Currently I can add and delete mutations so as to create a tree sequence in which compute_mutation_parents() fails, even after sorting (code below). Given the upcoming requirement for mutations to be sorted correctly in order to make a valid tree sequence, I think we need to fix tables.sort() so that it also sorts mutations into a valid order. The tricky part, I think, is which order to use for mutations that occur on the same node at the same site. We can use time, but where times are identical (or unknown) we can presumably fall back on the existing mutation order?

import msprime
import tskit
import numpy as np

ts = msprime.sim_ancestry(10, sequence_length=100, random_seed=1, recombination_rate=0.01)
ts = msprime.sim_mutations(ts, rate=1, random_seed=1)
print("Simulated", ts.num_mutations, "mutations")

# Add some random mutations, and delete some others
tables = ts.dump_tables()
tables.mutations.time = np.full_like(tables.mutations.time, tskit.UNKNOWN_TIME)
np.random.seed(10)
for s in ts.sites():
    tables.mutations.add_row(site=s.id, node=np.random.randint(ts.num_nodes), derived_state="A")
keep = np.ones(tables.mutations.num_rows, dtype=bool)
keep[0:100] = False
tables.mutations.replace_with(tables.mutations[keep])
# Zap all the parent IDs
tables.mutations.parent = np.full_like(tables.mutations.parent, tskit.NULL)
assert np.all(tables.mutations.parent == tskit.NULL)
tables.sort()
tables.build_index()
tables.compute_mutation_parents()
tables.tree_sequence()  # Fails
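
For discussion, here is a minimal sketch of the tie-breaking rule suggested above: order by site, then by decreasing time, and fall back on the existing row order where times are equal or unknown. This is not what tables.sort() does; the function name `mutation_sort_order` is hypothetical, and unknown times are represented simply as NaN (tskit.UNKNOWN_TIME is itself a NaN, although tskit distinguishes it from other NaNs). The sketch assumes that times within a site are either all known or all unknown.

```python
import numpy as np


def mutation_sort_order(site, time):
    """Hypothetical helper: return row indices ordering mutations by site,
    then decreasing time, keeping existing row order for ties."""
    site = np.asarray(site)
    time = np.asarray(time, dtype=float)
    # Assumption: unknown times are NaN. Map them to one shared value so
    # they compare as ties and fall through to the existing row order.
    time_key = np.where(np.isnan(time), 0.0, time)
    # np.lexsort treats the LAST key as the primary one and is stable,
    # so rows with equal (site, time) keep their original relative order.
    return np.lexsort((-time_key, site))


# Toy columns: two mutations at site 0 with unknown times,
# three at site 1 with known times (two of them identical).
site = [1, 0, 1, 0, 1]
time = [2.0, np.nan, 5.0, np.nan, 2.0]
order = mutation_sort_order(site, time)
print(order.tolist())  # [1, 3, 2, 0, 4]
```

Note that this key never consults the trees, so when times are unknown it only preserves whatever order the rows already have; it cannot by itself decide which of two mutations on different nodes lies above the other.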

Labels: bug
