Open
Description
Taken from #3212 (comment)
Currently I can add and delete mutations so as to create a tree sequence in which compute_mutation_parents() fails, even after sorting (code below). Given the upcoming requirement for mutations to be sorted correctly to make a valid tree sequence, I think we need to fix tables.sort() to also sort mutations into a valid order? The tricky part is, I think, the order to use for mutations that occur on the same node at the same site. We can use time, but in the case of identical times (or if times are unknown) we can presumably use the existing mutation order?
import msprime
import tskit
import numpy as np
ts = msprime.sim_mutations(msprime.sim_ancestry(10, sequence_length=100, random_seed=1, recombination_rate=0.01), rate=1, random_seed=1)
print("Simulated", ts.num_mutations, "mutations")
# Add some random mutations, and delete some others
tables = ts.dump_tables()
tables.mutations.time = np.full_like(tables.mutations.time, tskit.UNKNOWN_TIME)
np.random.seed(10)
for s in ts.sites():
    tables.mutations.add_row(site=s.id, node=np.random.randint(ts.num_nodes), derived_state="A")
keep = np.ones(tables.mutations.num_rows, dtype=bool)
keep[0:100] = False
tables.mutations.replace_with(tables.mutations[keep])
# Zap all the parent IDs
tables.mutations.parent = np.full_like(tables.mutations.parent, tskit.NULL)
assert np.all(tables.mutations.parent == tskit.NULL)
tables.sort()
tables.build_index()
tables.compute_mutation_parents()
tables.tree_sequence() # Fails
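To make the proposed tie-breaking concrete, here is a rough sketch of a sort key in plain Python. It is not how tables.sort() is implemented. The Mutation tuple, the sorted_mutation_order() function and the use of None for an unknown time are all hypothetical stand-ins. Ordering different nodes at a site by decreasing node time is my own assumption (ancestors are older than their descendants), and the sketch also assumes that times at a site are either all known or all unknown.

```python
from typing import NamedTuple, Optional


class Mutation(NamedTuple):
    # Hypothetical stand-in for a mutation table row; not a tskit class.
    site: int
    node: int
    time: Optional[float]  # None plays the role of tskit.UNKNOWN_TIME here


def sorted_mutation_order(mutations, node_time):
    """Return row indices in the order sketched in the description."""

    def key(index):
        m = mutations[index]
        return (
            m.site,                                  # group by site
            -node_time[m.node],                      # assumption: older nodes first
            -m.time if m.time is not None else 0.0,  # same node: older mutation first
            index,                                   # ties / unknown times: existing order
        )

    return sorted(range(len(mutations)), key=key)


# Node 2 (time 5.0) is taken to be an ancestor of the samples 0 and 1.
node_time = [0.0, 0.0, 5.0]
mutations = [
    Mutation(site=0, node=0, time=1.0),
    Mutation(site=0, node=2, time=6.0),
    Mutation(site=0, node=0, time=3.0),
    Mutation(site=1, node=1, time=None),
    Mutation(site=1, node=2, time=None),
    Mutation(site=1, node=1, time=None),
]
order = sorted_mutation_order(mutations, node_time)
print(order)  # [1, 2, 0, 4, 3, 5]
```

At site 0 the two mutations on node 0 are ordered by time (3.0 before 1.0), while at site 1 the two unknown-time mutations on node 1 keep their existing relative order (row 3 before row 5).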
Labels
bug