|
1 |
| -.. _sec_interchange: |
| 1 | +.. _sec_data_model: |
2 | 2 |
|
3 |
| -######################### |
4 |
| -Tree sequence interchange |
5 |
| -######################### |
| 3 | +########## |
| 4 | +Data model |
| 5 | +########## |
6 | 6 |
|
7 | 7 | The correlated genealogical trees that describe the shared ancestry of a set of
|
8 |
| -samples are stored concisely in ``msprime`` as a collection of |
| 8 | +samples are stored concisely in ``tskit`` as a collection of |
9 | 9 | easy-to-understand tables. These are output by coalescent simulation in
|
10 | 10 | ``msprime`` or can be read in from another source. This page documents
|
11 | 11 | the structure of the tables, and the different methods of interchanging
|
12 |
| -genealogical data to and from the msprime API. We begin by defining |
| 12 | +genealogical data to and from the tskit API. We begin by defining |
13 | 13 | the basic concepts that we need and the structure of the tables in the
|
14 | 14 | `Data model`_ section. We then describe the tabular text formats that can
|
15 | 15 | be used as simple interchange mechanism for small amounts of data in the
|
16 | 16 | `Text file formats`_ section. The `Binary interchange`_ section then describes
|
17 | 17 | the efficient Python API for table interchange using numpy arrays. Finally,
|
18 |
| -we describe the binary format used by msprime to efficiently |
| 18 | +we describe the binary format used by tskit to efficiently |
19 | 19 | store tree sequences on disk in the `Tree sequence file format`_ section.
|
20 | 20 |
|
21 | 21 |
|
22 |
| -.. _sec_data_model: |
| 22 | +.. _sec_data_model_definitions: |
23 | 23 |
|
24 |
| -********** |
25 |
| -Data model |
26 |
| -********** |
| 24 | +*********** |
| 25 | +Definitions |
| 26 | +*********** |
27 | 27 |
|
28 | 28 | To begin, here are definitions of some key ideas encountered later.
|
29 | 29 |
|
@@ -156,7 +156,7 @@ term "genome" at times, for concreteness.
|
156 | 156 | Several properties naturally associated with individuals are in fact assigned
|
157 | 157 | to nodes in what follows: birth time and population. This is for two reasons:
|
158 | 158 | First, since coalescent simulations naturally lack a notion of polyploidy, earlier
|
159 |
| -versions of ``msprime`` lacked the notion of an individual. Second, ancestral |
| 159 | +versions of ``tskit`` lacked the notion of an individual. Second, ancestral |
160 | 160 | nodes are not naturally grouped together into individuals -- we know they must have
|
161 | 161 | existed, but have no way of inferring this grouping, so in fact many nodes in
|
162 | 162 | an empirically-derived tree sequence will not be associated with individuals,
|
@@ -405,7 +405,7 @@ helpful for inferring demographic history to record this history.
|
405 | 405 | Migrations are performed by individual ancestors, but most likely not by an
|
406 | 406 | individual whose genome is tracked as a ``node`` (as in a discrete-deme model they are
|
407 | 407 | unlikely to be both a migrant and a most recent common ancestor). So,
|
408 |
| -``msprime`` records when a segment of ancestry has moved between |
| 408 | +``tskit`` records when a segment of ancestry has moved between |
409 | 409 | populations. This table is not required, even if different nodes come from
|
410 | 410 | different populations.
|
411 | 411 |
|
@@ -491,7 +491,7 @@ the library itself can use. All other information is considered to be
|
491 | 491 | tables.
|
492 | 492 |
|
493 | 493 | Arbitrary binary data can be stored in ``metadata`` columns, and the
|
494 |
| -``msprime`` library makes no attempt to interpret this information. How the |
| 494 | +``tskit`` library makes no attempt to interpret this information. How the |
495 | 495 | information held in this field is encoded is entirely the choice of client code.
|
496 | 496 |
|
497 | 497 | To ensure that metadata can be safely interchanged using the :ref:`sec_text_file_format`,
|
@@ -1046,7 +1046,7 @@ length. To encode such columns in the tables API, we store **two** columns:
|
1046 | 1046 | one contains the flattened array of data and another stores the **offsets**
|
1047 | 1047 | of each row into this flattened array. Consider an example::
|
1048 | 1048 |
|
1049 |
| - >>> s = msprime.SiteTable() |
| 1049 | + >>> s = tskit.SiteTable() |
1050 | 1050 | >>> s.add_row(0, "A")
|
1051 | 1051 | >>> s.add_row(0, "")
|
1052 | 1052 | >>> s.add_row(0, "TTT")
|
@@ -1231,7 +1231,7 @@ Legacy Versions
|
1231 | 1231 | ===============
|
1232 | 1232 |
|
1233 | 1233 | Tree sequence files written by older versions of tskit are not readable by
|
1234 |
| -newer versions of msprime. For major releases of tskit, ``tskit upgrade`` |
| 1234 | +newer versions of tskit. For major releases of tskit, ``tskit upgrade`` |
1235 | 1235 | will convert older tree sequence files to the latest version.
|
1236 | 1236 |
|
1237 | 1237 | File formats from version 11 onwards are based on
|