Good-enough first pass at the docs.
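
Reviewer note on the data-model hunk below: the ragged-column text describes storing a flattened data array plus an offsets array. The following is a minimal sketch of that encoding, assuming plain Python strings and lists stand in for the real columns. The helper names `flatten_rows` and `split_rows` are hypothetical, chosen for illustration, and are not part of the tskit API.

```python
def flatten_rows(rows):
    """Encode variable-length rows as (flattened data, offsets).

    Hypothetical helper for illustration only; not a tskit function.
    offsets has len(rows) + 1 entries, and row j occupies
    data[offsets[j]:offsets[j + 1]].
    """
    data = "".join(rows)
    offsets = [0]
    for row in rows:
        # Each offset is the running total of the row lengths so far.
        offsets.append(offsets[-1] + len(row))
    return data, offsets


def split_rows(data, offsets):
    """Invert flatten_rows: recover the list of rows."""
    return [data[offsets[j]:offsets[j + 1]] for j in range(len(offsets) - 1)]


# The three rows added in the SiteTable example: "A", "" and "TTT".
data, offsets = flatten_rows(["A", "", "TTT"])
rows = split_rows(data, offsets)
```

An empty row shows up as two equal consecutive offsets, which is why the offsets array needs one more entry than there are rows.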

jeromekelleher · jeromekelleher · commit ea3e94c622ff · 2019-01-11T16:08:49.000Z
diff --git a/docs/data-model.rst b/docs/data-model.rst
@@ -1,29 +1,29 @@
-.. _sec_interchange:
+.. _sec_data_model:
 
-#########################
-Tree sequence interchange
-#########################
+##########
+Data model
+##########
 
 The correlated genealogical trees that describe the shared ancestry of a set of
-samples are stored concisely in ``msprime`` as a collection of
+samples are stored concisely in ``tskit`` as a collection of
 easy-to-understand tables. These are output by coalescent simulation in
 ``msprime`` or can be read in from another source. This page documents
 the structure of the tables, and the different methods of interchanging
-genealogical data to and from the msprime API. We begin by defining
+genealogical data to and from the tskit API. We begin by defining
 the basic concepts that we need and the structure of the tables in the
 `Data model`_ section. We then describe the tabular text formats that can
 be used as a simple interchange mechanism for small amounts of data in the
 `Text file formats`_ section. The `Binary interchange`_ section then describes
 the efficient Python API for table interchange using numpy arrays. Finally,
-we describe the binary format used by msprime to efficiently
+we describe the binary format used by tskit to efficiently
 store tree sequences on disk in the `Tree sequence file format`_ section.
 
 
-.. _sec_data_model:
+.. _sec_data_model_definitions:
 
-**********
-Data model
-**********
+***********
+Definitions
+***********
 
 To begin, here are definitions of some key ideas encountered later.
 
@@ -156,7 +156,7 @@ term "genome" at times, for concreteness.
 Several properties naturally associated with individuals are in fact assigned
 to nodes in what follows: birth time and population. This is for two reasons:
 First, since coalescent simulations naturally lack a notion of polyploidy, earlier
-versions of ``msprime`` lacked the notion of an individual. Second, ancestral
+versions of ``tskit`` lacked the notion of an individual. Second, ancestral
 nodes are not naturally grouped together into individuals -- we know they must have
 existed, but have no way of inferring this grouping, so in fact many nodes in
 an empirically-derived tree sequence will not be associated with individuals,
@@ -405,7 +405,7 @@ helpful for inferring demographic history to record this history.
 Migrations are performed by individual ancestors, but most likely not by an
 individual whose genome is tracked as a ``node`` (as in a discrete-deme model they are
 unlikely to be both a migrant and a most recent common ancestor).  So,
-``msprime`` records when a segment of ancestry has moved between
+``tskit`` records when a segment of ancestry has moved between
 populations. This table is not required, even if different nodes come from
 different populations.
 
@@ -491,7 +491,7 @@ the library itself can use. All other information is considered to be
 tables.
 
 Arbitrary binary data can be stored in ``metadata`` columns, and the
-``msprime`` library makes no attempt to interpret this information. How the
+``tskit`` library makes no attempt to interpret this information. How the
 information held in this field is encoded is entirely the choice of client code.
 
 To ensure that metadata can be safely interchanged using the :ref:`sec_text_file_format`,
@@ -1046,7 +1046,7 @@ length. To encode such columns in the tables API, we store **two** columns:
 one contains the flattened array of data and another stores the **offsets**
 of each row into this flattened array. Consider an example::
 
-    >>> s = msprime.SiteTable()
+    >>> s = tskit.SiteTable()
     >>> s.add_row(0, "A")
     >>> s.add_row(0, "")
     >>> s.add_row(0, "TTT")
@@ -1231,7 +1231,7 @@ Legacy Versions
 ===============
 
 Tree sequence files written by older versions of tskit are not readable by
-newer versions of msprime. For major releases of tskit, ``tskit upgrade``
+newer versions of tskit. For major releases of tskit, ``tskit upgrade``
 will convert older tree sequence files to the latest version.
 
 File formats from version 11 onwards are based on
diff --git a/docs/development.rst b/docs/development.rst
@@ -1,7 +1,57 @@
 .. _sec_development:
 
-=======================
-Developer documentation
-=======================
+===========
+Development
+===========
 
-.. todo:: Port developer docs
+If you would like to add some features to ``tskit``, please read the
+following. If you think there is anything missing,
+please open an `issue <http://github.com/tskit-dev/tskit/issues>`_ or
+`pull request <http://github.com/tskit-dev/tskit/pulls>`_ on GitHub!
+
+**********
+Quickstart
+**********
+
+- Make a fork of the tskit repo on `GitHub <https://github.com/tskit-dev/tskit>`_
+- Clone your fork into a local directory, making sure that the **submodules
+  are correctly initialised**::
+
+    $ git clone git@github.com:tskit-dev/tskit.git --recurse-submodules
+
+  For an already checked out repo, the submodules can be initialised using::
+
+    $ git submodule update --init --recursive
+
+- Install the Python development requirements using
+  ``pip install -r python/requirements/development.txt``.
+- Build the low level module by running ``make`` in the ``python`` directory. Python 3.x
+  is the default for development (Python 2.x is discouraged).
+- Run the tests to ensure everything has worked: ``python -m nose -vs``. These should
+  all pass.
+- Make your changes in a local branch, and open a pull request on GitHub when you
+  are ready. Please make sure that (a) the tests pass; and (b) your code passes
+  PEP8 checks (see below for a git commit hook to ensure this happens
+  automatically) before opening the PR.
+
+****************************
+Continuous integration tests
+****************************
+
+Three different continuous integration providers are used, which run different
+combinations of tests on different platforms:
+
+1. `Travis CI <https://travis-ci.org/>`_ runs tests on Linux and OSX using the
+   `Conda <https://conda.io/docs/>`__ infrastructure for the system level
+   requirements. All supported versions of Python are tested here.
+
+2. `CircleCI <https://circleci.com/>`_ runs all Python tests using the apt-get
+   infrastructure for system requirements. Additionally, the low-level tests
+   are run, coverage statistics are calculated using `CodeCov <https://codecov.io/gh>`__,
+   and the documentation is built.
+
+3. `AppVeyor <https://www.appveyor.com/>`_ runs Python tests on 32- and 64-bit
+   Windows using conda.
+
+
+.. todo:: Complete porting the documentation from msprime
diff --git a/docs/index.rst b/docs/index.rst
@@ -14,11 +14,10 @@ Welcome to tskit's documentation!
    installation
    python-api
    c-api
+   data-model
+   provenance
    development
    changelogs
-   interchange
-   provenance
-   tutorial
 
 
 Indices and tables
diff --git a/docs/installation.rst b/docs/installation.rst
@@ -4,4 +4,13 @@
 Installation
 ============
 
-.. todo:: Port installation docs.
+.. note:: This documentation is incomplete. Once we have a conda package that
+    is installable we'll update the documentation to use that route also. However,
+    as there are no external dependencies, pip should work well for all
+    non-Windows users.
+
+Please install ``tskit`` from PyPI using pip::
+
+    $ python -m pip install tskit
+
+
diff --git a/docs/introduction.rst b/docs/introduction.rst
@@ -4,6 +4,8 @@
 Introduction
 ============
 
-.. note:: This documentation is incomplete and still under developement. If
-    you would like to help, please open an issue on
+This is the documentation for tskit, the tree sequence toolkit.
+
+.. note:: This documentation is incomplete and under development. If
+    you would like to help, please open an issue or pull request at
     `GitHub <https://github.com/tskit-dev/tskit>`_.
diff --git a/docs/python-api.rst b/docs/python-api.rst
@@ -4,6 +4,12 @@
 Python API
 ==========
 
+This page provides detailed documentation for the ``tskit`` Python API.
+
+************************
+Trees and tree sequences
+************************
+
 The :class:`.TreeSequence` class represents a sequence of correlated trees
 output by a simulation. The :class:`.Tree` class represents a single
 tree in this sequence.
@@ -13,6 +19,7 @@ There are also methods for loading data into these objects, either from the nati
 format using :func:`tskit.load`, or from other sources
 using :func:`tskit.load_text` or :meth:`.TableCollection.tree_sequence`.
 
+
 +++++++++++++++++
 Top-level classes
 +++++++++++++++++
@@ -49,7 +56,7 @@ Simple container classes
 ++++++++++++++++++++++++
 
 These classes are simple shallow containers representing the entities defined
-in the :ref:`sec_data_model`. These classes are not intended to be instantiated
+in the :ref:`sec_data_model_definitions`. These classes are not intended to be instantiated
 directly, but are the return types for the various iterators provided by the
 :class:`.TreeSequence` and :class:`.Tree` classes.
 
@@ -96,22 +103,11 @@ using the :ref:`Tables API <sec_tables_api>`.
 .. autofunction:: tskit.load_text
 
 
-**********************
-Calculating statistics
-**********************
-
-The ``tskit`` API provides methods for efficiently calculating
-population genetics statistics from a given :class:`.TreeSequence`.
-
-.. autoclass:: tskit.LdCalculator(tree_sequence)
-    :members:
-
-
 .. _sec_tables_api:
 
-***********
-Tables API
-***********
+******
+Tables
+******
 
 The :ref:`tables API <sec_binary_interchange>` provides an efficient way of working
 with and interchanging :ref:`tree sequence data <sec_data_model>`. Each table
@@ -442,12 +438,24 @@ Table functions
 
 .. autofunction:: tskit.unpack_bytes
 
+.. _sec_stats_api:
+
+**********
+Statistics
+**********
+
+The ``tskit`` API provides methods for efficiently calculating
+population genetics statistics from a given :class:`.TreeSequence`.
+
+.. autoclass:: tskit.LdCalculator(tree_sequence)
+    :members:
+
 
 .. _sec_provenance_api:
 
-**************
-Provenance API
-**************
+**********
+Provenance
+**********
 
 We provide some preliminary support for validating JSON documents against the
 :ref:`provenance schema <sec_provenance>`. Programmatic access to provenance
@@ -457,5 +465,3 @@ information is planned for future versions.
 
 .. autoexception:: tskit.ProvenanceValidationError
 
-
-
diff --git a/docs/tutorial.rst b/docs/tutorial.rst
diff --git a/python/tskit/stats.py b/python/tskit/stats.py