Commit 644c5b6

WIP
1 parent d8492e7 commit 644c5b6

17 files changed: +211 −229 lines

docs/development.md (+3 −3)

@@ -1,7 +1,7 @@
 # Development

-``datajudge`` development relies on [pixi](https://pixi.sh/latest/).
-In order to work on ``datajudge``, you can create a development environment as follows:
+`datajudge` development relies on [pixi](https://pixi.sh/latest/).
+In order to work on `datajudge`, you can create a development environment as follows:

 ```bash
 git clone https://github.com/Quantco/datajudge
@@ -24,7 +24,7 @@ To run integration tests against Postgres, first start a docker container with a
 ./start_postgres.sh
 ```

-In your current environment, install the ``psycopg2`` package.
+In your current environment, install the `psycopg2` package.
 After this, you may execute integration tests as follows:

 ```bash

docs/examples/company-data.md (+13 −13)

@@ -6,22 +6,22 @@ The table "companies_archive" contains three entries:

 **companies_archive**

-| id | name | num_employees |
-|----|---------|---------------|
-| 1 | QuantCo | 90 |
-| 2 | Google | 140,000 |
-| 3 | BMW | 110,000 |
+| id  | name    | num_employees |
+| --- | ------- | ------------- |
+| 1   | QuantCo | 90            |
+| 2   | Google  | 140,000       |
+| 3   | BMW     | 110,000       |

 While "companies" contains an additional entry:

 **companies**

-| id | name | num_employees |
-|----|---------|---------------|
-| 1 | QuantCo | 100 |
-| 2 | Google | 150,000 |
-| 3 | BMW | 120,000 |
-| 4 | Apple | 145,000 |
+| id  | name    | num_employees |
+| --- | ------- | ------------- |
+| 1   | QuantCo | 100           |
+| 2   | Google  | 150,000       |
+| 3   | BMW     | 120,000       |
+| 4   | Apple   | 145,000       |

 ```python
 import sqlalchemy as sa
@@ -108,7 +108,7 @@ requirements = [companies_req, companies_between_req]
 test_constraint = collect_data_tests(requirements)
 ```

-Saving this file as ``specification.py`` and running ``$ pytest specification.py``
+Saving this file as `specification.py` and running `$ pytest specification.py`
 will verify that all constraints are satisfied. The output you see in the terminal
 should be similar to this:

@@ -125,4 +125,4 @@ specification.py::test_constraint[RowSuperset::companies|companies_archive] PASS
 ==================================== 4 passed in 0.31s ====================================
 ```

-You can also use a formatted html report using the ``--html=report.html`` flag.
+You can also use a formatted html report using the `--html=report.html` flag.
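
Aside: for readers reconstructing this example, a minimal `specification.py` along the lines of the file above might look as follows. This is a sketch, not the repository's file — the connection URL, schema names, and the exact keyword arguments of `add_row_superset_constraint` are assumptions.

```python
# specification.py -- a minimal sketch; connection URL, schema names and
# exact keyword arguments are illustrative assumptions.
import sqlalchemy as sa

from datajudge import BetweenRequirement
from datajudge.pytest_integration import collect_data_tests

engine = sa.create_engine("postgresql://user:password@localhost:5432/db")

# Expect every row of "companies_archive" to also be present in "companies".
companies_between_req = BetweenRequirement.from_tables(
    db_name1="db",
    schema_name1="public",
    table_name1="companies",
    db_name2="db",
    schema_name2="public",
    table_name2="companies_archive",
)
companies_between_req.add_row_superset_constraint(
    columns1=["id", "name", "num_employees"],
    columns2=["id", "name", "num_employees"],
    constraint_name="companies",
)

requirements = [companies_between_req]
# Exposes one pytest test per constraint; run via `pytest specification.py`.
test_constraint = collect_data_tests(requirements)
```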

docs/examples/dates.md (+25 −26)

@@ -1,40 +1,39 @@
 # Dates

-This example concerns itself with expressing ``Constraint``\s against data revolving
-around dates. While date ``Constraint``\s between tables exist, we will only illustrate
-``Constraint``\s on a single table and reference values here. As a consequence, we will
-only use ``WithinRequirement``, as opposed to ``BetweenRequirement``.
+This example concerns itself with expressing `Constraint`\s against data revolving
+around dates. While date `Constraint`\s between tables exist, we will only illustrate
+`Constraint`\s on a single table and reference values here. As a consequence, we will
+only use `WithinRequirement`, as opposed to `BetweenRequirement`.

 Concretely, we will assume a table containing prices for a given product of id 1.
 Importantly, these prices are valid for a certain date range only. More precisely,
-we assume that the price for a product - identified via the ``preduct_id`` column -
-is indicated in the ``price`` column, the date from which it is valid - the date
-itself included - in ``date_from`` and the the until when it is valid - the date
-itself included - in the ``date_to`` column.
+we assume that the price for a product - identified via the `preduct_id` column -
+is indicated in the `price` column, the date from which it is valid - the date
+itself included - in `date_from` and the the until when it is valid - the date
+itself included - in the `date_to` column.

 Such a table might look as follows:

 **prices**

-| product_id | price | date_from | date_to |
-|------------|-------|-----------|---------|
-| 1 | 13.99 | 22/01/01 | 22/01/10|
-| 1 | 14.5 | 22/01/11 | 22/01/17|
-| 1 | 13.37 | 22/01/16 | 22/01/31|
+| product_id | price | date_from | date_to  |
+| ---------- | ----- | --------- | -------- |
+| 1          | 13.99 | 22/01/01  | 22/01/10 |
+| 1          | 14.5  | 22/01/11  | 22/01/17 |
+| 1          | 13.37 | 22/01/16  | 22/01/31 |

 Given this table, we would like to ensure - for the sake of illustrational purposes -
 that 6 constraints are satisfied:

-1. All values from column ``date_from`` should be in January 2022.
-2. All values from column ``date_to`` should be in January 2022.
-3. The minimum value in column ``date_from`` should be the first of January 2022.
-4. The maximum value in column ``date_to`` should be the 31st of January 2022.
-5. There is no gap between ``date_from`` and ``date_to``. In other words, every date
+1. All values from column `date_from` should be in January 2022.
+2. All values from column `date_to` should be in January 2022.
+3. The minimum value in column `date_from` should be the first of January 2022.
+4. The maximum value in column `date_to` should be the 31st of January 2022.
+5. There is no gap between `date_from` and `date_to`. In other words, every date
    of January has to be assigned to at least one row for a given product.
-6. There is no overlap between ``date_from`` and ``date_to``. In other words, every
+6. There is no overlap between `date_from` and `date_to`. In other words, every
    date of January has to be assigned to at most one row for a given product.

-
 Assuming that such a table exists in database, we can write a specification against it.

 ```python
@@ -140,17 +139,17 @@ requirements = [prices_req]
 test_constraint = collect_data_tests(requirements)
 ```

-Please note that the ``DateNoOverlap`` and ``DateNoGap`` constraints also exist
-in a slightly different form: ``DateNoOverlap2d`` and ``DateNoGap2d``.
+Please note that the `DateNoOverlap` and `DateNoGap` constraints also exist
+in a slightly different form: `DateNoOverlap2d` and `DateNoGap2d`.
 As the names suggest, these can operate in 'two date dimensions'.

 For example, let's assume a table with four date columns, representing two
 ranges in distinct dimensions, respectively:

-* ``date_from``: Date from when a price is valid
-* ``date_to``: Date until when a price is valid
-* ``date_definition_from``: Date when a price definition was inserted
-* ``date_definition_to``: Date until when a price definition was used
+- `date_from`: Date from when a price is valid
+- `date_to`: Date until when a price is valid
+- `date_definition_from`: Date when a price definition was inserted
+- `date_definition_to`: Date until when a price definition was used

 Analogously to the unidimensional scenario illustrated here, one might care
 for certain constraints in two dimensions.
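
Aside: constraints 5 and 6 from the list in this file could be expressed roughly as follows. This is a sketch only — the method and parameter names (`add_date_no_gap_constraint`, `start_column`, `end_column`, `key_columns`) are assumptions about the datajudge API, not taken from this commit.

```python
# A sketch of the no-gap / no-overlap checks; method and parameter names
# are assumptions about the datajudge API.
import sqlalchemy as sa

from datajudge import WithinRequirement

engine = sa.create_engine("postgresql://user:password@localhost:5432/db")  # assumed URL

prices_req = WithinRequirement.from_table(
    db_name="db", schema_name="public", table_name="prices"
)
# Constraint 5: every January date is covered by at least one price range.
prices_req.add_date_no_gap_constraint(
    start_column="date_from", end_column="date_to", key_columns=["product_id"]
)
# Constraint 6: no January date is covered by more than one price range.
prices_req.add_date_no_overlap_constraint(
    start_column="date_from", end_column="date_to", key_columns=["product_id"]
)
```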

docs/examples/exploration.md (+34 −35)

@@ -19,28 +19,28 @@ usually doesn't.
 In the following we will attempt to illustrate possible usages of datajudge for
 exploration by looking at three simple examples.

-These examples rely on some insight about how most datajudge ``Constraint`` s work under
-the hood. Importantly, ``Constraint`` s typically come with
+These examples rely on some insight about how most datajudge `Constraint` s work under
+the hood. Importantly, `Constraint` s typically come with

-* a ``retrieve`` method: this method fetches relevant data from database, given a
-  ``DataReference``
-* a ``get_factual_value`` method: this is typically a wrapper around ``retrieve`` for the
-  first ``DataReference`` of the given ``Requirement`` / ``Constraint``
-* a ``get_target_value`` method: this is either a wrapper around ``retrieve`` for the
-  second ``DataReference`` in the case of a ``BetweenRequirement`` or an echoing of the
-  ``Constraint`` s key reference value in the case of a ``WithinRequirement``
+- a `retrieve` method: this method fetches relevant data from database, given a
+  `DataReference`
+- a `get_factual_value` method: this is typically a wrapper around `retrieve` for the
+  first `DataReference` of the given `Requirement` / `Constraint`
+- a `get_target_value` method: this is either a wrapper around `retrieve` for the
+  second `DataReference` in the case of a `BetweenRequirement` or an echoing of the
+  `Constraint` s key reference value in the case of a `WithinRequirement`

 Moreover, as is the case when using datajudge for testing purposes, these approaches rely
 on a [sqlalchemy engine](ttps://docs.sqlalchemy.org/en/14/core/connections.html). The
 latter is the gateway to the database at hand.

 ## Example 1: Comparing numbers of rows

-Assume we have two tables in the same database called ``table1`` and ``table2``. Now we
+Assume we have two tables in the same database called `table1` and `table2`. Now we
 would like to compare their numbers of rows. Naturally, we would like to retrieve
 the respective numbers of rows before we can compare them. For this purpose we create
-a ``BetweenTableRequirement`` referring to both tables and add a ``NRowsEquality``
-``Constraint`` onto it.
+a `BetweenTableRequirement` referring to both tables and add a `NRowsEquality`
+`Constraint` onto it.

 ```python
 import sqlalchemy as sa
@@ -60,36 +60,36 @@ n_rows1 = req[0].get_factual_value(engine)
 n_rows2 = req[0].get_target_value(engine)
 ```

-Note that here, we access the first (and only) ``Constraint`` that has been added to the
-``BetweenRequirement`` by writing ``req[0]``. ``Requirements`` are are sequences of
-``Constraint`` s, after all.
+Note that here, we access the first (and only) `Constraint` that has been added to the
+`BetweenRequirement` by writing `req[0]`. `Requirements` are are sequences of
+`Constraint` s, after all.

 Once the numbers of rows are retrieved, we can compare them as we wish. For instance, we
 could compute the absolute and relative growth (or loss) of numbers of rows from
-``table1`` to ``table2``:
+`table1` to `table2`:

 ```python
 absolute_change = abs(n_rows2 - n_rows1)
 relative_change = (absolute_change) / n_rows1 if n_rows1 != 0 else None
 ```

-Importantly, many datajudge staples, such as ``Condition`` s can be used, too. We shall see
+Importantly, many datajudge staples, such as `Condition` s can be used, too. We shall see
 this in our next example.

 ## Example 2: Investigating unique values

-In this example we will suppose that there is a table called ``table`` consisting of
-several columns. Two of its columns are supposed to be called ``col_int`` and
-``col_varchar``. We are now interested in the unique values in these two columns combined.
+In this example we will suppose that there is a table called `table` consisting of
+several columns. Two of its columns are supposed to be called `col_int` and
+`col_varchar`. We are now interested in the unique values in these two columns combined.
 Put differently, we are wondering:

-> Which unique pairs of values in ``col_int`` and ``col_varchar`` have we encountered?
+> Which unique pairs of values in `col_int` and `col_varchar` have we encountered?

-To add to the mix, we will moreover only be interested in tuples in which ``col_int`` has a
+To add to the mix, we will moreover only be interested in tuples in which `col_int` has a
 value of larger than 10.

-As before, we will start off by creating a ``Requirement``. Since we are only dealing with
-a single table this time, we will create a ``WithinRequirement``.
+As before, we will start off by creating a `Requirement`. Since we are only dealing with
+a single table this time, we will create a `WithinRequirement`.

 ```python
 import sqlalchemy as sa
@@ -113,20 +113,20 @@ req.add_uniques_equality_constraint(
 uniques = req[0].get_factual_value(engine)
 ```

-If one was to investigate this ``uniques`` variable further, one could, e.g. see the
+If one was to investigate this `uniques` variable further, one could, e.g. see the
 following:

 ```python
 ([(10, 'hi10'), (11, 'hi11'), (12, 'hi12'), (13, 'hi13'), (14, 'hi14'), (15, 'hi15'), (16, 'hi16'), (17, 'hi17'), (18, 'hi18'), (19, 'hi19')], [1, 100, 12, 1, 7, 8, 1, 1, 1337, 1])
 ```

-This becomes easier to parse when inspecting the underlying ``retrieve`` method of the
-``UniquesEquality`` ``Constraint``: the first value of the tuple corresponds to the list
-of unique pairs in columns ``col_int`` and ``col_varchar``. The second value of the tuple
+This becomes easier to parse when inspecting the underlying `retrieve` method of the
+`UniquesEquality` `Constraint`: the first value of the tuple corresponds to the list
+of unique pairs in columns `col_int` and `col_varchar`. The second value of the tuple
 are the respective counts thereof.

 Moreoever, one could manually customize the underlying SQL query. In order to do so, one
-can use the fact that ``retrieve`` methods typically return an actual result or value
+can use the fact that `retrieve` methods typically return an actual result or value
 as well as the sqlalchemy selections that led to said result or value. We can use these
 selections and compile them to a standard, textual SQL query:

@@ -161,13 +161,13 @@ table. Moreover, for columns present in both tables, we'd like to learn about th
 respective types.

 In order to illustrate such an example, we will again assume that there are two tables
-called ``table1`` and ``table2``, irrespective of prior examples.
+called `table1` and `table2`, irrespective of prior examples.

-We can now create a ``BetweenRequirement`` for these two tables and use the
-``ColumnSubset`` ``Constraint``. As before, we will rely on the ``get_factual_value``
+We can now create a `BetweenRequirement` for these two tables and use the
+`ColumnSubset` `Constraint`. As before, we will rely on the `get_factual_value`
 method to retrieve the values of interest for the first table passed to the
-``BetweenRequirement`` and the ``get_target_value`` method for the second table passed
-to the ``BetweenRequirement``.
+`BetweenRequirement` and the `get_target_value` method for the second table passed
+to the `BetweenRequirement`.

 ```python
 import sqlalchemy as sa
@@ -194,7 +194,6 @@ print(f"Columns present in only table1: {set(columns1) - set(columns2)}")
 print(f"Columns present in only table2: {set(columns2) - set(columns1)}")
 ```

-
 This could, for instance result in the following printout:

 ```
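
Aside: the "compile the selections to SQL" step this file describes might be sketched as follows. The exact shape of `retrieve`'s return value and the `ref` attribute are assumptions based on the prose above, not on this commit.

```python
# A sketch of turning the selections behind a constraint into plain SQL.
# Assumes retrieve(engine, ref) returns the fetched value together with the
# list of sqlalchemy selections, in that order (an assumption).
value, selections = req[0].retrieve(engine, req[0].ref)
for selection in selections:
    # literal_binds inlines bound parameters into the emitted SQL string.
    print(selection.compile(engine, compile_kwargs={"literal_binds": True}))
```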

docs/examples/twitch.md (+15 −15)

@@ -41,18 +41,18 @@ df_v2.to_sql("twitch_v2", engine, schema="public", if_exists="replace")
 df_v1.to_sql("twitch_v1", engine, schema="public", if_exists="replace")
 ```

-Once the tables are stored in a database, we can actually write a ``datajudge``
+Once the tables are stored in a database, we can actually write a `datajudge`
 specification against them. But first, we'll have a look at what the data roughly
 looks like by investigating a random sample of four rows:

 **A sample of the data**

-| channel | watch_time | stream_time | peak_viewers | average_viewers | followers | followers_gained | views_gained | partnered | mature | language |
-|----------|------------|-------------|--------------|-----------------|-----------|------------------|--------------|-----------|--------|-----------|
-| xQcOW | 6196161750 | 215250 | 222720 | 27716 | 3246298 | 1734810 | 93036735 | True | False | English |
-| summit1g | 6091677300 | 211845 | 310998 | 25610 | 5310163 | 1374810 | 89705964 | True | False | English |
-| Gaules | 5644590915 | 515280 | 387315 | 10976 | 1767635 | 1023779 | 102611607 | True | True | Portuguese|
-| ESL_CSGO | 3970318140 | 517740 | 300575 | 7714 | 3944850 | 703986 | 106546942 | True | False | English |
+| channel  | watch_time | stream_time | peak_viewers | average_viewers | followers | followers_gained | views_gained | partnered | mature | language   |
+| -------- | ---------- | ----------- | ------------ | --------------- | --------- | ---------------- | ------------ | --------- | ------ | ---------- |
+| xQcOW    | 6196161750 | 215250      | 222720       | 27716           | 3246298   | 1734810          | 93036735     | True      | False  | English    |
+| summit1g | 6091677300 | 211845      | 310998       | 25610           | 5310163   | 1374810          | 89705964     | True      | False  | English    |
+| Gaules   | 5644590915 | 515280      | 387315       | 10976           | 1767635   | 1023779          | 102611607    | True      | True   | Portuguese |
+| ESL_CSGO | 3970318140 | 517740      | 300575       | 7714            | 3944850   | 703986           | 106546942    | True      | False  | English    |

 Note that we expect both version 1 and version 2 to follow this structure. Due to them
 being assembled at different points in time, merely their rows shows differ.
@@ -80,7 +80,7 @@ express expectations against them. In this example, we have two tables in the sa
 one table per version of the Twitch data.

 Yet, let's start with a straightforward example only using version 2. We want to use our
-domain knowledge that constrains the values of the ``language`` column only to contain letters
+domain knowledge that constrains the values of the `language` column only to contain letters
 and have a length strictly larger than 0.

 ```python
@@ -145,7 +145,7 @@ between_requirement_version.add_uniques_equality_constraint(
 Now having compared the 'same kind of data' between version 1 and version 2,
 we may as well compare 'different kind of data' within version 2, as a means of
 a sanity check. This sanity check consists of checking whether the mean
-``average_viewer`` value of mature channels should deviate at most 10% from
+`average_viewer` value of mature channels should deviate at most 10% from
 the overall mean.

 ```python
@@ -168,7 +168,7 @@ between_requirement_columns.add_numeric_mean_constraint(
 ```

 Lastly, we need to collect all of our requirements in a list and make sure
-``pytest`` can find them by calling ``collect_data_tests``.
+`pytest` can find them by calling `collect_data_tests`.

 ```python
 from datajudge.pytest_integration import collect_data_tests
@@ -268,9 +268,9 @@ to investigate what is wrong with the data, what this has been caused by and how

 Concretely, what exactly do we learn from the error messages?

-* The column ``language`` now has a row with value ``'Sw3d1zh'``. This break two of our
-  constraints. The ``VarCharRegex`` constraint compared the columns' values to a regular
-  expression. The ``UniquesEquality`` constraint expected the unique values of the
-  ``language`` column to not have changed between version 1 and version 2.
-* The mean value of ``average_viewers`` of ``mature`` channels is substantially - more
+- The column `language` now has a row with value `'Sw3d1zh'`. This break two of our
+  constraints. The `VarCharRegex` constraint compared the columns' values to a regular
+  expression. The `UniquesEquality` constraint expected the unique values of the
+  `language` column to not have changed between version 1 and version 2.
+- The mean value of `average_viewers` of `mature` channels is substantially - more
   than our 10% tolerance - lower than the global mean.
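
Aside: the letters-only check on `language` that this file describes (and that the `VarCharRegex` failure above refers to) might look roughly like this. A sketch only — the schema/table names, the regex, and the exact keyword arguments of `add_varchar_regex_constraint` are assumptions.

```python
# A sketch of the letters-only check on the language column; schema/table
# names and exact keyword arguments are assumptions.
from datajudge import WithinRequirement

language_req = WithinRequirement.from_table(
    db_name="tempdb", schema_name="public", table_name="twitch_v2"
)
# Letters only, and at least one character, per the domain knowledge above.
language_req.add_varchar_regex_constraint(
    column="language",
    regex="^[A-Za-z]+$",
)
```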
