@@ -19,28 +19,28 @@ usually doesn't.
In the following we will attempt to illustrate possible usages of datajudge for
exploration by looking at three simple examples.

- These examples rely on some insight about how most datajudge ``Constraint``s work under
- the hood. Importantly, ``Constraint``s typically come with
+ These examples rely on some insight about how most datajudge `Constraint`s work under
+ the hood. Importantly, `Constraint`s typically come with

- * a ``retrieve`` method: this method fetches relevant data from database, given a
- ``DataReference``
- * a ``get_factual_value`` method: this is typically a wrapper around ``retrieve`` for the
- first ``DataReference`` of the given ``Requirement`` / ``Constraint``
- * a ``get_target_value`` method: this is either a wrapper around ``retrieve`` for the
- second ``DataReference`` in the case of a ``BetweenRequirement`` or an echoing of the
- ``Constraint``s key reference value in the case of a ``WithinRequirement``
+ - a `retrieve` method: this method fetches relevant data from database, given a
+ `DataReference`
+ - a `get_factual_value` method: this is typically a wrapper around `retrieve` for the
+ first `DataReference` of the given `Requirement` / `Constraint`
+ - a `get_target_value` method: this is either a wrapper around `retrieve` for the
+ second `DataReference` in the case of a `BetweenRequirement` or an echoing of the
+ `Constraint`s key reference value in the case of a `WithinRequirement`

Moreover, as is the case when using datajudge for testing purposes, these approaches rely
on a [sqlalchemy engine](https://docs.sqlalchemy.org/en/14/core/connections.html). The
latter is the gateway to the database at hand.

## Example 1: Comparing numbers of rows

- Assume we have two tables in the same database called ``table1`` and ``table2``. Now we
+ Assume we have two tables in the same database called `table1` and `table2`. Now we
would like to compare their numbers of rows. Naturally, we would like to retrieve
the respective numbers of rows before we can compare them. For this purpose we create
- a ``BetweenTableRequirement`` referring to both tables and add a ``NRowsEquality``
- ``Constraint`` onto it.
+ a `BetweenTableRequirement` referring to both tables and add a `NRowsEquality`
+ `Constraint` onto it.

```python
import sqlalchemy as sa
@@ -60,36 +60,36 @@ n_rows1 = req[0].get_factual_value(engine)
n_rows2 = req[0].get_target_value(engine)
```

- Note that here, we access the first (and only) ``Constraint`` that has been added to the
- ``BetweenRequirement`` by writing ``req[0]``. ``Requirements`` are are sequences of
- ``Constraint``s, after all.
+ Note that here, we access the first (and only) `Constraint` that has been added to the
+ `BetweenRequirement` by writing `req[0]`. `Requirements` are sequences of
+ `Constraint`s, after all.

Once the numbers of rows are retrieved, we can compare them as we wish. For instance, we
could compute the absolute and relative growth (or loss) of numbers of rows from
- ``table1`` to ``table2``:
+ `table1` to `table2`:

```python
absolute_change = abs(n_rows2 - n_rows1)
relative_change = (absolute_change) / n_rows1 if n_rows1 != 0 else None
```
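
The snippet above takes the absolute value, so the direction of the change is lost. If growth and loss should be told apart, a signed variant may be more convenient. The following is only a minimal sketch, assuming the retrieved row counts are plain integers; the helper name `row_count_change` is hypothetical and not part of datajudge.

```python
from typing import Optional, Tuple


def row_count_change(n_rows1: int, n_rows2: int) -> Tuple[int, Optional[float]]:
    """Return the signed absolute and relative change from n_rows1 to n_rows2."""
    # Positive values indicate growth, negative values indicate loss.
    absolute_change = n_rows2 - n_rows1
    # The relative change is undefined when the first table is empty.
    relative_change = absolute_change / n_rows1 if n_rows1 != 0 else None
    return absolute_change, relative_change


# Illustrative numbers only, not taken from a real database.
print(row_count_change(200, 150))  # (-50, -0.25)
```

Returning `None` for an empty first table mirrors the guard used in the original snippet.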

- Importantly, many datajudge staples, such as ``Condition``s can be used, too. We shall see
+ Importantly, many datajudge staples, such as `Condition`s can be used, too. We shall see
this in our next example.

## Example 2: Investigating unique values

- In this example we will suppose that there is a table called ``table`` consisting of
- several columns. Two of its columns are supposed to be called ``col_int`` and
- ``col_varchar``. We are now interested in the unique values in these two columns combined.
+ In this example we will suppose that there is a table called `table` consisting of
+ several columns. Two of its columns are supposed to be called `col_int` and
+ `col_varchar`. We are now interested in the unique values in these two columns combined.
Put differently, we are wondering:

- > Which unique pairs of values in ``col_int`` and ``col_varchar`` have we encountered?
+ > Which unique pairs of values in `col_int` and `col_varchar` have we encountered?

- To add to the mix, we will moreover only be interested in tuples in which ``col_int`` has a
+ To add to the mix, we will moreover only be interested in tuples in which `col_int` has a
value of larger than 10.

- As before, we will start off by creating a ``Requirement``. Since we are only dealing with
- a single table this time, we will create a ``WithinRequirement``.
+ As before, we will start off by creating a `Requirement`. Since we are only dealing with
+ a single table this time, we will create a `WithinRequirement`.

```python
import sqlalchemy as sa
@@ -113,20 +113,20 @@ req.add_uniques_equality_constraint(
uniques = req[0].get_factual_value(engine)
```

- If one was to investigate this ``uniques`` variable further, one could, e.g. see the
+ If one was to investigate this `uniques` variable further, one could, e.g. see the
following:

```python
([(10, 'hi10'), (11, 'hi11'), (12, 'hi12'), (13, 'hi13'), (14, 'hi14'), (15, 'hi15'), (16, 'hi16'), (17, 'hi17'), (18, 'hi18'), (19, 'hi19')], [1, 100, 12, 1, 7, 8, 1, 1, 1337, 1])
```

- This becomes easier to parse when inspecting the underlying ``retrieve`` method of the
- ``UniquesEquality`` ``Constraint``: the first value of the tuple corresponds to the list
- of unique pairs in columns ``col_int`` and ``col_varchar``. The second value of the tuple
+ This becomes easier to parse when inspecting the underlying `retrieve` method of the
+ `UniquesEquality` `Constraint`: the first value of the tuple corresponds to the list
+ of unique pairs in columns `col_int` and `col_varchar`. The second value of the tuple
are the respective counts thereof.
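
Since the two lists are aligned by position, they can be zipped into a single lookup. This is a minimal sketch that assumes the tuple shape shown in the printout above; the data is a shortened copy of that printout and `count_by_pair` is a name chosen here for illustration.

```python
# Shape assumed from the printout above: (list of unique pairs, list of counts).
uniques = (
    [(10, "hi10"), (11, "hi11"), (12, "hi12")],
    [1, 100, 12],
)

values, counts = uniques
# Map every unique (col_int, col_varchar) pair to its number of occurrences.
count_by_pair = dict(zip(values, counts))

# List the most frequent pairs first.
for pair, count in sorted(count_by_pair.items(), key=lambda item: item[1], reverse=True):
    print(pair, count)
```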

Moreover, one could manually customize the underlying SQL query. In order to do so, one
- can use the fact that ``retrieve`` methods typically return an actual result or value
+ can use the fact that `retrieve` methods typically return an actual result or value
as well as the sqlalchemy selections that led to said result or value. We can use these
selections and compile them to a standard, textual SQL query:

@@ -161,13 +161,13 @@ table. Moreover, for columns present in both tables, we'd like to learn about th
respective types.

In order to illustrate such an example, we will again assume that there are two tables
- called ``table1`` and ``table2``, irrespective of prior examples.
+ called `table1` and `table2`, irrespective of prior examples.

- We can now create a ``BetweenRequirement`` for these two tables and use the
- ``ColumnSubset`` ``Constraint``. As before, we will rely on the ``get_factual_value``
+ We can now create a `BetweenRequirement` for these two tables and use the
+ `ColumnSubset` `Constraint`. As before, we will rely on the `get_factual_value`
method to retrieve the values of interest for the first table passed to the
- ``BetweenRequirement`` and the ``get_target_value`` method for the second table passed
- to the ``BetweenRequirement``.
+ `BetweenRequirement` and the `get_target_value` method for the second table passed
+ to the `BetweenRequirement`.

```python
import sqlalchemy as sa
@@ -194,7 +194,6 @@ print(f"Columns present in only table1: {set(columns1) - set(columns2)}")
print(f"Columns present in only table2: {set(columns2) - set(columns1)}")
```

-
This could, for instance, result in the following printout:

```