This repository was archived by the owner on Apr 14, 2023. It is now read-only.

Merge pull request #1103 from finos/docs-violation-and-nullness
Added docs for the current violation and nullness behaviour.
ms14981 authored Jul 4, 2019
2 parents fb11c23 + 80439c3 commit 7d4659e
Showing 2 changed files with 103 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -413,6 +413,7 @@ firstName,age,nationalInsurance
## Next steps

That's the end of our getting started guide. Hopefully it has given you a good understanding of what the DataHelix generator is capable of. If you'd like to find out more about the various constraints the tool supports, the [Profile Developer Guide](docs/ProfileDeveloperGuide.md) is a good next step. You might also be interested in the [examples folder](https://github.com/finos/datahelix/tree/master/examples), which illustrates various features of the generator.
For more detail about how certain profiles behave, see [Behaviour in Detail](./docs/BehaviourInDetail.md).

## Contributing

102 changes: 102 additions & 0 deletions docs/BehaviourInDetail.md
@@ -0,0 +1,102 @@
# Behaviour in Detail
## Nullness
### Behaviour
Null can always be produced for a field, unless the field is explicitly constrained to be not null.

### Misleading Examples
|Field is |Null produced|
|:----------------------|:-----------:|
|Of type X              |✔|
|Not of type X          |✔|
|In set [X, Y, ...]     |✔|
|Not in set [X, Y, ...] |✔|
|Equal to X             |✔|
|Not equal to X         |✔|
|Greater than X         |✔|
|Null                   |✔|
|Not null               |❌|

For the profile snippet:
```
{
  "if":
    { "field": "A", "is": "equalTo", "value": 1 },
  "then":
    { "field": "B", "is": "equalTo", "value": 2 }
},
{ "field": "A", "is": "equalTo", "value": 1 }
```

|Allowed value of A|Allowed value of B|
|------------------|------------------|
|Null |Null |
|Null |2 |
|1 |Null |
|1 |2 |

## Type Implication
### Behaviour
No operator implies a type, except `ofType`. By default, all values are allowed, so an operator restricts only the values of the type it applies to.

### Misleading Examples
Field is greater than number X:

|Values |Can be produced|
|----------------------|:-------------:|
|Numbers greater than X|✔|
|Numbers less than X   |❌|
|Null                  |✔|
|Strings               |✔|
|Date-times            |✔|
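
As a rough illustration of the rule above, the following Python sketch models "greater than X" as a predicate that restricts only numeric values. It is not DataHelix code: the `greater_than` helper and the candidate values are hypothetical, chosen only to mirror the rows of the table.

```python
from datetime import datetime


def greater_than(x):
    """Build a 'greater than x' predicate that does not imply a type."""
    def allows(value):
        # Only numeric values are compared against x. bool is excluded
        # because Python treats it as a subclass of int.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return value > x
        # Null, strings, date-times and other non-numeric values pass,
        # because the operator says nothing about their type.
        return True
    return allows


allows = greater_than(10)
# Hypothetical candidates: two numbers, null, a string and a date-time.
candidates = [11, 9, None, "abc", datetime(2019, 7, 4)]
results = [allows(value) for value in candidates]
```

Under this model only the number below the threshold is rejected. Restricting the field to numbers would need an explicit `ofType` constraint alongside the comparison.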

## Violation of Rules
### Behaviour
Rules, constraints and top-level `allOf`s are all equivalent in normal generation.

In violation mode, each rule is treated as a single unit of violation.
For each rule, a file is generated containing the data produced by violating that rule
while leaving all other rules intact.

Constraints and `allOf`s are violated in the same way; rules differ only in that the
output is split into separate files.

## General Strategy for Violation
### Behaviour
The violation output is not guaranteed to be able to produce any data,
even when the negation of the entire profile could produce data.

### Why
The violation output could have been calculated by simply negating an entire rule or profile. This would produce all data that breaks the original profile in any way. However, that includes data that breaks the profile in multiple ways at once. Such output could be very noisy, because the user is expected to test one small breakage in a localised area at a time.

To minimise this noise efficiently, the system gives up any guarantee of completeness: it does not guarantee to produce violating data in every case where such data exists. In some cases this means that no data is produced at all.

In normal negation, negating `allOf [A, B, C]` gives any of the following:
1) `allOf[NOT(A), B, C]`
2) `allOf[A, NOT(B), C]`
3) `allOf[A, B, NOT(C)]`
4) `allOf[NOT(A), NOT(B), C]`
5) `allOf[A, NOT(B), NOT(C)]`
6) `allOf[NOT(A), B, NOT(C)]`
7) `allOf[NOT(A), NOT(B), NOT(C)]`

These are listed from the least to most noisy. The current system only tries to generate data by negating one sub-constraint at a time (in this case, producing only 1, 2 and 3).
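
The difference between full negation and the one-at-a-time strategy can be sketched in Python. This is only an illustration, not the generator's implementation: `negations`, `render` and the `max_negated` parameter are hypothetical names introduced here.

```python
from itertools import combinations


def negations(constraints, max_negated=None):
    """List the ways to negate an allOf, fewest negations first.

    Each result is a tuple of (name, negated) pairs. max_negated caps how
    many sub-constraints may be negated at once; None means no cap.
    """
    limit = len(constraints) if max_negated is None else max_negated
    results = []
    for count in range(1, limit + 1):
        for chosen in combinations(range(len(constraints)), count):
            results.append(tuple(
                (name, index in chosen)
                for index, name in enumerate(constraints)
            ))
    return results


def render(combo):
    """Format one combination in the notation used in this document."""
    parts = [f"NOT({name})" if negated else name for name, negated in combo]
    return "allOf[" + ", ".join(parts) + "]"


# Full negation: every non-empty choice of sub-constraints to negate.
full = [render(c) for c in negations(["A", "B", "C"])]
# One-at-a-time strategy: exactly one sub-constraint negated.
single = [render(c) for c in negations(["A", "B", "C"], max_negated=1)]
```

Here `full` holds all seven combinations, while `single` holds only the three that negate one sub-constraint. For n sub-constraints that is n combinations instead of 2^n - 1, which is where the efficiency gain and the loss of completeness both come from.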

### Misleading Examples
When a field is a string, an integer and not null, no data can be produced normally,
but data can be produced in violation mode.

|Values |Can be produced when in violation mode|
|----------------------|:-------------------------------------|
|Null |✔ (By violating the "not null" constraint) |
|Numbers |✔ (By violating the "string" constraint) |
|Strings |✔ (By violating the "integer" constraint) |
|Date-times |❌ (This would need both the "string" and "integer" constraints to be violated at the same time) |

If a field is set to null twice, no data can be produced in violation mode: violating one of the two constraints while keeping the other requires the field to be both null and not null.

|Values |Can be produced when in violation mode|
|----------------------|:-------------------------------------|
|Null                  |❌|
|Numbers               |❌|
|Strings               |❌|
|Date-times            |❌|
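
Both examples in this section can be reproduced with a small Python model. It is a sketch under stated assumptions, not the generator's code: each constraint is a hypothetical pair of predicates (the constraint and its negation), every predicate except "not null" admits null as described under Nullness, and exactly one constraint is violated at a time.

```python
from datetime import datetime


def of_type(*types):
    """Return (holds, violated) predicates for an ofType constraint.

    Both forms admit null, since only an explicit 'not null' excludes it.
    """
    def holds(value):
        # bool is excluded so that True/False do not count as integers.
        return value is None or (
            isinstance(value, types) and not isinstance(value, bool)
        )

    def violated(value):
        return value is None or not holds(value)

    return holds, violated


# 'not null' and 'null' are exact opposites of each other.
NOT_NULL = (lambda value: value is not None, lambda value: value is None)
NULL = (lambda value: value is None, lambda value: value is not None)


def violation_output(constraints, candidates):
    """Values producible by violating exactly one constraint at a time."""
    produced = []
    for value in candidates:
        for target in range(len(constraints)):
            # Use the negated form for the target, the plain form otherwise.
            if all(
                (violated if index == target else holds)(value)
                for index, (holds, violated) in enumerate(constraints)
            ):
                produced.append(value)
                break
    return produced


# Hypothetical candidates: null, a number, a string and a date-time.
candidates = [None, 5, "abc", datetime(2019, 7, 4)]
string_integer_not_null = violation_output(
    [of_type(str), of_type(int), NOT_NULL], candidates
)
null_twice = violation_output([NULL, NULL], candidates)
```

In this model the first profile yields null, the number and the string but not the date-time, and the second profile yields nothing.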
