Commit 643ae92

doc: Document the validation model, context and inheritance principle
1 parent dd8a5b3 commit 643ae92

4 files changed

+263
-0
lines changed

docs/index.md

+10
@@ -26,6 +26,7 @@ deno run -A jsr:@bids/validator
2626
```
2727

2828
```{toctree}
29+
:maxdepth: 2
2930
:hidden:
3031
:caption: User guide
3132
@@ -35,6 +36,7 @@ user_guide/issues.md
3536
```
3637

3738
```{toctree}
39+
:maxdepth: 2
3840
:hidden:
3941
:caption: Developer guide
4042
@@ -43,6 +45,14 @@ dev/contributing.md
4345
dev/environment.md
4446
```
4547

48+
```{toctree}
49+
:maxdepth: 2
50+
:hidden:
51+
:caption: Concepts
52+
53+
validation-model/index.md
54+
```
55+
4656
```{toctree}
4757
:hidden:
4858
:caption: Reference

docs/validation-model/context.md

+159
@@ -0,0 +1,159 @@
# The validation context

The core structure of the validator is the `context`,
a namespace that aggregates properties of the dataset (the `dataset` variable, above)
and the current file being validated.

Its type can be described as follows:

```typescript
Context: {
  // Dataset properties
  dataset: {
    dataset_description: object
    datatypes: string[]
    modalities: string[]
    // Lists of subjects as discovered in different locations
    subjects: {
      sub_dirs: string[]
      participant_id: string[]
      phenotype: string[]
    }
  }

  // Properties of the current subject
  subject: {
    // Lists of sessions as discovered in different locations
    sessions: {
      ses_dirs: string[]
      session_id: string[]
      phenotype: string[]
    }
  }

  // Path properties
  path: string
  entities: object
  datatype: string
  suffix: string
  extension: string
  // Inferred property
  modality: string

  // Inheritance principle constructions
  sidecar: object
  associations: {
    // Paths and properties of files associated with the current file
    aslcontext: { path: string, n_rows: integer, volume_type: string[] }
    ...
  }

  // Content properties
  size: integer

  // File type-specific content properties
  columns: object
  gzip: object
  json: object
  nifti_header: object
  ome: object
  tiff: object
}
```

For example, in a minimal dataset containing only a single subject's T1-weighted image,
the context for that image might be:

```yaml
dataset:
  dataset_description:
    Name: "Example dataset"
    BIDSVersion: "1.10.0"
    DatasetType: "raw"
  datatypes: ["anat"]
  modalities: ["mri"]
  subjects:
    sub_dirs: ["sub-01"]
    participant_id: null
    phenotype: null

subject:
  sessions: { ses_dirs: null, session_id: null, phenotype: null }

path: "/sub-01/anat/sub-01_T1w.nii.gz"
entities:
  subject: "01"
datatype: "anat"
suffix: "T1w"
extension: ".nii.gz"
modality: "mri"

sidecar:
  MagneticFieldStrength: 3
  ...
associations: {}

size: 22017017
nifti_header:
  dim: 3
  voxel_sizes: [1, 1, 1]
  ...
```

Fields from this context can be queried using object dot notation.
For example, `sidecar.MagneticFieldStrength` has the integer value `3`,
and `entities.subject` has the string value `"01"`.
This permits the use of boolean expressions, such as
`sidecar.RepetitionTime == nifti_header.pixdim[4]`.
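
As a rough illustration of dot-notation lookup, the sketch below resolves dotted paths against a nested dictionary. It is not the validator's expression evaluator: `get_field` is a hypothetical helper, and the `RepetitionTime` and `pixdim` values are made up for the example.

```python
from functools import reduce


def get_field(context, dotted_path):
    """Resolve a dotted path such as "sidecar.MagneticFieldStrength" in nested dicts."""
    return reduce(lambda node, key: node[key], dotted_path.split("."), context)


# A trimmed-down context; RepetitionTime and pixdim are hypothetical values.
context = {
    "entities": {"subject": "01"},
    "sidecar": {"MagneticFieldStrength": 3, "RepetitionTime": 2.0},
    "nifti_header": {"pixdim": [1.0, 1.0, 1.0, 1.0, 2.0]},
}

field_strength = get_field(context, "sidecar.MagneticFieldStrength")
subject = get_field(context, "entities.subject")
# Equivalent of `sidecar.RepetitionTime == nifti_header.pixdim[4]`
tr_matches = (
    get_field(context, "sidecar.RepetitionTime")
    == get_field(context, "nifti_header.pixdim")[4]
)
```

Indexing (`pixdim[4]`) is applied in plain Python here; a real expression language would parse it as part of the expression.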

As the validator validates each file in turn, it constructs a new context.
The `dataset` property remains constant,
while a new `subject` property is constructed when inspecting a new subject directory,
and the remaining properties are constructed for each file, individually.

## Context definition

The validation context is largely dictated by the [schema],
and the full type generated from the schema definition can be found in
[jsr:@bids/schema/context](https://jsr.io/@bids/schema/doc/context/~/Context).

## Context construction

The construction of a validation context is where BIDS concepts are implemented.
Again, this is easiest to explain with pseudocode:

```python
def buildFileContext(dataset, file):
    context = namespace()
    context.dataset = dataset
    context.path = file.path
    context.size = file.size

    fileParts = parsePath(file.path)
    context.entities = fileParts.entities
    context.datatype = fileParts.datatype
    context.suffix = fileParts.suffix
    context.extension = fileParts.extension

    context.subject = buildSubjectContext(dataset, context.entities.subject)

    context.sidecar = loadSidecar(file)
    context.associations = namespace({
        association: loadAssociation(file, association)
        for association in associationTypes(file)
    })

    if isTSV(file):
        context.columns = loadColumns(file)
    if isNIfTI(file):
        context.nifti_header = loadNiftiHeader(file)
    ...  # And so on

    return context
```

The heavy lifting is done in `parsePath`, `loadSidecar` and `loadAssociation`.
`parsePath` is relatively simple, but `loadSidecar` and `loadAssociation`
implement the BIDS [Inheritance Principle].
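
To give a feel for what path parsing involves, here is a simplified sketch, not the validator's actual `parsePath`. The `ENTITY_NAMES` mapping and the rule used to infer the datatype from the parent directory are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical short-to-long entity names; a real mapping would come from the schema.
ENTITY_NAMES = {"sub": "subject", "ses": "session"}


def parse_path(path):
    """Split a BIDS-style path into the parts used by the context (simplified)."""
    posix = PurePosixPath(path)
    # Treat everything from the first dot onward as the extension (".nii.gz").
    stem, dot, rest = posix.name.partition(".")
    *pairs, suffix = stem.split("_")
    entities = {}
    for pair in pairs:
        if "-" in pair:
            key, value = pair.split("-", 1)
            entities[ENTITY_NAMES.get(key, key)] = value
    # Assumption: the enclosing directory names the datatype unless it is
    # the dataset root or a subject/session directory.
    parent = posix.parent.name
    datatype = "" if parent.startswith(("sub-", "ses-")) else parent
    return {
        "entities": entities,
        "datatype": datatype,
        "suffix": suffix,
        "extension": dot + rest,
    }


parts = parse_path("/sub-01/anat/sub-01_T1w.nii.gz")
```

For the example file used earlier, this yields the subject entity `"01"`, datatype `"anat"`, suffix `"T1w"` and extension `".nii.gz"`.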

[Inheritance Principle]: https://bids-specification.readthedocs.io/en/stable/common-principles.html#the-inheritance-principle

docs/validation-model/index.md

+31
@@ -0,0 +1,31 @@
# Validation model

The basic process of the BIDS validator operates according to the following
[Python]-like pseudocode:

```python
def validate(directory):
    fileTree = loadFileTree(directory)
    dataset = buildDatasetContext(fileTree)

    for file in walk(dataset.fileTree):
        context = buildFileContext(dataset, file)
        for check in perFileChecks:
            check(context)

    for check in datasetChecks:
        check(dataset)
```

The following sections describe [the validation context](context.md)
and our implementation of [the Inheritance Principle](inheritance-principle.md).

```{toctree}
:maxdepth: 1
:hidden:

context.md
inheritance-principle.md
```

[Python]: https://en.wikipedia.org/wiki/Python_(programming_language)

docs/validation-model/inheritance-principle.md

+63
@@ -0,0 +1,63 @@
# The Inheritance Principle

The [Inheritance Principle] is a core concept in BIDS.
Its original definition (edited for brevity) was:

> Any metadata file (`.json`, `.bvec`, `.tsv`, etc.) may be defined at any directory level,
> but no more than one applicable file may be defined at a given level.
> The values from the top level are inherited by all lower levels
> unless they are overridden by a file at the lower level. [...]
> There is no notion of "unsetting" a key/value pair.

Here, "top level" means dataset root, and "lower level" means closer to the data file
the metadata applies to.
More recent versions of the specification have made the language more precise at the cost
of verbosity.
The core concept remains the same.

The validator uses a "walk back" algorithm to find inherited files:

```python
def walkBack(file, extension):
    fileParts = parsePath(file.path)

    # Start in the file's own directory and move toward the dataset root.
    fileTree = file.parent
    while fileTree:
        for child in fileTree.children:
            parts = parsePath(child.path)
            if (
                parts.extension == extension
                and parts.suffix == fileParts.suffix
                and isSubset(parts.entities, fileParts.entities)
            ):
                yield child

        fileTree = fileTree.parent
```
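
The pseudocode above can be made runnable with a few simplifications. The sketch below is an illustration under stated assumptions, not the validator's code: the file tree is replaced by a flat list of paths (`all_paths`), and `parse_name` is a hypothetical stand-in for `parsePath` that only looks at the file name.

```python
from pathlib import PurePosixPath


def parse_name(path):
    """Return (entities, suffix, extension) for a BIDS-style file name (simplified)."""
    stem, dot, rest = PurePosixPath(path).name.partition(".")
    *pairs, suffix = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs if "-" in pair)
    return entities, suffix, dot + rest


def walk_back(path, extension, all_paths):
    """Yield matching files, nearest directory first, moving toward the root."""
    entities, suffix, _ = parse_name(path)
    directory = PurePosixPath(path).parent
    while True:
        for candidate in sorted(all_paths):
            if PurePosixPath(candidate).parent != directory:
                continue
            c_entities, c_suffix, c_extension = parse_name(candidate)
            if (
                c_extension == extension
                and c_suffix == suffix
                # Subset test: every entity of the candidate must match the file's.
                and c_entities.items() <= entities.items()
            ):
                yield candidate
        if directory == directory.parent:  # reached the root
            break
        directory = directory.parent


paths = [
    "/T1w.json",
    "/task-rest_bold.json",
    "/sub-01/sub-01_T1w.json",
    "/sub-01/anat/sub-01_T1w.nii.gz",
    "/sub-01/anat/sub-01_T1w.json",
    "/sub-02/sub-02_T1w.json",
]
found = list(walk_back("/sub-01/anat/sub-01_T1w.nii.gz", ".json", paths))
```

Here `found` lists the sidecar next to the image first, then the subject-level file, then the root-level `T1w.json`. The `sub-02` and `bold` files are skipped because their entities or suffix do not match.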

Using this basis, `loadSidecar` is simply:

```python
def loadSidecar(file):
    sidecar = {}
    for json in walkBack(file, '.json'):
        # Order matters. `|` overrides the left side with the right.
        # Any collisions resolve in favor of closer to the data file.
        sidecar = loadJson(json) | sidecar
    return sidecar
```
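
The merge order is easy to get backwards, so a small runnable check may help. This sketch assumes Python 3.9 or later (for the dict `|` operator) and uses made-up metadata values; `merge_sidecars` is a hypothetical helper that takes already-loaded dictionaries, nearest file first.

```python
def merge_sidecars(sidecars_nearest_first):
    """Merge metadata dicts so that files closer to the data file win."""
    merged = {}
    for metadata in sidecars_nearest_first:
        # The right operand of `|` wins on collisions, so keys already
        # collected from closer files are never overwritten.
        merged = metadata | merged
    return merged


# Hypothetical contents of a file-level and a root-level sidecar.
nearest = {"RepetitionTime": 2.0}
root = {"RepetitionTime": 1.5, "MagneticFieldStrength": 3}

merged = merge_sidecars([nearest, root])
```

The root-level `RepetitionTime` is overridden by the nearer file, while `MagneticFieldStrength` is inherited because nothing closer defines it.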

For `loadAssociation`, only the first match is used, if found:

```python
def loadAssociation(file, association):
    for associated_file in walkBack(file, getExtension(association)):
        return getLoader(association)(associated_file)
```
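
The "first match, if found" behavior can also be written with `next` and a default, which makes the no-match case explicit. This is only a sketch: `first_match` and the `str.upper` loader are hypothetical stand-ins for the validator's association loaders.

```python
def first_match(candidates, loader):
    """Load the nearest candidate, or return None when nothing matches."""
    nearest = next(iter(candidates), None)
    return None if nearest is None else loader(nearest)


# `candidates` would normally be the nearest-first output of the walk back.
loaded = first_match(["/sub-01/perf/sub-01_aslcontext.tsv", "/aslcontext.tsv"], str.upper)
missing = first_match([], str.upper)
```

Because only the first item is consumed, files further up the tree are never loaded once a closer one is found.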

Each association contains different metadata to extract.
Note that some associations have a different suffix from the files they associate to.
The actual implementation of `walkBack` allows overriding suffixes as well as extensions,
but it would not be instructive to show here.

[Inheritance Principle]: https://bids-specification.readthedocs.io/en/stable/common-principles.html#the-inheritance-principle
