Commit 4737732

Merge branch 'docs'
2 parents 994a6ad + 4a95d7b commit 4737732

2 files changed, +8 -28 lines changed


Diff for: docs/user/input-files/03-genome-annotation.md (+1 -24)

```diff
@@ -20,33 +20,10 @@ The fundamental unit for Nextclade is a single `CDS`.
 
 When a linked `gene` and `CDS` are present (`CDS`s specify their parents by listing the `gene`'s `ID` in the `Parent` attribute), the `gene` is effectively ignored for all purposes but display in the web UI. `CDS` segments are joined if they have the same `ID`, otherwise they are treated as independent.
 
-Example gene map for SARS-CoV-2:
-
-```
-# seqname source feature start end score strand frame attribute
-. . gene 266 21555 . + . gene=ORF1ab;ID=gene-ORF1ab
-. . CDS 266 13468 . + . gene=ORF1ab;ID=cds-ORF1ab;Parent=gene-ORF1ab
-. . CDS 13468 21555 . + . gene=ORF1ab;ID=cds-ORF1ab;Parent=gene-ORF1ab
-. . CDS 21563 25384 . + . gene=S
-. . CDS 25393 26220 . + . gene=ORF3a
-. . CDS 26245 26472 . + . gene=E
-. . CDS 26523 27191 . + . gene=M
-. . CDS 27202 27387 . + . gene=ORF6
-. . CDS 27394 27759 . + . gene=ORF7a
-. . CDS 27756 27887 . + . gene=ORF7b
-. . CDS 27894 28259 . + . gene=ORF8
-. . CDS 28284 28577 . + . gene=ORF9b
-. . CDS 28274 29533 . + . gene=N
-```
-
-More example annotations can be found in the [Nextclade data repository](https://github.com/search?q=repo%3Anextstrain%2Fnextclade_data++path%3Agenome_annotation.gff3&type=code).
+Example annotations can be found in the [Nextclade data repository](https://github.com/search?q=repo%3Anextstrain%2Fnextclade_data%20path%3Adata%2F**%2F*.gff*&type=code).
 
 Nextclade Web (advanced mode): accepted in "Genome annotation" drag & drop box.
 
 Nextclade CLI flag: `--input-annotation`/`-m`
 
-Note: For historical reasons, Nextclade uses _gene name_ when it really means _CDS_ name. The "gene name" is taken from the `CDS`'s first attribute found in the following list: `Gene`, `gene`, `gene_name`, `locus_tag`, `Name`, `name`, `Alias`, `alias`, `standard_name`, `old-name`, `product`, `gene_synonym`, `gb-synonym`, `acronym`, `gb-acronym`, `protein_id`, `ID`.
-
-It is recommended that the `gene` attribute is used to specify the gene/CDS name.
-
 > 💡 Nextclade CLI supports file compression and reading from standard input. See section [Compression, stdin](./compression.md) for more details.
```
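The context and removed lines of this hunk describe two rules for reading an annotation: `CDS` rows that share an `ID` are joined into one multi-segment CDS (otherwise they are treated as independent), and the CDS "gene name" is the first attribute found in a fixed precedence list. The Python sketch below applies both rules to tab-separated GFF3 rows, purely as an illustration. It is not Nextclade's own code: `parse_attributes`, `cds_name` and `collect_cds` are hypothetical helpers, the parser skips GFF3 percent-decoding and multi-value attributes, and the precedence list is copied from the note this commit deletes, so it may not match the behavior of current Nextclade releases. The example rows are adapted from the removed SARS-CoV-2 gene map, rewritten with tabs.

```python
from __future__ import annotations

# Attribute keys checked, in order, for a CDS "gene name".
# Copied from the note removed in this commit; an assumption, not a spec.
NAME_KEYS = [
    "Gene", "gene", "gene_name", "locus_tag", "Name", "name", "Alias", "alias",
    "standard_name", "old-name", "product", "gene_synonym", "gb-synonym",
    "acronym", "gb-acronym", "protein_id", "ID",
]


def parse_attributes(column: str) -> dict[str, str]:
    """Parse a GFF3 attribute column such as 'gene=S;ID=cds-S'.

    Simplification: no percent-decoding, no comma-separated multi-values.
    """
    attrs: dict[str, str] = {}
    for pair in column.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def cds_name(attrs: dict[str, str]) -> str | None:
    """Return the value of the first key from NAME_KEYS present in attrs."""
    for key in NAME_KEYS:
        if key in attrs:
            return attrs[key]
    return None


def collect_cds(gff_lines: list[str]) -> list[tuple[str | None, list[tuple[int, int]]]]:
    """Group CDS rows into (name, [(start, end), ...]) entries.

    Rows sharing an ID are joined into one CDS; rows without an ID get a
    unique per-row key, so they stay independent.
    """
    groups: dict[str, tuple[str | None, list[tuple[int, int]]]] = {}
    for row_index, line in enumerate(gff_lines):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # e.g. 'gene' rows are not grouped here
        attrs = parse_attributes(cols[8])
        key = attrs.get("ID", f"<row {row_index}>")
        if key not in groups:
            groups[key] = (cds_name(attrs), [])
        groups[key][1].append((int(cols[3]), int(cols[4])))
    return list(groups.values())


example_rows = [
    "##gff-version 3",
    ".\t.\tgene\t266\t21555\t.\t+\t.\tgene=ORF1ab;ID=gene-ORF1ab",
    ".\t.\tCDS\t266\t13468\t.\t+\t.\tgene=ORF1ab;ID=cds-ORF1ab;Parent=gene-ORF1ab",
    ".\t.\tCDS\t13468\t21555\t.\t+\t.\tgene=ORF1ab;ID=cds-ORF1ab;Parent=gene-ORF1ab",
    ".\t.\tCDS\t21563\t25384\t.\t+\t.\tgene=S",
]
print(collect_cds(example_rows))
# [('ORF1ab', [(266, 13468), (13468, 21555)]), ('S', [(21563, 25384)])]
```

Giving ID-less rows a unique per-row key is one simple way to keep them independent. Skipping `gene` rows mirrors the statement that the `gene` is ignored for all purposes but display in the web UI.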

Diff for: docs/user/input-files/04-reference-tree.md (+7 -4)

```diff
@@ -8,13 +8,16 @@ Accepted formats: Auspice JSON v2 ([description](https://nextstrain.org/docs/bio
 
 The phylogenetic reference tree which serves as a target for phylogenetic placement (see [Algorithm: Phylogenetic placement](../algorithm/03-phylogenetic-placement.md)). Nearest neighbor information is used to assign clades (see [Algorithm: Clade Assignment](../algorithm/04-clade-assignment.md)) and to identify private mutations, including reversions.
 
-The tree **must** be rooted at the sample that matches the [reference sequence](../terminology.md#reference-sequence). A workaround in case one does not want to root the tree to be rooted on the reference is to attach the mutational differences between the tree root and the reference on the branch leading to the root node. This can be accomplished by passing the reference sequence to `augur ancestral`'s `--root-sequence` argument (see the [`augur ancestral` docs](https://docs.nextstrain.org/projects/augur/en/stable/usage/cli/ancestral.html#inputs)).
+> 💡 Nextclade CLI supports file compression and reading from standard input. See section [Compression, stdin](./compression) for more details.
 
-The tree **must** contain a clade definition for every node (including internal): every node must have a value at `node_attrs.clade_membership` (although it can be an empty string).
+### Requirements
 
-The tree **should** be sufficiently large and diverse to meet clade assignment expectations of a particular use-case, study or experiment. Only clades present on the reference tree can be assigned to [query sequences](../terminology.md#query-sequence).
+1. The tree **should** be rooted at the sample that matches the [reference sequence](02-reference-sequence.md). Otherwise the results of the analysis will be incorrect. It's user's or dataset author's responsibility that this assumption holds. Nextclade can sometimes detect a mismatch in certain cases, but not always.
 
-> 💡 Nextclade CLI supports file compression and reading from standard input. See section [Compression, stdin](./compression) for more details.
+> ⚠️ A workaround in case one does not want the tree to be rooted on the reference is to attach the mutational differences between the tree root and the reference on the branch leading to the root node.
+> This can be accomplished by passing the reference sequence to `augur ancestral`'s `--root-sequence` argument (see the [`augur ancestral` docs](https://docs.nextstrain.org/projects/augur/en/stable/usage/cli/ancestral.html#inputs)).
+
+2. The tree **should** be sufficiently large and diverse to meet clade assignment expectations of a particular use-case, study or experiment. Only clades present on the reference tree can be assigned to [query sequences](01-sequence-data.md).
 
 ### Extensions
 
```
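Requirement 1 and its workaround depend on the mutational differences between the reference sequence and the tree root, and requirement 2 limits assignable clades to those already present on the tree. The Python sketch below illustrates both ideas on a toy tree. It assumes the reference and root sequences are already aligned to equal length, writes substitutions as `<ref><1-based position><alt>` strings in the style of Auspice JSON `branch_attrs.mutations.nuc`, and reads clades from `node_attrs.clade_membership.value` on the `tree` object of an Auspice JSON v2 file. `root_branch_mutations` and `clades_on_tree` are hypothetical helpers; for a real dataset, the route the diff describes is passing the reference to `augur ancestral` via `--root-sequence`.

```python
from __future__ import annotations

from typing import Any

UNAMBIGUOUS = "ACGT"


def root_branch_mutations(reference: str, root: str) -> list[str]:
    """List substitutions from `reference` to `root`, e.g. 'G3T'.

    Assumes both sequences are aligned to the same length (1-based positions).
    Only differences between unambiguous A/C/G/T bases are reported; gaps,
    N and other ambiguity codes are skipped in this simplified sketch.
    """
    if len(reference) != len(root):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    for pos, (ref_base, root_base) in enumerate(zip(reference, root), start=1):
        if ref_base not in UNAMBIGUOUS or root_base not in UNAMBIGUOUS:
            continue
        if ref_base != root_base:
            mutations.append(f"{ref_base}{pos}{root_base}")
    return mutations


def clades_on_tree(node: dict[str, Any]) -> set[str]:
    """Collect non-empty clade labels from node_attrs.clade_membership.value.

    Under requirement 2, only these clades could be assigned to queries.
    """
    clades: set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        membership = current.get("node_attrs", {}).get("clade_membership", {})
        value = membership.get("value")
        if value:
            clades.add(value)
        stack.extend(current.get("children", []))
    return clades


toy_tree = {
    "name": "root",
    "node_attrs": {"clade_membership": {"value": "19A"}},
    "children": [
        {"name": "A", "node_attrs": {"clade_membership": {"value": "19A"}}},
        {"name": "B", "node_attrs": {"clade_membership": {"value": "20A"}}},
    ],
}
print(root_branch_mutations("ACGTAC", "ACTTAC"))  # ['G3T']
print(sorted(clades_on_tree(toy_tree)))  # ['19A', '20A']
```

An iterative stack is used for the tree walk so that very deep trees do not hit Python's recursion limit.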