Output
Freya Arthen edited this page Mar 3, 2022 · 6 revisions
-
summary.txt
Lists the main assembly metrics (i.e. numbers as well as mean, SD, min and max of length, GC content and coverage) at contig and gene level
-
raw_gene_table.txt|.csv
Contains all variables with their values as computed on the input data set (for a detailed description of each variable, see this table)
-
imputed_gene_table.txt|.csv
The same as raw_gene_table.txt|.csv, except that
- the variables 'c_genecovsd', 'c_genelensd', 'g_covdev_c', 'g_gcdev_c' and 'g_lendev_c' are rescaled to a range of 0 to 1
- NaN (= missing) values are imputed with the mean of the respective variable
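The two transformations that distinguish the imputed table from the raw one can be sketched as follows. This is a minimal illustration, not taXaminer's actual code. It assumes a table stored as a dictionary of columns, that the rescaling is a plain min-max rescaling, that imputation applies to every variable, and that rescaling happens before imputation. The helper names `rescale_01`, `impute_mean` and `impute_table` are hypothetical.

```python
import math

# Variables named in the description above as being rescaled to 0..1
RESCALED = {"c_genecovsd", "c_genelensd", "g_covdev_c", "g_gcdev_c", "g_lendev_c"}


def rescale_01(values):
    """Min-max rescale the non-missing values to [0, 1]; NaNs stay NaN."""
    present = [v for v in values if not math.isnan(v)]
    if not present:
        return list(values)
    lo, hi = min(present), max(present)
    span = hi - lo
    # Assumption: a constant column has no spread and is mapped to 0.0
    return [v if math.isnan(v) else ((v - lo) / span if span else 0.0)
            for v in values]


def impute_mean(values):
    """Replace NaN entries with the mean of the non-missing entries."""
    present = [v for v in values if not math.isnan(v)]
    if not present:
        return list(values)
    mean = sum(present) / len(present)
    return [mean if math.isnan(v) else v for v in values]


def impute_table(table):
    """table: dict mapping variable name -> list of floats (one per gene)."""
    result = {}
    for name, column in table.items():
        if name in RESCALED:
            column = rescale_01(column)
        result[name] = impute_mean(column)
    return result
```

Calling `impute_table` on a dictionary of raw columns returns a new dictionary of the same shape without NaN values, as long as every column has at least one non-missing entry.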
The label in the plots that represents the query species is determined automatically and is always coloured dark grey.
-
3D_plot.html
Interactive 3D scatterplot to examine genes and their taxonomic assignments
- with single-clicks on labels you can hide individual groups
- double-clicks hide every group except for the one that was clicked
- hovering over data points shows additional information
- the subdirectory “3D_plot_files” holds additional files for this plot and is required to display the plot (important when working with MobaXterm, for example)
-
density_x|y|z.png|.pdf
Shows the density of the dots in the plot along the x, y and z axes
- Note: the density of a single dot can’t be computed. Thus, groups consisting of a single gene cannot be displayed in the 1D density plots
-
density_2d.png|.pdf
Like the 2D scatterplot, but the taxonomic group of the query species is represented as a 2D density
-
gene_table_taxon_assignment.csv
raw_gene_table with the PCA coordinates of each gene and its taxonomic assignment appended
- this is a tabular representation of all information that is displayed in the 3D plot
- see Additional information for details on the contained information
-
variables_excluded_from_PCA_and_clustering.txt
Lists all variables that were excluded from the PCA because more than 30% of their values are NaN
-
genes_excluded_from_PCA_and_clustering.csv
Lists the genes that still contained NaN values after the excluded variables were dropped and that were therefore excluded from the analysis
-
gene_table_coords.csv
'raw_gene_table' with PCA coordinates of genes appended (required for 'plotting.R')
-
contribution_of_variables.png|.pdf
Figure illustrating how much each variable contributes to the first two principal components
-
genes_and_variables.png|.pdf
Biplot of variables (vectors) and genes (points) in the new coordinate system defined by the first two principal components. Transparency represents the amount of contribution to the principal components
-
pca_loadings.csv
Table listing the loadings of the original variables (rows) on the computed principal components (columns)
-
pca_summary.csv
Table listing the standard deviation, the proportion of explained variance and the cumulative proportion of explained variance in the original data for each of the principal components
-
scree_plot.png|.pdf
Scree plot visualising the amount of variance in the original data that is explained by each of the principal components (here: dimensions)
-
parallel_analysis.png|.pdf
Only available if parallel analysis was performed on the principal components. Shows the results of Horn’s parallel analysis: the random eigenvalues for the given number of PCs are plotted together with the adjusted and unadjusted eigenvalues, indicating which components were retained for the subsequent PCA
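Two of the bookkeeping steps behind the files listed above are simple enough to sketch: the NaN filtering that produces the two "excluded" files, and the arithmetic that relates the standard deviations in pca_summary.csv to the proportions of explained variance. The code below is an illustration under stated assumptions, not the tool's implementation. It assumes a non-empty table stored as a dictionary of equally long columns. The function names `filter_missing` and `variance_summary` are hypothetical.

```python
import math


def filter_missing(table, max_nan_fraction=0.3):
    """Drop variables with too many NaNs, then genes that still contain NaNs.

    table: dict mapping variable name -> list of floats, one entry per gene.
    Returns (kept_table, excluded_variables, excluded_gene_indices).
    """
    # Step 1: variables whose NaN fraction exceeds the threshold (30% here)
    excluded_vars = [
        name for name, col in table.items()
        if sum(math.isnan(v) for v in col) / len(col) > max_nan_fraction
    ]
    kept = {n: c for n, c in table.items() if n not in excluded_vars}

    # Step 2: genes (rows) that still hold a NaN in any remaining variable
    n_genes = len(next(iter(table.values())))
    excluded_genes = [
        i for i in range(n_genes)
        if any(math.isnan(col[i]) for col in kept.values())
    ]
    drop = set(excluded_genes)
    kept = {n: [v for i, v in enumerate(c) if i not in drop]
            for n, c in kept.items()}
    return kept, excluded_vars, excluded_genes


def variance_summary(std_devs):
    """From per-component standard deviations, derive the proportion and
    cumulative proportion of explained variance."""
    variances = [s * s for s in std_devs]
    total = sum(variances)
    proportions = [v / total for v in variances]
    cumulative, running = [], 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative
```

The second function reflects the standard relationship that a component's variance is the square of its standard deviation, so the per-component proportions sum to 1 and the last cumulative value is 1 up to floating-point rounding.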
In addition to the PCA results, the script will output one directory for each clustering approach:
DBSCAN_clustering
hierarchical_clustering
k-means_clustering
model-based_clustering
For each of these runs, the following files will be output:
-
*.png|.pdf
Genes plotted in the new coordinate system defined by the first two principal components. Colours indicate the cluster to which each gene is assigned.
-
*.taXaminer.csv
A table holding, for each gene, the raw_gene_table variables together with additional columns for its new coordinates and a further column for its cluster assignment. This is the taXaminer report that should be provided with each annotated assembly.
The directory genes_by_cluster will, for each run, hold as many files as there are clusters. Each file lists the names of all genes that have been assigned to the respective cluster. There is an option to incorporate .taXaminer.csv into the original GFF file to create annotations.with_taXaminer.<my_clustering>.gff, which holds both the annotation data and the taXaminer report (see section Additional Scripts).
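The relationship between a *.taXaminer.csv report and the per-cluster gene lists can be sketched as a simple group-by. This is an illustration only. The column names `g_name` and `cluster`, the output file naming `cluster_<id>.txt`, and both function names are assumptions made for this example, so check the header of your own report before reusing it.

```python
import csv
from collections import defaultdict
from pathlib import Path


def genes_by_cluster(report_path, gene_col="g_name", cluster_col="cluster"):
    """Group gene names by cluster assignment.

    gene_col and cluster_col are hypothetical column names; adjust them to
    match the header of the actual report.
    """
    clusters = defaultdict(list)
    with open(report_path, newline="") as handle:
        for row in csv.DictReader(handle):
            clusters[row[cluster_col]].append(row[gene_col])
    return dict(clusters)


def write_cluster_files(clusters, out_dir):
    """Write one text file per cluster, one gene name per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for cluster, genes in clusters.items():
        # File naming is an assumption made for this sketch
        (out_dir / f"cluster_{cluster}.txt").write_text("\n".join(genes) + "\n")
```

For example, `write_cluster_files(genes_by_cluster("my_run.taXaminer.csv"), "genes_by_cluster")` would produce one file per distinct value in the cluster column, where `my_run.taXaminer.csv` stands in for a report from one of the clustering directories.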