The Data

This project includes data on the bacteria of the bovine rumen, collected from 9 ranches across Europe. This page describes the available data, its format and its metadata.

Raw Data

The raw data was extracted from a paper published by Wallace et al. in 2019 (DOI: 10.1126/sciadv.aav8391), to be reprocessed in this project. The relevant files:

raw_data/full_taxa.tsv: phylogenetic metadata on the 16S sequences that were identified while collecting the data. The file is used to map each sequence read to its classified organism. Format (with a couple of examples):

seq16S	Kingdom	Phylum	Class	Order	Family	Genus	Species
TACGCGCTAAAG...	Bacteria	Bacteroidota	Bacteroidia	Bacteroidales	Prevotellaceae	Prevotella	NA
CGAAGCGTCGG...	Archaea	Euryarchaeota	Methanobacteria	Methanobacteriales	Methanobacteriaceae	Methanobrevibacter	NA
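
As a rough illustration of how this file can be used, the sketch below loads the table into a lookup from sequence to taxonomy. It uses only Python's standard library and the two example rows above, with their truncated sequences kept as placeholders. The helper name `load_taxonomy` and the choice to treat `NA` as a missing value are assumptions made for this example, not part of the project code.

```python
import csv
import io

# The two example rows shown above; the sequences are truncated placeholders.
SAMPLE_TAXA = (
    "seq16S\tKingdom\tPhylum\tClass\tOrder\tFamily\tGenus\tSpecies\n"
    "TACGCGCTAAAG...\tBacteria\tBacteroidota\tBacteroidia\tBacteroidales\t"
    "Prevotellaceae\tPrevotella\tNA\n"
    "CGAAGCGTCGG...\tArchaea\tEuryarchaeota\tMethanobacteria\t"
    "Methanobacteriales\tMethanobacteriaceae\tMethanobrevibacter\tNA\n"
)


def load_taxonomy(handle):
    """Map each 16S sequence to a dict of its taxonomic ranks (hypothetical helper)."""
    taxonomy = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        seq = row.pop("seq16S")
        # Assumption: the literal string "NA" marks an unresolved rank.
        taxonomy[seq] = {
            rank: (None if value == "NA" else value) for rank, value in row.items()
        }
    return taxonomy


taxonomy = load_taxonomy(io.StringIO(SAMPLE_TAXA))
print(taxonomy["TACGCGCTAAAG..."]["Genus"])
```

For the real file, the same helper could be given a handle opened with `open("raw_data/full_taxa.tsv", newline="")`.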

raw_data/full_table.nochim.txt: contains the number of reads of each seq16S in each of the cows in the study. The data is delimited by spaces. This data is used to calculate occurrence, richness and abundance for each organism. Format (with a couple of examples):

sample	TACGCGCTAAAG...	CGAAGCGTCGG...	TACGCGCTTTTC...	CGGAAAGTCGG...	...
ProkA_R1_FI900.fastq	0	486	0	1263	...
ProkA_R1_FI901.fastq	4	223	0	544	...
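
The sketch below shows one way the three quantities could be derived from this table, using the two example rows above trimmed to four sequence columns. The definitions are assumptions for illustration only: occurrence is taken as the number of cows in which a sequence has at least one read, richness as the number of sequences with at least one read in a cow, and abundance as the total read count of a sequence. The function name `summarize_counts` is hypothetical.

```python
import io

# Example rows from above, trimmed to four sequence columns; the sequences are
# truncated placeholders. str.split() is used so any whitespace delimiter works.
SAMPLE_COUNTS = """\
sample TACGCGCTAAAG... CGAAGCGTCGG... TACGCGCTTTTC... CGGAAAGTCGG...
ProkA_R1_FI900.fastq 0 486 0 1263
ProkA_R1_FI901.fastq 4 223 0 544
"""


def summarize_counts(handle):
    """Return (occurrence, richness, abundance) under the assumed definitions."""
    seqs = handle.readline().split()[1:]
    occurrence = {seq: 0 for seq in seqs}  # cows with >= 1 read, per sequence
    abundance = {seq: 0 for seq in seqs}   # total reads, per sequence
    richness = {}                          # sequences with >= 1 read, per cow
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sample = fields[0]
        counts = [int(value) for value in fields[1:]]
        richness[sample] = sum(1 for count in counts if count > 0)
        for seq, count in zip(seqs, counts):
            abundance[seq] += count
            if count > 0:
                occurrence[seq] += 1
    return occurrence, richness, abundance


occurrence, richness, abundance = summarize_counts(io.StringIO(SAMPLE_COUNTS))
print(richness)
```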

raw_data/RuminOmics_Animal_Phenotypes_for_Mizrahi_v2_plus_rt_quantification_with_total_20170921_and_depth.xlsx: This file contains all the metadata available on each of the cows participating in the study. It has 3 sheets, each holding the same data in a different format (the second sheet is for Excel compatibility). It covers each cow's ID and location, breed, lactation, diet, secretion and more (too long to detail here). As of now, it is mainly used to identify the cows. Note that the farm name appears in its long version (NUDC/Franciosi/etc.) and not the short one used in the paper (UK1/IT2/etc.).

Processed Data

The main data files that were created and used as part of the study.

output/ASV_processed_data.csv: The file details the abundance of each ASV entity, in each cow, in a long format. Farm and country data is included as well. The format of the file:

Country	Farm	Cow_Code	ASV_ID	Abundance
UK	NUDC	UK161	ASV_00460	46
IT	Franciosi	IT641	ASV_00259	26

Note that here too, the farm name is in a long version.

output/core_ASV_[05/30/50].csv: The file contains the abundance data of core ASV entities, in a format similar to that of the file described above (output/ASV_processed_data.csv). An ASV is defined as core if it appears in a given percentage of all the cows; the percentage corresponds to the number in the file name (5%/30%/50% of cows). File format:

Country	Farm	Cow_Code	ASV_ID	Abundance
UK	UK1	UK161	ASV_00460	46
IT	IT2	IT641	ASV_00259	26

Note that this time the farm name is in a short version.
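
The core definition above can be sketched as a prevalence filter over long-format rows. This is a minimal sketch under stated assumptions: an ASV counts as present in a cow when its abundance is greater than zero, the threshold is inclusive ("at least" the given fraction of cows), and the rows below are made-up toy values rather than data from the study. The function name `core_asvs` is hypothetical.

```python
from collections import defaultdict

# Toy long-format rows (Cow_Code, ASV_ID, Abundance). All values are made up
# for illustration and do not come from the dataset.
ROWS = [
    ("COW_A", "ASV_X", 46),
    ("COW_B", "ASV_X", 12),
    ("COW_C", "ASV_X", 3),
    ("COW_A", "ASV_Y", 26),
    ("COW_D", "ASV_Y", 0),
    ("COW_D", "ASV_Z", 5),
]


def core_asvs(rows, threshold):
    """Return ASVs present in at least `threshold` (a fraction) of all cows."""
    cows = {cow for cow, _, _ in rows}
    present_in = defaultdict(set)
    for cow, asv, abundance in rows:
        # Assumption: presence means a strictly positive abundance.
        if abundance > 0:
            present_in[asv].add(cow)
    return {
        asv
        for asv, cows_with_asv in present_in.items()
        if len(cows_with_asv) / len(cows) >= threshold
    }


# Thresholds mirror the 5% / 30% / 50% levels in the file names.
for threshold in (0.05, 0.30, 0.50):
    print(threshold, sorted(core_asvs(ROWS, threshold)))
```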

HPC/exp_1/[experiment id]_[job id]_Farm_[farm name]_COOC.csv: The file contains the results of the co-occurrence analysis for each pair of nodes (ASVs) in a specific farm, whether significant or not. The data includes each ASV's name as well as its serial number. File format (with a couple of examples):

	sp1	sp2	weight	sp1_inc	sp2_inc	obs_cooccur	exp_cooccur	p_lt	p_gt	sp1_name	sp2_name	level	level_name	edge_type
1	1	2	0.473	184	88	87	87.5	0.47568	1	ASV_00001	ASV_00002	Farm	IT1	not_significant
2	1	3	0.995	184	185	184	184	1	1	ASV_00001	ASV_00003	Farm	IT1	not_significant
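
To illustrate how the `edge_type` column might relate to the two p-value columns, the sketch below parses the two example rows and labels each pair. Several things here are assumptions rather than documented behaviour of the project: that a small `p_gt` indicates a positive association and a small `p_lt` a negative one, the 0.05 significance level, the `neg` label, and the tab delimiter (copied from the layout shown on this page; the real .csv file may use commas).

```python
import csv
import io

# The two example rows shown above, tab-separated as displayed on this page.
SAMPLE_COOC = (
    "\tsp1\tsp2\tweight\tsp1_inc\tsp2_inc\tobs_cooccur\texp_cooccur\tp_lt\tp_gt"
    "\tsp1_name\tsp2_name\tlevel\tlevel_name\tedge_type\n"
    "1\t1\t2\t0.473\t184\t88\t87\t87.5\t0.47568\t1"
    "\tASV_00001\tASV_00002\tFarm\tIT1\tnot_significant\n"
    "2\t1\t3\t0.995\t184\t185\t184\t184\t1\t1"
    "\tASV_00001\tASV_00003\tFarm\tIT1\tnot_significant\n"
)

ALPHA = 0.05  # assumed significance level, not taken from the project


def classify_pair(p_lt, p_gt, alpha=ALPHA):
    """Label a pair from its two one-sided p-values (assumed interpretation)."""
    if p_gt < alpha:
        return "pos"  # co-occur more often than expected
    if p_lt < alpha:
        return "neg"  # co-occur less often than expected
    return "not_significant"


rows = list(csv.DictReader(io.StringIO(SAMPLE_COOC), delimiter="\t"))
labels = [classify_pair(float(row["p_lt"]), float(row["p_gt"])) for row in rows]
for row, label in zip(rows, labels):
    print(row["sp1_name"], row["sp2_name"], label)
```

Under these assumptions, the labels computed for both example rows agree with their `edge_type` value.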

HPC/exp_1/[experiment id]_[job id]_Farm_[farm name]_edge_list.csv: The file contains the data for the edges of the network that was built using only the significant values from the co-occurrence analysis (both positive and negative), in edge-list format. These are only intra-layer edges. The nodes are identified by name (and not by ID).

from	to	weight	edge_type	level	level_name
ASV_00001	ASV_00007	0.978	pos	Farm	IT1
ASV_00001	ASV_00035	0.978	pos	Farm	IT1
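
An edge list like this can be turned into an adjacency structure for simple network queries. The sketch below does so for the two example edges above, using only the standard library. Treating the network as undirected is an assumption (co-occurrence is symmetric), and `build_adjacency` is a hypothetical helper rather than project code.

```python
from collections import defaultdict

# The two example edges shown above: (from, to, weight, edge_type).
EDGES = [
    ("ASV_00001", "ASV_00007", 0.978, "pos"),
    ("ASV_00001", "ASV_00035", 0.978, "pos"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map: node -> {neighbour: (weight, edge_type)}."""
    adjacency = defaultdict(dict)
    for source, target, weight, edge_type in edges:
        # Assumption: edges are undirected, so each one is stored in both directions.
        adjacency[source][target] = (weight, edge_type)
        adjacency[target][source] = (weight, edge_type)
    return adjacency


adjacency = build_adjacency(EDGES)
degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
print(degree)
```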

Additional network data files are produced as output, but since they contain only a subset of the data presented here, or present it in a different format, only the most useful files are detailed here.
