Geut Galai edited this page Mar 28, 2024 · 3 revisions

This project includes data on the bacterial communities of the bovine rumen for 9 farms across Europe. This page describes the available data, its formats, and its metadata.

Raw Data

The raw data was extracted from a paper by Wallace et al. (2019) (DOI: 10.1126/sciadv.aav8391) and is reprocessed in this project. The relevant files:


  • raw_data/full_taxa.tsv: phylogenetic metadata on the 16S sequences identified while collecting the data. The file is used to map each sequenced read to its classified organism. Format (with a couple of examples):
seq16S Kingdom Phylum Class Order Family Genus Species
TACGCGCTAAAG... Bacteria Bacteroidota Bacteroidia Bacteroidales Prevotellaceae Prevotella NA
CGAAGCGTCGG... Archaea Euryarchaeota Methanobacteria Methanobacteriales Methanobacteriaceae Methanobrevibacter NA
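A minimal sketch of loading this taxonomy table into a lookup from sequence to lineage. The inline sample mirrors the two example rows above; in real use the `io.StringIO` would be replaced by `open("raw_data/full_taxa.tsv")`.

```python
import csv
import io

# Hypothetical two-row sample mirroring raw_data/full_taxa.tsv (tab-delimited).
sample = (
    "seq16S\tKingdom\tPhylum\tClass\tOrder\tFamily\tGenus\tSpecies\n"
    "TACGCGCTAAAG\tBacteria\tBacteroidota\tBacteroidia\tBacteroidales\t"
    "Prevotellaceae\tPrevotella\tNA\n"
    "CGAAGCGTCGG\tArchaea\tEuryarchaeota\tMethanobacteria\tMethanobacteriales\t"
    "Methanobacteriaceae\tMethanobrevibacter\tNA\n"
)

# Map each 16S sequence to the rest of its taxonomic lineage.
taxonomy = {}
with io.StringIO(sample) as fh:
    for row in csv.DictReader(fh, delimiter="\t"):
        seq = row.pop("seq16S")
        taxonomy[seq] = row

print(taxonomy["TACGCGCTAAAG"]["Genus"])  # Prevotella
```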

  • raw_data/full_table.nochim.txt: contains the number of reads of each seq16S in each of the cows in the study. The data is space-delimited. It is used to calculate occurrence, richness, and abundance for each organism. Format (with a couple of examples):
sample TACGCGCTAAAG... CGAAGCGTCGG... TACGCGCTTTTC... CGGAAAGTCGG... ...
ProkA_R1_FI900.fastq 0 486 0 1263 ...
ProkA_R1_FI901.fastq 4 223 0 544 ...
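A sketch of parsing this space-delimited table and computing per-sample richness (the number of sequences with at least one read), one of the quantities the page says the file is used for. The sample data is the hypothetical example above with truncated sequence names.

```python
import io

# Hypothetical sample mirroring raw_data/full_table.nochim.txt (space-delimited).
sample = (
    "sample TACGCGCTAAAG CGAAGCGTCGG TACGCGCTTTTC CGGAAAGTCGG\n"
    "ProkA_R1_FI900.fastq 0 486 0 1263\n"
    "ProkA_R1_FI901.fastq 4 223 0 544\n"
)

counts = {}
with io.StringIO(sample) as fh:
    seqs = fh.readline().split()[1:]  # header: "sample" then one column per seq16S
    for line in fh:
        fields = line.split()
        counts[fields[0]] = dict(zip(seqs, map(int, fields[1:])))

# Per-sample richness: how many sequences were observed at all in each cow.
richness = {s: sum(1 for n in row.values() if n > 0) for s, row in counts.items()}
print(richness)
```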

  • raw_data/RuminOmics_Animal_Phenotypes_for_Mizrahi_v2_plus_rt_quantification_with_total_20170921_and_depth.xlsx: this file contains all the metadata available on each of the cows participating in the study. The data spans 3 sheets, each holding the same data in a different format (the second sheet exists for Excel compatibility). It records each cow's ID and location, breed, lactation, diet, milk secretion, and more (too long to detail here). At present it is mainly used to identify the cows. Note that the farm name appears in its long form (NUDC/Franciosi/etc.) rather than the short form used in the paper (UK1/IT2/etc.).
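Because the spreadsheet uses long farm names while some outputs use the paper's short codes, a translation step is needed somewhere. A minimal sketch of such a mapping; only the two pairs documented on this page (NUDC/UK1 and Franciosi/IT2) are filled in, and the remaining farms would have to be taken from the paper.

```python
# Partial long-to-short farm-name mapping (assumption: only the two pairs
# shown on this page are known; the rest are placeholders to be completed).
FARM_SHORT = {
    "NUDC": "UK1",
    "Franciosi": "IT2",
}

def short_farm(long_name):
    """Translate a spreadsheet farm name to the paper's short code.

    Unknown names are returned unchanged so unmapped farms are easy to spot.
    """
    return FARM_SHORT.get(long_name, long_name)

print(short_farm("NUDC"))  # UK1
```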


Processed Data

The main data files that were created and used as part of the study.

  • output/ASV_processed_data.csv: the file details the abundance of each ASV, for each cow, in long format. Farm and country data are included as well. The format of the file:
Country Farm Cow_Code ASV_ID Abundance
UK NUDC UK161 ASV_00460 46
IT Franciosi IT641 ASV_00259 26

Note that here, too, the farm name is in its long form.
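The long format makes per-cow or per-ASV aggregation a simple group-and-sum. A sketch over a hypothetical sample (the third row is invented to show aggregation); in practice the `io.StringIO` would be replaced by `open("output/ASV_processed_data.csv")`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample mirroring output/ASV_processed_data.csv (long format);
# the third row is invented so that one cow has two ASVs to sum over.
sample = (
    "Country,Farm,Cow_Code,ASV_ID,Abundance\n"
    "UK,NUDC,UK161,ASV_00460,46\n"
    "IT,Franciosi,IT641,ASV_00259,26\n"
    "UK,NUDC,UK161,ASV_00259,10\n"
)

# Total abundance per cow, aggregated from the long-format rows.
per_cow = defaultdict(int)
with io.StringIO(sample) as fh:
    for row in csv.DictReader(fh):
        per_cow[row["Cow_Code"]] += int(row["Abundance"])

print(dict(per_cow))
```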

  • output/core_ASV_[05/30/50].csv: the file contains the abundance data of core ASVs, in a format similar to the file above (output/ASV_processed_data.csv). An ASV is defined as core if it appears in at least the percentage of all cows given in the file's name (5%/30%/50% of cows). File format:
Country Farm Cow_Code ASV_ID Abundance
UK UK1 UK161 ASV_00460 46
IT IT2 IT641 ASV_00259 26

Note that this time the farm name is in its short form.
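The core definition above can be sketched as an occurrence-fraction filter. The occurrence sets and cow count below are a toy example, not data from the study.

```python
# Hypothetical occurrence data: the set of cows each ASV was observed in.
occurrence = {
    "ASV_00460": {"UK161", "UK162", "IT641"},
    "ASV_00259": {"IT641"},
}
n_cows = 4  # assumed total number of cows in this toy example

def core_asvs(threshold):
    """ASVs present in at least `threshold` fraction of all cows."""
    return sorted(
        asv for asv, cows in occurrence.items()
        if len(cows) / n_cows >= threshold
    )

print(core_asvs(0.50))  # the 50% core set
print(core_asvs(0.05))  # the 5% core set
```

Running the filter at 0.05, 0.30, and 0.50 would correspond to the three `core_ASV_*` files.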

  • HPC/exp_1/[experiment id]_[job id]_Farm_[farm name]_COOC.csv: the file contains the results of the co-occurrence analysis for every pair of nodes (ASVs) in a specific farm, whether significant or not. The data includes each ASV's name as well as its serial number. File format:
sp1 sp2 weight sp1_inc sp2_inc obs_cooccur exp_cooccur p_lt p_gt sp1_name sp2_name level level_name edge_type
1 1 2 0.473 184 88 87 87.5 0.47568 1 ASV_00001 ASV_00002 Farm IT1 not_significant
2 1 3 0.995 184 185 184 184 1 1 ASV_00001 ASV_00003 Farm IT1 not_significant
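Downstream steps keep only the significant pairs. A sketch of that filter over a hypothetical sample; the 0.05 cutoff on `p_lt`/`p_gt` is an assumption (a common convention for this kind of probabilistic co-occurrence output), and the second sample row is invented so the filter has something to keep.

```python
import csv
import io

# Hypothetical sample with a subset of the COOC columns; the second row is
# invented to provide a significant pair.
sample = (
    "sp1,sp2,p_lt,p_gt,sp1_name,sp2_name\n"
    "1,2,0.47568,1,ASV_00001,ASV_00002\n"
    "1,7,1,0.001,ASV_00001,ASV_00007\n"
)

# Assumed significance rule: keep a pair if either tail probability < 0.05.
significant = []
with io.StringIO(sample) as fh:
    for row in csv.DictReader(fh):
        if float(row["p_lt"]) < 0.05 or float(row["p_gt"]) < 0.05:
            significant.append((row["sp1_name"], row["sp2_name"]))

print(significant)
```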

  • HPC/exp_1/[experiment id]_[job id]_Farm_[farm name]_edge_list.csv: the file contains the data for the edges of the network that was built using only the significant values from the co-occurrence analysis (both positive and negative), in edge-list format. These are intra-layer edges only. The nodes are identified by name (not by ID). File format:
from to weight edge_type level level_name
ASV_00001 ASV_00007 0.978 pos Farm IT1
ASV_00001 ASV_00035 0.978 pos Farm IT1
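An edge list in this format can be turned into an in-memory network with a few lines of stdlib code (a graph library such as networkx would work equally well). The sample mirrors the two example rows above, read as comma-separated values per the `.csv` extension.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample mirroring an *_edge_list.csv file.
sample = (
    "from,to,weight,edge_type,level,level_name\n"
    "ASV_00001,ASV_00007,0.978,pos,Farm,IT1\n"
    "ASV_00001,ASV_00035,0.978,pos,Farm,IT1\n"
)

# Build the intra-layer network as an undirected weighted adjacency mapping.
adjacency = defaultdict(dict)
with io.StringIO(sample) as fh:
    for row in csv.DictReader(fh):
        weight = float(row["weight"])
        adjacency[row["from"]][row["to"]] = weight
        adjacency[row["to"]][row["from"]] = weight

print(len(adjacency["ASV_00001"]))  # 2
```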

  • More network data files are produced as output, but since they contain only a subset of the data presented here, or present the same data in a different format, only the most useful files are detailed here.