Changes in pipeline to account for multiple assemblies #132

Open
wants to merge 1 commit into base: dev
1 change: 1 addition & 0 deletions files/ftp-export/genome_coordinates/known-coordinates.sql
@@ -4,4 +4,5 @@ select distinct
  assembly.ensembl_url,
  assembly.taxid
  from ensembl_assembly assembly
+ where assembly.selected_genome = true
  ) TO STDOUT CSV
1 change: 1 addition & 0 deletions files/genes/species.sql
@@ -2,4 +2,5 @@ COPY (
  select
  distinct assembly_id, taxid
  from ensembl_assembly
+ where selected_genome = true
  ) TO STDOUT CSV
1 change: 1 addition & 0 deletions files/genome-mapping/find_species.sql
@@ -7,4 +7,5 @@ COPY (
  FROM ensembl_assembly
  WHERE
  division NOT IN ('EnsemblProtists', 'EnsemblFungi')
+ AND selected_genome = true
  ) TO STDOUT CSV;
1 change: 1 addition & 0 deletions files/import-data/post-release/001__coordinate-systems.sql
@@ -17,6 +17,7 @@ SELECT
  load.karyotype_rank
  FROM load_coordinate_info load
  JOIN ensembl_assembly ensembl ON ensembl.assembly_id = load.assembly_id
+ WHERE ensembl.selected_genome = true
  )
  ON CONFLICT (chromosome, assembly_id) DO UPDATE
  SET
Original file line number Diff line number Diff line change
@@ -25,6 +25,7 @@ from load_ensembl_pseudogenes load
  join ensembl_assembly assem
  on
  assem.assembly_id = load.assembly_id
+ where assem.selected_genome = true
  ) ON CONFLICT (md5(region_name)) DO NOTHING;

  INSERT INTO ensembl_pseudogene_exons (
2 changes: 2 additions & 0 deletions files/import-data/post-release/001__locations.sql
@@ -27,6 +27,8 @@ on
  assembly.assembly_id = load.assembly_id
  WHERE
  load.chromosome is not null
+ AND
+ assembly.selected_genome = true
  ON CONFLICT (accession, name, local_start, local_end, assembly_id)
  DO NOTHING
  ;
2 changes: 1 addition & 1 deletion files/repeats/find-assemblies.sql
@@ -7,5 +7,5 @@ FROM ensembl_assembly species
  WHERE
  exists(select 1 from rnc_sequence_regions reg where reg.assembly_id = species.assembly_id)
  and species.division != 'EnsemblFungi'
+ and species.selected_genome = true
  ) TO STDOUT CSV;

2 changes: 1 addition & 1 deletion workflows/databases/mirgenedb.nf
@@ -7,7 +7,7 @@ process mirgenedb {
  """
  scp $params.databases.mirgenedb.remote mirgenedb.json
  psql \
- --command='COPY (select assembly_id,assembly_ucsc from ensembl_assembly where assembly_ucsc is not null) TO STDOUT (FORMAT CSV)' \
+ --command='COPY (select assembly_id,assembly_ucsc from ensembl_assembly where assembly_ucsc is not null and selected_genome = true) TO STDOUT (FORMAT CSV)' \
  "$PGDATABASE" > assemblies.tsv
  rnac mirgenedb parse assemblies.tsv mirgenedb.json .
  """