Commit 459e834

m-jahn, github-actions[bot], johanneskoester, and coderabbitai[bot] authored
feat: various changes to make building of catalog working again (#30)
* fix: attempt to update GH actions workflow for building catalog
* feat: added some basic instructions based on catalog landing page
* fix: try to fix env availability
* Add changes
* fix: added pull step in attempt to fix failure to push to non-updated remote
* Add changes
* fix: added pull step attempt 2
* fix: added pull step attempt 3
* Add changes
* Add changes
* fix: restored original runner chunks (10 x 100)
* fix: minor improvements to doc, added badges
* fix: issue with autostashing intermittent changes
* Add changes
* Add changes
* fix: added problematic repos flagged by GH push protection
* Add changes
* Add changes
* Add changes
* Add changes
* Add changes
* Add changes
* Add changes
* Add changes
* fix: add filter argument to untar as suggested by warning
* fix: prevent possible exposure of tokens from snakemake --lint
* fix: added parameter to search only for repos that had recent updates
* fix: full update of all repos
* Add changes
* Add changes
* Add changes
* Add changes
* Add changes
* Update scripts/generate-catalog.py
  Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* remove duplicate import

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Johannes Köster <[email protected]>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
1 parent f63cd80 commit 459e834

9 files changed: +23143 -19724 lines changed


.github/workflows/generate.yml

+23 -13
@@ -2,10 +2,11 @@ name: Generate catalog
 
 on:
   schedule:
-    - cron: 0 5 * * *
+    - cron: 0 5 * * 1
   push:
     branches:
       - main
+      - dev
 
 jobs:
   generate-catalog:
@@ -16,25 +17,30 @@ jobs:
       max-parallel: 1
     steps:
       - uses: actions/checkout@v4
-        with:
-          ref: main
-
+
       - name: deployment
-        uses: mamba-org/provision-with-micromamba@main
+        uses: mamba-org/setup-micromamba@v2
+        with:
+          environment-file: environment.yml
+
+      - name: Pull latest changes
+        run: |
+          git pull --rebase origin ${{ github.ref }}
 
       - name: generate-catalog
         shell: bash -l {0}
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
           OFFSET: ${{ matrix.offset }}
+          LATEST_COMMIT: 7
         run: |
           python scripts/generate-catalog.py
 
       - name: Commit files
         run: |
           git config --local user.email "41898282+github-actions[bot]@users.noreply.github.com"
           git config --local user.name "github-actions[bot]"
-          git commit -m "Add changes" -a
+          git commit -m "Add changes" -a || echo "No changes to commit"
 
       - name: Push changes
         uses: ad-m/github-push-action@master
@@ -51,13 +57,17 @@ jobs:
       max-parallel: 1
     steps:
       - uses: actions/checkout@v4
-        with:
-          ref: main
-
+
       - name: deployment
-        uses: mamba-org/provision-with-micromamba@main
+        uses: mamba-org/setup-micromamba@v2
+        with:
+          environment-file: environment.yml
 
-      - name: generate-catalog
+      - name: Pull latest changes
+        run: |
+          git pull --rebase origin ${{ github.ref }}
+
+      - name: cleanup-catalog
         shell: bash -l {0}
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -69,10 +79,10 @@ jobs:
         run: |
           git config --local user.email "41898282+github-actions[bot]@users.noreply.github.com"
           git config --local user.name "github-actions[bot]"
-          git commit -m "Add changes" -a
+          git commit -m "Add changes" -a || echo "No changes to commit"
 
       - name: Push changes
         uses: ad-m/github-push-action@master
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
-          branch: ${{ github.ref }}
+          branch: ${{ github.ref }}

.github/workflows/test-repo.yml

+1 -1
@@ -54,7 +54,7 @@ jobs:
             Results: https://github.com/snakemake/snakemake-workflow-catalog/actions/workflows/test-repo.yml
       - uses: actions/checkout@v1
       - name: deployment
-        uses: mamba-org/provision-with-micromamba@main
+        uses: mamba-org/setup-micromamba@v2
       - name: generate-catalog
         shell: bash -l {0}
         env:

README.md

+78
@@ -1,2 +1,80 @@
 # snakemake-workflow-catalog
+
+[![Generate catalog](https://github.com/snakemake/snakemake-workflow-catalog/actions/workflows/generate.yml/badge.svg)](https://github.com/snakemake/snakemake-workflow-catalog/actions/workflows/generate.yml)
+[![pages-build-deployment](https://github.com/snakemake/snakemake-workflow-catalog/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/snakemake/snakemake-workflow-catalog/actions/workflows/pages/pages-build-deployment)
+![GitHub last commit](https://img.shields.io/github/last-commit/snakemake/snakemake-workflow-catalog?label=latest%20update)
+![GitHub Issues or Pull Requests](https://img.shields.io/github/issues/snakemake/snakemake-workflow-catalog)
+
 A statically generated catalog of available Snakemake workflows
+
+This repository serves as a centralized collection of workflows designed to facilitate reproducible and scalable data analyses using the [**Snakemake**](https://snakemake.github.io/) workflow management system.
+
+## Purpose
+
+The Snakemake Workflow Catalog aims to provide a regularly updated list of high-quality workflows that can be easily reused and adapted for various data analysis tasks. By leveraging the power of [**Snakemake**](https://snakemake.github.io/), these workflows promote:
+
+- Reproducibility: Snakemake workflows produce consistent results, making it easier to share and validate scientific findings.
+- Scalability: Snakemake workflows can be executed on various computing environments, from local machines to high-performance computing clusters and cloud services.
+- Modularity: Snakemake workflows are structured to allow easy customization and extension, enabling users to adapt them to their specific needs.
+
+## Workflows
+
+Workflows are automatically added to the Workflow Catalog. This is done by regularly searching Github repositories for matching workflow structures. The catalog includes workflows based on the following criteria.
+
+### All workflows
+
+- The workflow is contained in a public Github repository.
+- The repository has a `README.md` file, containing the words "snakemake" and "workflow" (case insensitive).
+- The repository contains a workflow definition named either `Snakefile` or `workflow/Snakefile`.
+- If the repository contains a folder `rules` or `workflow/rules`, that folder must at least contain one file ending on `.smk`.
+- The repository is small enough to be cloned into a [Github Actions](https://docs.github.com/en/actions/about-github-actions/understanding-github-actions) job (very large files should be handled via [Git LFS](https://docs.github.com/en/repositories/working-with-files/managing-large-files), so that they can be stripped out during cloning).
+- The repository is not blacklisted here.
+
+### Standardized usage workflows
+
+In order to additionally appear in the "standardized usage" area, repositories additionally have to:
+
+- have their main workflow definition named `workflow/Snakefile` (unlike for plain inclusion (see above), which also allows just `Snakefile` in the root of the repository),
+- provide configuration instructions under `config/README.md`
+- contain a `YAML` file `.snakemake-workflow-catalog.yml` in their root directory, which configures the usage instructions displayed by this workflow catalog.
+
+Typical content of the `.snakemake-workflow-catalog.yml` file:
+
+```bash
+usage:
+  mandatory-flags: # optional definition of additional flags
+    desc: # describe your flags here in a few sentences (they will be inserted below the example commands)
+    flags: # put your flags here
+  software-stack-deployment: # definition of software deployment method (at least one of conda, singularity, or singularity+conda)
+    conda: true # whether pipeline works with --use-conda
+    singularity: true # whether pipeline works with --use-singularity
+    singularity+conda: true # whether pipeline works with --use-singularity --use-conda
+  report: true # add this to confirm that the workflow allows to use 'snakemake --report report.zip' to generate a report containing all results and explanations
+```
+
+Once included in the standardized usage area you can link directly to the usage instructions for your repository via the URL `https://snakemake.github.io/snakemake-workflow-catalog?usage=<owner>/<repo>`.
+
+### Release handling
+
+If your workflow provides Github releases, the catalog will always just scrape the latest non-preview release. Hence, in order to update your workflow's records here, you need to release a new version on Github.
+
+## Contributing
+
+Contributions to the Snakemake Workflow Catalog are welcome!
+Ideas can be discussed on the [catalog's Issues page](https://github.com/snakemake/snakemake-workflow-catalog/issues) first, and contributions made through Github Pull Requests.
+
+## Using workflows from the catalog
+
+To get started with a workflow from the catalog:
+
+1. Clone the repository or download the specific workflow directory.
+2. Review the documentation provided with the workflow to understand its requirements and usage.
+3. Configure the workflow by editing the `config.yml` files as needed.
+4. Execute the workflow using Snakemake.
+
+For more detailed instructions, please refer to the documentation within each workflow directory.
+
+## License
+
+The Snakemake Workflow Catalog is open-source and available under the MIT License.
+For more information and to explore the available workflows, visit https://snakemake.github.io/snakemake-workflow-catalog/.
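
The file-layout criteria listed in the README's "All workflows" section can be approximated with a local pre-check before publishing a repository. The following is a minimal sketch under stated assumptions, not the catalog's own implementation: the function name `check_inclusion` and its messages are hypothetical, and it ignores the repository-size and blacklist criteria.

```python
from pathlib import Path


def check_inclusion(repo_dir):
    """Return a list of unmet layout criteria for a local checkout (hypothetical helper)."""
    root = Path(repo_dir)
    problems = []

    # README.md must contain both words, case-insensitively.
    readme = root / "README.md"
    text = readme.read_text(errors="ignore").lower() if readme.is_file() else ""
    if "snakemake" not in text or "workflow" not in text:
        problems.append("README.md must mention 'snakemake' and 'workflow'")

    # The workflow definition may live in either location.
    if not any((root / p).is_file() for p in ("Snakefile", "workflow/Snakefile")):
        problems.append("no Snakefile or workflow/Snakefile found")

    # A rules folder, if present, needs at least one .smk file.
    for rules in (root / "rules", root / "workflow" / "rules"):
        if rules.is_dir() and not any(
            f.suffix == ".smk" for f in rules.iterdir() if f.is_file()
        ):
            problems.append(f"{rules.relative_to(root).as_posix()} contains no .smk file")

    return problems
```

An empty list means the checkout passes these layout checks; each string in a non-empty list names one criterion that is not met.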

blacklist.txt

+4 -1
@@ -1,3 +1,6 @@
 snakemake/snakemake-wrappers
 snakemake/snakemake
-snakemake/snakemake-workflow-catalog
+snakemake/snakemake-workflow-catalog
+tdayris/fair_bowtie2_mapping
+tdayris/fair_fastqc_multiqc
+GiulioCentorame/FADS-by-breastfeeding

data.js

+21,070 -18,325
Large diffs are not rendered by default.

environment.yml

+7 -7
@@ -3,10 +3,10 @@ channels:
   - conda-forge
   - bioconda
 dependencies:
-  - jinja2
-  - pygithub
-  - gitpython
-  - snakemake
-  - snakefmt
-  - python >=3.9
-  - ratelimit
+  - jinja2=3.1.4
+  - pygithub=2.5.0
+  - gitpython=3.1.43
+  - snakemake=8.25.5
+  - snakefmt=0.10.2
+  - python=3.12.8
+  - ratelimit=2.2.1

scripts/common.py

+1 -6
@@ -1,22 +1,16 @@
 import logging
-import tempfile
 import subprocess as sp
 import os
-from pathlib import Path
 import json
 import calendar
 import time
-import urllib
-import tarfile
 
 from ratelimit import limits, sleep_and_retry
 from jinja2 import Environment
 from github import Github
 from github.ContentFile import ContentFile
 from github.GithubException import UnknownObjectException, RateLimitExceededException
-import git
 from jinja2 import Environment, FileSystemLoader, select_autoescape
-import yaml
 
 logging.basicConfig(level=logging.INFO)
 
@@ -48,6 +42,7 @@
     .split()[-1]
 )
 
+
 def rate_limit_wait(api_type):
     curr_timestamp = calendar.timegm(time.gmtime())
     reset_timestamp = calendar.timegm(get_rate_limit(api_type).reset.timetuple())

scripts/generate-catalog.py

+36 -12
@@ -3,22 +3,32 @@
 import subprocess as sp
 import os
 from pathlib import Path
-import json
 import time
 import urllib
 import tarfile
+import re
+from datetime import timedelta, datetime
 
-from jinja2 import Environment
 import git
 from jinja2 import Environment, FileSystemLoader, select_autoescape
 import yaml
 
-from common import store_data, check_repo_exists, call_rate_limit_aware, g, previous_repos, previous_skips, blacklist, snakefmt_version, offset
+from common import (
+    store_data,
+    call_rate_limit_aware,
+    g,
+    previous_repos,
+    previous_skips,
+    blacklist,
+    snakefmt_version,
+    offset,
+)
 
 logging.basicConfig(level=logging.INFO)
 
 test_repo = os.environ.get("TEST_REPO")
-offset = int(offset / 100 * 1000)
+latest_commit = int(os.environ.get("LATEST_COMMIT"))
+offset = int(offset * 10)
 
 env = Environment(
     autoescape=select_autoescape(["html"]), loader=FileSystemLoader("templates")
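
One thing to keep in mind with `int(os.environ.get("LATEST_COMMIT"))`: `os.environ.get` returns `None` when the variable is unset, and `int(None)` raises a `TypeError`. A minimal sketch of a more forgiving parser is shown below; the helper name `env_int` and the fallback of 7 days are assumptions for illustration and are not part of this commit.

```python
import os


def env_int(name, default):
    """Read an integer from the environment, falling back to `default` (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        # Variable missing or empty: use the fallback instead of raising.
        return default
    return int(raw)


# Assumed fallback of 7 days, mirroring the value set in generate.yml.
latest_commit = env_int("LATEST_COMMIT", 7)
```

A workflow that does not export `LATEST_COMMIT` would then still start, using the fallback window.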
@@ -96,8 +106,11 @@ def __init__(
     total_count = 1
     offset = 0
 else:
+    date_threshold = datetime.today() - timedelta(latest_commit)
+    date_threshold = datetime.strftime(date_threshold, "%Y-%m-%d")
     repo_search = g.search_repositories(
-        "snakemake workflow in:readme archived:false", sort="updated"
+        f"snakemake workflow in:readme archived:false pushed:>={date_threshold}",
+        sort="updated",
     )
     time.sleep(5)
     total_count = call_rate_limit_aware(
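
The effect of the new `pushed:>=` qualifier is easiest to see with a fixed date. The sketch below rebuilds only the query string from the hunk above; the function name `build_query` and the `today` parameter are hypothetical, added so the example is deterministic, and no GitHub API call is made.

```python
from datetime import datetime, timedelta


def build_query(days, today=None):
    """Build a search query restricted to repositories pushed within `days` days."""
    today = today or datetime.today()
    threshold = (today - timedelta(days=days)).strftime("%Y-%m-%d")
    return f"snakemake workflow in:readme archived:false pushed:>={threshold}"


print(build_query(7, today=datetime(2025, 1, 15)))
# snakemake workflow in:readme archived:false pushed:>=2025-01-08
```

With `LATEST_COMMIT: 7` as set in the workflow, each run therefore only asks for repositories pushed during the preceding week.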
@@ -170,7 +183,7 @@ def __init__(
                 fileobj=urllib.request.urlopen(tarball_url), mode="r|gz"
             )
             root_dir = get_tarfile().getmembers()[0].name
-            get_tarfile().extractall(path=tmp)
+            get_tarfile().extractall(path=tmp, filter="tar")
             tmp /= root_dir
         else:
             # no latest release, clone main branch
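
Python 3.12 emits a `DeprecationWarning` when `extractall` is called without a `filter` argument, which is presumably the warning the commit message refers to. The sketch below shows the same call on a small in-memory archive instead of a downloaded release; the archive contents and the helper names `make_demo_tarball` and `extract` are made up for illustration, and the `hasattr` guard is an assumption to keep the example runnable on Python versions without extraction filters.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def make_demo_tarball():
    """Build a tiny in-memory .tar.gz standing in for a release tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        data = b"rule all:\n    input: []\n"
        info = tarfile.TarInfo(name="demo-repo/Snakefile")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


def extract(fileobj, dest):
    """Extract with the 'tar' filter where the running Python supports it."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        if hasattr(tarfile, "tar_filter"):
            tar.extractall(path=dest, filter="tar")
        else:  # older Python without extraction filters
            tar.extractall(path=dest)


with tempfile.TemporaryDirectory() as tmp:
    extract(make_demo_tarball(), tmp)
    extracted = (Path(tmp) / "demo-repo" / "Snakefile").read_text()
```

The `"tar"` filter keeps most tar semantics but refuses absolute paths and members that would land outside the destination directory; the stricter `"data"` filter is the other built-in option.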
@@ -195,7 +208,8 @@ def __init__(
 
         if rules.exists() and rules.is_dir():
             if not any(
-                rule_file.suffix == ".smk" for rule_file in rules.iterdir()
+                rule_file.suffix == ".smk"
+                for rule_file in rules.iterdir()
                 if rule_file.is_file()
             ):
                 log_skip("rule modules are not using .smk extension")
@@ -233,6 +247,7 @@ def __init__(
             )
         except sp.CalledProcessError as e:
             linting = e.stderr.decode()
+            linting = re.sub("gh[pousr]\\_[a-zA-Z0-9_]{36}@?", "", linting)
             if test_repo is not None:
                 logging.error(linting)
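
To see what the added `re.sub` call removes, the sketch below applies an equivalent pattern (written as a raw string) to a fabricated log line. The token is made up and the helper name `redact_tokens` is hypothetical; only the pattern comes from the diff.

```python
import re

# Equivalent to the pattern in the diff, written as a raw string.
TOKEN_PATTERN = re.compile(r"gh[pousr]_[a-zA-Z0-9_]{36}@?")


def redact_tokens(text):
    """Remove GitHub-style tokens (and a trailing '@') from log output."""
    return TOKEN_PATTERN.sub("", text)


fake_token = "ghp_" + "A1b2" * 9  # 36 made-up characters, not a real token
log = f"fatal: could not read from https://{fake_token}@github.com/owner/repo"
print(redact_tokens(log))
# fatal: could not read from https://github.com/owner/repo
```

Tokens in other formats, such as fine-grained personal access tokens with the `github_pat_` prefix, are not matched by this pattern.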

@@ -252,15 +267,22 @@ def __init__(
             if test_repo is not None:
                 logging.error(formatting)
 
-    topics = call_rate_limit_aware(
-        repo.get_topics
-    )
+    topics = call_rate_limit_aware(repo.get_topics)
 
     if config_readme is not None:
         config_readme = call_rate_limit_aware(lambda: g.render_markdown(config_readme))
 
     repos.append(
-        Repo(repo, linting, formatting, config_readme, settings, release, updated_at, topics).__dict__
+        Repo(
+            repo,
+            linting,
+            formatting,
+            config_readme,
+            settings,
+            release,
+            updated_at,
+            topics,
+        ).__dict__
     )
 
 if test_repo is None:
@@ -270,7 +292,9 @@ def __init__(
 
     def add_old(old_repos, current_repos):
         visited = set(repo["full_name"] for repo in current_repos)
-        current_repos.extend(repo for repo_name, repo in old_repos.items() if repo_name not in visited)
+        current_repos.extend(
+            repo for repo_name, repo in old_repos.items() if repo_name not in visited
+        )
 
     logging.info("Adding all old repos not covered by the current query.")
     add_old(previous_repos, repos)
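
Because the search is now limited to recently pushed repositories, `add_old` is what keeps previously catalogued entries in the output. The sketch below runs the same logic on made-up records; the repository names and the `stars` field are invented for illustration.

```python
def add_old(old_repos, current_repos):
    """Append previously stored repos that the current query did not return."""
    visited = {repo["full_name"] for repo in current_repos}
    current_repos.extend(
        repo for name, repo in old_repos.items() if name not in visited
    )


# Made-up records for illustration only.
previous = {
    "owner/a": {"full_name": "owner/a", "stars": 1},
    "owner/b": {"full_name": "owner/b", "stars": 2},
}
current = [{"full_name": "owner/a", "stars": 5}]

add_old(previous, current)
print([repo["full_name"] for repo in current])
# ['owner/a', 'owner/b']
```

The freshly fetched record for `owner/a` is kept as is, and only `owner/b`, which the current query did not return, is carried over from the previous data.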
