There are several reasons why relatively little bioinformatics software is written in Go. The main one is historical: Python and R have long been popular in the bioinformatics community thanks to their ease of use, extensive libraries, and strong community support. Many existing bioinformatics tools and workflows are already written in these languages, so new projects find it easier to build on existing solutions.
Despite these challenges, Go's simplicity, performance, and concurrency support could make it a viable option for certain bioinformatics applications, especially those that require efficient multi-threading or need to be deployed as standalone binaries. As Go continues to gain traction across industries, we may see more bioinformatics projects written in the language.
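As a rough illustration of the concurrency point, here is a minimal sketch that computes GC content for several sequences in parallel using goroutines and `sync.WaitGroup`. The helpers `gcContent` and `parallelGC` are hypothetical, written for this example only, and are not taken from any project in this list; sequences are assumed to be plain ASCII strings.

```go
package main

import (
	"fmt"
	"sync"
)

// gcContent returns the fraction of G and C bases in seq (case-insensitive).
// Other symbols such as N count toward the length but not toward GC.
// Assumes seq is ASCII, so byte length equals base count.
func gcContent(seq string) float64 {
	if len(seq) == 0 {
		return 0
	}
	gc := 0
	for i := 0; i < len(seq); i++ {
		switch seq[i] {
		case 'G', 'C', 'g', 'c':
			gc++
		}
	}
	return float64(gc) / float64(len(seq))
}

// parallelGC computes GC content for each sequence in its own goroutine.
// Each goroutine writes to a distinct slice index, so no mutex is needed;
// wg.Wait blocks until every goroutine has finished.
func parallelGC(seqs []string) []float64 {
	results := make([]float64, len(seqs))
	var wg sync.WaitGroup
	for i, s := range seqs {
		wg.Add(1)
		go func(i int, s string) {
			defer wg.Done()
			results[i] = gcContent(s)
		}(i, s)
	}
	wg.Wait()
	return results
}

func main() {
	seqs := []string{"ATGCGC", "AAAATTTT", "ggccNN"}
	for i, gc := range parallelGC(seqs) {
		fmt.Printf("seq%d\tGC=%.2f\n", i, gc)
	}
}
```

Spawning one goroutine per sequence keeps the sketch short; for millions of reads, a fixed pool of workers (for example, sized with `runtime.NumCPU()`) feeding from a channel would be the more typical design. Because the result is a single static binary, it can be copied to a cluster node without installing an interpreter or packages.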
A quick search of the internet, especially GitHub, showed me that a lot of work has been done in this direction over the last decade. Beginning with biogo, the classic "all-in-one bio package", which appears to have been abandoned in recent years, several enthusiasts have continued to expand the code base and create more tools, mainly for molecular data manipulation.
I especially want to highlight two developers and researchers, Wei Shen from the Institute for Viral Hepatitis, China, and Brent Pedersen, a genomicist from the Netherlands: most of the libraries and tools in this list were made by them. I haven't yet had time to check every item on the list, so I expect it to change (be extended or refined) in the near future.
Feel free to send a pull request, open an issue, or DM me with changes, or send me your own code!
See also awesome-biology and Awesome-Bioinformatics.
- SciPipe — Workflow library embedded in the Go programming language, focusing on complex workflow constructs, compiling to a single binary, and providing powerful file naming and comprehensive audit reports for every output. [ paper-2019 | web ]
- Reflow — A language and runtime for distributed, incremental data processing in the cloud.
- bíogo — Bioinformatics library for Go. (Updated 2 years ago!)
- Gonetics — Go/Golang Bioinformatics Library. (Updated 2 years ago!)
- Grail Bioinformatics tools — Bioinformatic infrastructure libraries. (Updated 3 years ago!)
- Philosopher — Complete toolkit for shotgun proteomics data analysis.
- samql — SQL-like query language for the SAM/BAM file format.
- SeqKit — Cross-platform and ultrafast toolkit for FASTA/Q file manipulation.
- KMCP — Accurate metagenomic profiling and fast large-scale sequence/genome searching. [ paper ]
- bio — Lightweight and high-performance bioinformatics package in Golang.
- unikmer — Toolkit for k-mers with taxonomic information.
- bwt — Burrows-Wheeler Transform and FM-index in Golang.
- gTaxon — Fast cross-platform NCBI taxonomy data querying tool with a command-line client and a REST API server.
- Gotranseq — Convert nucleic acid sequences to protein sequences.
- spexs2 — An exhaustive sequence pattern search tool.
- smoove — Structural variant calling and genotyping with existing tools, but smoothly.
- vcfanno — Annotate a VCF with other VCFs/BEDs/tabixed files. [ paper ]
- vcfgo — Golang library to read, write, and manipulate files in the variant call format.
- goleft — Collection of bioinformatics tools distributed under the MIT license in a single static binary.
- excord — Extract SV signal from a BAM.
- bcf — BCF parsing in Golang.
- taxonkit — Practical and efficient NCBI Taxonomy Toolkit, supports creating NCBI-style taxdump files.
- bget — Portable command-line tool to query bioinformatics APIs, data, databases, and files.
- countminsketch — Implementation of Count-Min Sketch in Golang.
- csvtk — Cross-platform, efficient, and practical CSV/TSV toolkit in Golang.
- go-dicom — DICOM parser for Golang.
- GNfinder — Finds scientific names in UTF-8 texts, PDF files, MS Word/Excel documents, URLs, etc.
- goChem — A library for computational chemistry in the Go programming language.