Tool
Tool Name
Description
Removes adapter sequences and trims low-quality bases from the 3' end of reads. Overlapping paired-end reads can be merged into consensus sequences, and adapter sequences can be identified for paired-end data if not known.
Automatic Filtering, Trimming, Error Removal and Quality Control for FASTQ data.
Anglerfish assesses contamination and composition of Illumina sequencing libraries based on a Nanopore trial sequencing with high concordance.
Rapid & standardized annotation of bacterial genomes, MAGs & plasmids.
Bamdst is a lightweight tool that computes depth-of-coverage statistics for target regions of BAM file(s).
BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.
Tool for common data-quality-related trimming, filtering, and masking operations
BBMap is a suite of pre-processing, assembly, alignment, and statistics tools for DNA/RNA sequencing reads.
BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF.
bcl2fastq can be used to both demultiplex data and convert BCL files to FASTQ file formats for downstream analysis.
bclconvert can be used to both demultiplex data and convert BCL files to FASTQ file formats for downstream analysis.
biobambam2 contains tools for early-stage processing of BAM alignment files.
BioBloom Tools assigns reads to different references using bloom filters. This is faster than alignment and can be used for contamination detection.
BISCUIT is a software tool suite for analyzing bisulfite-converted DNA sequencing.
Bismark is a tool to map bisulfite converted sequence reads and determine cytosine methylation states.
Bowtie 1 is an ultrafast, memory-efficient short read aligner.
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
Bracken is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
BUSCO assesses genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs.
BUS format is a file format for single-cell RNA-seq data designed to facilitate the development of modular workflows for data processing.
CCS is a PacBio tool that generates highly accurate single-molecule consensus reads (HiFi Reads).
Summarise quality metrics from Cell Ranger count and vdj.
CheckQC is a program designed to check a set of quality criteria against an Illumina runfolder.
adapter clipping and read merging in ancient DNA analysis
Cluster Flow is a simple and flexible bioinformatics pipeline tool.
Conpair estimates concordance and contamination for tumour–normal pairs
Cutadapt is a tool to find and remove adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
DNA damage investigation tool for ancient DNA analysis
Improved Duplicate Removal for merged/collapsed reads in ancient DNA analysis
Tools to process and analyze deep sequencing data.
DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high performance analysis of big sequence data.
Disambiguation algorithm for reads aligned to two species (e.g. human and mouse genomes) from Tophat, Hisat2, STAR or BWA mem.
Illumina Bio-IT Platform that uses FPGA for secondary NGS analysis.
Illumina Bio-IT Platform that uses FPGA for accelerated primary and secondary analysis
A set of tools to compare and manipulate the contents of EigenStrat databases, and to calculate SNP coverage statistics in such databases.
An ultra-fast all-in-one FASTQ preprocessor
FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
FastQC is a quality control tool for high throughput sequence data, written by Simon Andrews at the Babraham Institute in Cambridge.
featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoters, gene bodies, genomic bins and chromosomal locations.
Fgbio can be used for processing and evaluating data containing UMIs
Filtlong is a tool for filtering long reads by quality.
FLASH (Fast Length Adjustment of SHort reads) is a very fast and accurate software tool to merge paired-end reads from NGS data.
Flexible barcode and adapter removal
Freyja: Recover relative lineage abundances from mixed SARS-CoV-2 samples.
Variant Discovery in High-Throughput Sequencing Data
A tool to compare, merge and annotate one or more GFF files with a reference annotation in GFF format.
Quickly estimate coverage from a whole-genome bam index, providing 16KB resolution. This is useful as a quick QC to get coverage values across the genome.
GoPeaks is used to call peaks in CUT&TAG/CUT&RUN datasets.
Hap.py is a set of programs based on htslib to benchmark variant calls against gold standard truth datasets. Som.py output not currently supported.
HiCexplorer addresses the common tasks of Hi-C analysis from processing to visualization.
HiC-Pro is an optimized and flexible pipeline for Hi-C data processing.
HiCUP (Hi-C User Pipeline) is a tool for mapping and performing quality control on Hi-C data.
A haplotype-resolved assembler for accurate HiFi reads
HISAT2 is a fast and sensitive alignment program for mapping NGS reads (both DNA and RNA) to reference genomes.
HOMER is a suite of tools for Motif Discovery and next-gen sequencing analysis.
This tool performs screening of output from the ancient DNA optimised BLAST-replacement tool MALT, to identify taxa that have expected ancient DNA characteristics.
Hostile removes host sequences from short and long read (meta)genomes, from paired or unpaired fastq[.gz] input.
HTSeq is a Python package that provides infrastructure to process data from high-throughput sequencing assays. HTSeq-count takes a file with aligned sequencing reads, plus a list of genomic features and counts how many reads map to each feature.
High-performance UMI deduplicator
The Illumina InterOp libraries are a set of common routines used for reading and writing InterOp metric files. These metric files are binary files produced during a run providing detailed statistics about a run. In a few cases, the metric files are produced after a run during secondary analysis (index metrics) or for faster display of a subset of the original data (collapsed quality scores).
Iso-Seq contains the newest tools to identify transcripts in PacBio single-molecule sequencing data (HiFi reads).
Functions for viral amplicon-based sequencing.
A tool to compute statistics on genome annotation.
JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA.
Fast and sensitive taxonomic classification for metagenomics
kallisto is a program for quantifying abundances of transcripts from RNA-Seq data.
The K-mer Analysis Toolkit (KAT) contains a number of tools that analyse and compare K-mer spectra.
Kraken is a taxonomic classification tool that uses exact k-mer matches to find the lowest common ancestor (LCA) of a given sequence.
leeHom is a program for the Bayesian reconstruction of ancient DNA
A tool to predict the sequencing library type from the base composition of a supplied FastQ file.
Lima, the PacBio barcode demultiplexer, is the standard tool to identify barcode sequences in PacBio single-molecule sequencing data. Starting in SMRT Link v5.1.0, it is the tool that powers the Demultiplex Barcodes GUI-based analysis application.
A set of analysis pipelines that perform sample demultiplexing, barcode processing, alignment, quality control, variant calling, phasing, and structural variant calling.
MACS2 identifies transcription factor binding sites in ChIP-seq data.
mapDamage: tracking and quantifying damage patterns in ancient DNA sequences
ultra-fast and memory-efficient NGS assembler
MetaPhlAn is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.
methylQA is a methylation sequencing data quality assessment tool.
Quality control for long reads from ONT (Oxford Nanopore Technologies) sequencing.
Command-line tool to annotate miRNAs with a standard miRNA/isomiR naming scheme
miRTrace, developed by the team of Marc Friedländer (KTH, Sweden), is a quality control software for small RNA sequencing data.
Mosdepth performs fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing.
Microbial profiling through marker gene (MG)-based operational taxonomic units (mOTUs)
A simple tool to compute mitochondrial to nuclear genome ratios.
MultiVCFAnalyzer collects multiple VCF files and outputs combined genotype calls in a number of file formats.
Calculate various statistics from a long read sequencing dataset in FastQ, BAM or albacore sequencing summary format (supports NanoPack; NanoPlot, NanoComp).
Viral genome alignment, clade assignment, mutation calling, and quality checks
ngsderive finds information about sequencing libraries by backwards-computing it from the sequencing data.
Estimate metagenomic coverage and sequence diversity.
ODGI is an optimized dynamic graph/genome implementation.
Precision HLA typing from next-generation sequencing data
Pangolin uses variant calls to assign SARS-CoV-2 genome sequences to global lineages.
pbmarkdup takes one or multiple sequencing chips of an amplified library as HiFi reads and marks or removes duplicates.
Peddy calculates genotype :: pedigree correspondence checks, ancestry checks and sex checks using VCF files.
Computes enrichment and quality measures for ChIP-seq/DNase-seq/FAIRE-seq/MNase-seq data.
Picard is a set of Java command line tools for manipulating high-throughput sequencing data.
Porechop is a tool for finding and removing adapters from Oxford Nanopore reads. Adapters on the ends of reads are trimmed off, and when a read has an adapter in its middle, it is treated as chimeric and chopped into separate reads. Porechop performs thorough alignments to effectively find adapters, even at low sequence identity.
Preseq estimates the complexity of a library, showing how many additional unique reads are sequenced for increasing total read count.
PRINSEQ++ is a C++ implementation of the prinseq-lite.pl program.
Prokka is a software tool for the rapid annotation of prokaryotic genomes.
A purity, ploidy and copy number estimator for whole genome tumor data
Pychopper is a tool to identify, orient and trim full-length Nanopore cDNA reads. The tool is also able to rescue fused reads.
PycoQC computes metrics and generates interactive QC plots for Oxford Nanopore Technologies sequencing data
Reference-free quality control for Hi-C DNA sequencing libraries
QoRTs is a fast, efficient, and portable toolkit designed to assist in the analysis, QC and data management of RNA-Seq datasets.
Qualimap is a platform-independent application to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
A Quality Assessment Tool for Genome Assemblies by the Center for Algorithmic Biotechnology.
Fast, efficient RNA-Seq metrics for quality control and process optimization
Rockhopper is a comprehensive and user-friendly system for computational analysis of bacterial RNA-seq data.
RSEM (RNA-Seq by Expectation-Maximization) is a software package for estimating gene and isoform expression levels from RNA-Seq data.
RSeQC is a package that provides a number of useful modules that can comprehensively evaluate high throughput RNA-seq data.
Salmon is a tool for quantifying the expression of transcripts using RNA-seq data.
Sambamba is a suite of programs written in the D Language for users to process high-throughput sequencing data.
Samblaster is a tool to mark duplicates and extract discordant and split reads from sam files.
Samtools is a suite of programs for interacting with high-throughput sequencing data.
Sargasso is a tool to separate mixed-species RNA-seq reads according to their species of origin.
Reports statistics generated by the Seqera Platform CLI.
Sequali is a sequencing data quality control tool suitable for both long-read and short-read data. It features adapter search, overrepresented sequence analysis and duplication analysis and supports FASTQ and uBAM inputs.
SeqWho is a reliable and extremely rapid program designed to determine a FASTQ(A) sequencing file identity, both source protocol and species of origin.
SeqyClean is a comprehensive preprocessing software application for NGS reads.
A Python script to calculate the relative coverage of X and Y chromosomes, and their associated error bars, from the depth of coverage at specified SNPs.
Windowed Adaptive Trimming for FastQ files using quality
Skewer is an adapter trimming tool specially designed for processing next-generation sequencing (NGS) paired-end sequences.
Slamdunk is a tool to analyze SLAM-Seq data.
Rapid haploid variant calling and core genome alignment.
SnpEff is a genetic variant annotation and effect prediction toolbox. It annotates and predicts the effects of variants on genes (such as amino acid changes).
SNPsplit is an allele-specific alignment sorter, which is designed to read in alignment files in SAM/BAM format and determine the allelic origin of reads that cover known SNP positions.
Somalier does fast genotype :: pedigree correspondence checks from BAM/CRAM/VCF
SortMeRNA is a tool for filtering, mapping and OTU-picking NGS reads in metatranscriptomic and metagenomic data.
Quickly search, compare, and analyze genomic and metagenomic data sets.
Summarise quality metrics from 10x Genomics Space Ranger count.
Stacks is software for analyzing restriction enzyme-based data (e.g. RAD-seq)
STAR is an ultrafast universal RNA-seq aligner.
Supernova is a de novo genome assembler for 10X Genomics linked-reads.
THeTA2 estimates tumour purity and clonal / subclonal copy number.
TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes.
Trimmomatic is a flexible read trimming tool for Illumina NGS data
Truvari is a toolkit for benchmarking, merging, and annotating structural variants
UMI-tools contains tools for dealing with Unique Molecular Identifiers (UMIs) / Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes.
Variant detection in massively parallel sequencing data
VCFTools is a program for working with and reporting on VCF files.
Ensembl VEP determines the effect of your variants on genes, transcripts and protein sequences, as well as regulatory regions.
VerifyBamID checks whether reads match known genotypes or are contaminated as a mixture of two samples.
WhatsHap is software for phasing genomic variants using DNA sequencing reads, also called read-based phasing or haplotype assembly. It is especially suitable for long reads, but also works well with short reads.
Fast xenograft read sorter based on space-efficient k-mer hashing
Xenome is a tool for classifying reads from xenograft sources.