Sambamba
Supported Tool
Sambamba is a suite of programs written in the D Language for users to process high-throughput sequencing data.
Description
Sambamba is a suite of programs for quickly and efficiently processing high-throughput sequencing data. It is functionally similar to Samtools, but it is written in the D Language, which allows for faster performance while remaining easy to use.
Supported commands:
markdup
markdup
This module parses key phrases in the output log files to find the duplicate and unique read counts, then calculates the duplicate rate per sample. It works for both single-end and paired-end data. The absolute numbers of reads by type are displayed in a stacked bar plot, and duplicate rates are shown in the general statistics table.
Duplicate rates are calculated as follows:
Paired end
duplicate_rate = duplicateReads / (sortedEndPairs * 2 + singleEnds - singleUnmatchedPairs) * 100
Single end
duplicate_rate = duplicateReads / singleEnds * 100
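The two formulas above can be sketched in Python as follows. This is a minimal illustration rather than the module's actual code: the function name, the rule of treating a sample as paired-end whenever sortedEndPairs is greater than zero, and the zero returned for an empty denominator are all assumptions made for the example.

```python
def duplicate_rate(duplicate_reads, sorted_end_pairs, single_ends,
                   single_unmatched_pairs=0):
    """Return the duplicate rate as a percentage.

    Assumption (not stated in the text above): a sample counts as
    paired-end when at least one sorted end pair was reported.
    """
    if sorted_end_pairs > 0:
        # Paired end: each pair contributes two reads.
        total_reads = (sorted_end_pairs * 2
                       + single_ends
                       - single_unmatched_pairs)
    else:
        # Single end: only single ends are counted.
        total_reads = single_ends

    if total_reads <= 0:
        # Assumption: report 0 rather than dividing by zero.
        return 0.0

    return duplicate_reads / total_reads * 100


# Paired-end sample: 250 / (450 * 2 + 150 - 50) * 100
paired = duplicate_rate(250, sorted_end_pairs=450, single_ends=150,
                        single_unmatched_pairs=50)

# Single-end sample: 60 / 120 * 100
single = duplicate_rate(60, sorted_end_pairs=0, single_ends=120)

print(paired, single)
```

With these made-up counts, the paired-end denominator is 1000 reads and the single-end denominator is 120 reads.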
If Sambamba Markdup is invoked using Snakemake, the following bare-bones rule should work fine:
File search patterns
sambamba/markdup:
  contents: finding positions of the duplicate reads in the file
  num_lines: 50