AdapterRemoval Documentation
Release 2.3.0

Mikkel Schubert; Stinus Lindgreen

Nov 08, 2020


Contents:

1 Installation
   1.1 Installation with Conda
   1.2 Installing on Debian based systems
   1.3 Installing on OSX
   1.4 Installing from sources

2 Getting started
   2.1 A note on specifying adapters

3 Example usage
   3.1 Trimming single-end reads
   3.2 Trimming paired-end reads
   3.3 Multiple input FASTQ files
   3.4 Interleaved FASTQ reads
   3.5 Combining FASTQ output
   3.6 Different quality score encodings
   3.7 Trimming paired-end reads with multiple adapter pairs
   3.8 Identifying adapter sequences from paired-ended reads
   3.9 Demultiplexing and adapter-trimming
   3.10 Demultiplexing mode

4 AdapterRemoval manpage
   4.1 Synopsis
   4.2 Description
   4.3 Options
   4.4 Window based quality trimming
   4.5 Exit status
   4.6 Reporting bugs
   4.7 License

5 Miscellaneous
   5.1 Window-based quality trimming
   5.2 Migrating from AdapterRemoval v1.x

6 Indices and tables

Index


AdapterRemoval Documentation, Release 2.3.0

AdapterRemoval searches for and removes remnant adapter sequences from High-Throughput Sequencing (HTS) data and (optionally) trims low-quality bases from the 3’ end of reads following adapter removal. AdapterRemoval can analyze both single-end and paired-end data, and can be used to merge overlapping paired-end reads into (longer) consensus sequences. Additionally, AdapterRemoval can construct a consensus adapter sequence for paired-end reads, if this information is not available.

If you use AdapterRemoval v2, then please cite the paper:

Schubert, Lindgreen, and Orlando (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 12;9(1):88 http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-016-1900-2

AdapterRemoval was originally published in Lindgreen 2012:

Lindgreen (2012): AdapterRemoval: Easy Cleaning of Next Generation Sequencing Reads, BMC Research Notes, 5:337 http://www.biomedcentral.com/1756-0500/5/337/


CHAPTER 1

Installation

1.1 Installation with Conda

If you have Conda installed on your system:

conda install -c bioconda adapterremoval

1.2 Installing on Debian based systems

Debian users on Stretch, Buster, or Sid, or using Jessie-backports, as well as Ubuntu users on Zesty or Artful, may install AdapterRemoval using apt:

apt-get install adapterremoval

For other distributions, or to get the latest version of AdapterRemoval, please see the Installing from sources section below.

1.3 Installing on OSX

MacOSX users may install AdapterRemoval using Homebrew:

brew install homebrew/science/adapterremoval

Please see the Homebrew website for instructions on how to install and use Homebrew:

https://brew.sh/


1.4 Installing from sources

Installing AdapterRemoval from sources requires the presence of libz and bz2 headers. On Debian based systems, these may be installed as follows:

sudo apt-get install zlib1g-dev libbz2-dev

In addition, a C++11 compatible compiler and basic build tools are required. On Debian based systems, these may be installed as follows:

sudo apt-get install build-essential

To compile AdapterRemoval, first download and unpack the newest release from GitHub, and then run the ‘make’ command:

wget -O adapterremoval-2.3.1.tar.gz https://github.com/MikkelSchubert/adapterremoval/archive/v2.3.1.tar.gz
tar xvzf adapterremoval-2.3.1.tar.gz
cd adapterremoval-2.3.1
make

The resulting ‘AdapterRemoval’ executable is located in the ‘build’ subdirectory, and can be run as-is. It is also possible to perform a system-wide installation of the AdapterRemoval executable, man-page, and examples using the following command:

sudo make install


CHAPTER 2

Getting started

To run AdapterRemoval on single-end FASTQ data, simply specify the location of the FASTQ file(s) using the --file1 command-line option:

AdapterRemoval --file1 myreads_1.fastq.gz

To run AdapterRemoval on paired-end FASTQ data, specify the location of the mate 1 and mate 2 FASTQ files using the --file1 and --file2 command-line options:

AdapterRemoval --file1 myreads_1.fastq.gz --file2 myreads_2.fastq.gz

The files may be uncompressed, gzip-compressed, or bzip2-compressed. When run in this manner, AdapterRemoval will save the trimmed reads in the current working directory, using filenames starting with ‘your_output’. This behavior may be changed using the --basename option, or using specific options for each output file. See the input_and_output section for more information about files generated by AdapterRemoval.

More examples of common usage may be found in the Example usage section of the documentation.

2.1 A note on specifying adapters

AdapterRemoval relies on the user specifying the adapter sequences to be trimmed, using the --adapter1 and --adapter2 command-line options. By default, AdapterRemoval is set up to trim Illumina TruSeq adapters, corresponding to the following command-line options:

--adapter1 AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
--adapter2 AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

It is therefore extremely important to specify the correct adapter sequences when running AdapterRemoval on a dataset that does not make use of these adapters. Failure to do so will result in the wrong sequences being trimmed, and actual adapter sequences being left in the resulting “trimmed” reads.

Adapter sequences are specified in the read orientation when using the --adapter1 and --adapter2 command-line options, directly corresponding to the sequence that is observed in the FASTQ files produced by the base-calling software. If we were processing data generated using the above TruSeq adapters, then we would therefore expect to find those sequences as-is in our FASTQ files (assuming that the read lengths are sufficiently long and that insert sizes are sufficiently short), typically followed by a low-quality A-tail:

$ grep "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC......ATCTCGTATGCCGTCTTCTGCTTG" file1.fq
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAACAAGAAT
CTGGAGTTCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGAAAAAAA
GGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGCAAATTGAAAACAC

$ grep "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT" file2.fq
CAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAAAAGAAAAACATCTTG
GAACTCCAGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAAAAATAGA
GAACTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTCAAAAACATAAGACCTA

The ambiguous bases representing the mate 1 barcode (the six Ns) have been replaced by single-character wildcards (dots) in the above grep commands, corresponding to how AdapterRemoval itself treats such characters.
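As an illustration of this wildcard convention, the following Python sketch builds the same pattern as the grep commands above, turning each N into a single-character wildcard, and tests whether a read contains the complete adapter. The helper names are hypothetical, and this is not AdapterRemoval's own matching code, which also handles partial adapters at the 3’ end of reads.

```python
import re

# Default mate 1 adapter; the six Ns stand for the sample barcode.
ADAPTER1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG"


def adapter_pattern(adapter: str) -> "re.Pattern[str]":
    """Compile a regex in which every N matches any single character,
    mirroring the '.' wildcards used in the grep commands above."""
    return re.compile("".join("." if base == "N" else re.escape(base) for base in adapter.upper()))


def contains_adapter(read: str, adapter: str = ADAPTER1) -> bool:
    """Return True if the complete adapter occurs anywhere in the read."""
    return adapter_pattern(adapter).search(read) is not None


# Adapter with barcode CGATGA, followed by non-adapter bases (taken from the grep output above).
read = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCGATGAATCTCGTATGCCGTCTTCTGCTTGCAAATTGAAAACAC"
print(adapter_pattern(ADAPTER1).pattern)
print(contains_adapter(read))  # True
```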

For paired-end data, the --identify-adapters mode may be used to verify the choice of adapters, by attempting to reconstruct the adapter sequence directly from the FASTQ reads. See the Example usage section for a demonstration of this functionality.


CHAPTER 3

Example usage

The following examples all make use of the data included in the ‘examples’ folder.

3.1 Trimming single-end reads

The following command removes adapters from the file reads_1.fq, trims both Ns and low-quality bases from the reads, and gzip-compresses the resulting files. The --basename option is used to specify the prefix for output files:

AdapterRemoval --file1 reads_1.fq --basename output_single --trimns --trimqualities --gzip

Since --gzip and --basename are specified, the trimmed FASTQ reads are written to output_single.truncated.gz, the discarded FASTQ reads are written to output_single.discarded.gz, and settings and summary statistics are written to output_single.settings.

Note that by default, AdapterRemoval does not require a minimum number of bases overlapping with the adapter sequence before reads are trimmed. This may result in an excess of very short (1 to 3 bp) 3’ fragments being falsely identified as adapter sequences, and trimmed. This behavior may be changed using the --minadapteroverlap option, which specifies the minimum number of bases (excluding Ns) that must be aligned to the adapter before trimming is carried out. For example, use --minadapteroverlap 3 to require an overlap of at least 3 bp.
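The effect of --minadapteroverlap can be sketched in Python under simplifying assumptions: the read is cut at the leftmost position where its 3’ tail matches the start of the adapter exactly, with an N on either side treated as a wildcard that does not count toward the overlap. The function names are hypothetical, and, unlike AdapterRemoval's own alignment, this sketch tolerates no mismatches.

```python
from typing import Optional

ADAPTER1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG"


def aligned_bases(read_tail: str, adapter: str) -> Optional[int]:
    """Compare the 3' tail of a read against the start of the adapter.

    Returns the number of aligned non-N positions if all of them match,
    or None at the first mismatch. Ns on either side are skipped.
    """
    count = 0
    for read_base, adapter_base in zip(read_tail, adapter):
        if read_base == "N" or adapter_base == "N":
            continue
        if read_base != adapter_base:
            return None
        count += 1
    return count


def trim_3p_adapter(read: str, adapter: str = ADAPTER1, min_overlap: int = 1) -> str:
    """Cut the read at the leftmost position where the remainder matches the
    adapter with at least `min_overlap` aligned non-N bases."""
    for start in range(len(read)):
        overlap = aligned_bases(read[start:], adapter)
        if overlap is not None and overlap >= min_overlap:
            return read[:start]
    return read


print(trim_3p_adapter("CCCCCCCCCA"))                     # CCCCCCCCC (1 bp "adapter" removed)
print(trim_3p_adapter("CCCCCCCCCA", min_overlap=3))      # CCCCCCCCCA (kept)
print(trim_3p_adapter("TTTTAGATCGGAAG", min_overlap=3))  # TTTT
```

The first call shows how a single trailing A can be mistaken for adapter sequence when no minimum overlap is required.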

3.2 Trimming paired-end reads

The following command removes adapters from paired-end reads, where the mate 1 and mate 2 reads are kept in the files reads_1.fq and reads_2.fq, respectively. The reads are trimmed for both Ns and low-quality bases, and overlapping reads (overlapping by at least 11 nucleotides, by default) are merged (collapsed):

AdapterRemoval --file1 reads_1.fq --file2 reads_2.fq --basename output_paired --trimns --trimqualities --collapse


This command generates the files output_paired.pair1.truncated and output_paired.pair2.truncated, which contain trimmed pairs of reads that were not collapsed; output_paired.singleton.truncated, containing reads where one mate was discarded; output_paired.collapsed, containing merged reads; and output_paired.collapsed.truncated, containing merged reads that have been trimmed due to the --trimns or --trimqualities options. Finally, the output_paired.discarded and output_paired.settings files correspond to those of the single-end run.
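Read merging can be illustrated with a simplified Python sketch that reverse-complements mate 2 and looks for the longest exact overlap with the 3’ end of mate 1, using the 11 bp default minimum mentioned above. The function names are hypothetical; the sketch ignores base qualities, mismatches, and pairs whose insert is shorter than the reads, so it does not reproduce how AdapterRemoval builds its consensus.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(mate1: str, mate2: str, min_overlap: int = 11) -> Optional[str]:
    """Merge two mates whose 3' ends overlap.

    Mate 2 is reverse-complemented so both reads are on the same strand, and
    the longest exact suffix/prefix overlap of at least `min_overlap` bases
    is used. Returns None if the mates do not overlap sufficiently.
    """
    rc2 = reverse_complement(mate2)
    for overlap in range(min(len(mate1), len(rc2)), max(min_overlap, 1) - 1, -1):
        if mate1[-overlap:] == rc2[:overlap]:
            return mate1 + rc2[overlap:]
    return None


# A 22 bp toy insert sequenced from both ends; the mates overlap by 12 bp.
insert = "ACGTTGCAAGGCTTAACCGGTA"
mate1 = insert[:16]
mate2 = reverse_complement(insert[4:])
print(merge_pair(mate1, mate2) == insert)  # True
```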

3.3 Multiple input FASTQ files

More than one input file may be specified for mate 1 and mate 2 reads. This is accomplished simply by listing more than one file after the --file1 and the --file2 options.

For single-end reads:

AdapterRemoval --file1 reads_1a.fq reads_1b.fq reads_1c.fq

And for paired-end reads:

AdapterRemoval --file1 reads_1a.fq reads_1b.fq reads_1c.fq --file2 reads_2a.fq reads_2b.fq reads_2c.fq

AdapterRemoval will process these files as if they had been concatenated into a single file or pair of files prior to invoking AdapterRemoval. For paired reads, the files must be specified in the same order for --file1 and --file2.

3.4 Interleaved FASTQ reads

AdapterRemoval is able to read and write paired-end reads stored in a single, so-called interleaved FASTQ file (one pair at a time, first mate 1, then mate 2). This is accomplished by specifying the location of the file using --file1 and also setting the --interleaved command-line option:

AdapterRemoval --interleaved --file1 interleaved.fq --basename output_interleaved

Other than taking just a single input file, this mode operates almost exactly like paired-end trimming (as described above); the mode differs only in that paired reads are not written to a ‘pair1’ and a ‘pair2’ file, but are instead written to a single, interleaved file, named ‘paired’. The location of this file is controlled using the --output1 option. Enabling either reading or writing of interleaved FASTQ files, but not both, can be accomplished by specifying either of the --interleaved-input and --interleaved-output options, both of which are enabled by the --interleaved option.
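Reading an interleaved file amounts to grouping consecutive FASTQ records into pairs. The Python sketch below does this for plain 4-line records; the helper names are hypothetical, and the check that both mates share a name (after removing a ‘/1’ or ‘/2’ suffix, using the default --mate-separator of ‘/’) is an assumption of the sketch rather than documented AdapterRemoval behavior.

```python
import io
from itertools import islice
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, str, str]  # (name, sequence, qualities)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records from an open text handle."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) < 4 or not lines[0].startswith("@") or not lines[2].startswith("+"):
            raise ValueError("truncated or malformed FASTQ record")
        yield lines[0][1:], lines[1], lines[3]


def mate_base_name(name: str, separator: str = "/") -> str:
    """Drop any comment and a trailing '<separator>1' or '<separator>2'."""
    parts = name.split()
    name = parts[0] if parts else ""
    if len(name) > 2 and name[-2] == separator and name[-1] in "12":
        return name[:-2]
    return name


def read_interleaved(handle: TextIO, separator: str = "/") -> Iterator[Tuple[Record, Record]]:
    """Group consecutive records of an interleaved file into (mate 1, mate 2) pairs."""
    records = read_fastq(handle)
    for mate1 in records:
        mate2 = next(records, None)
        if mate2 is None:
            raise ValueError("interleaved file contains an odd number of records")
        if mate_base_name(mate1[0], separator) != mate_base_name(mate2[0], separator):
            raise ValueError(f"mate names do not match: {mate1[0]} / {mate2[0]}")
        yield mate1, mate2


data = "@read1/1\nACGT\n+\nIIII\n@read1/2\nTTGG\n+\nIIII\n"
for first, second in read_interleaved(io.StringIO(data)):
    print(first[0], second[0])  # read1/1 read1/2
```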

3.5 Combining FASTQ output

By default, AdapterRemoval will create one output file for each mate, one file for discarded reads, and (in PE mode) one file for paired reads where one mate has been discarded, and (optionally) two files for collapsed reads. Alternatively, these files may be combined using the --combined-output option, in which case all output is directed to the mate 1 and (in PE mode) to the mate 2 file. In cases where reads are discarded due to trimming or due to being collapsed into a single sequence, the sequence and quality scores of the discarded read are replaced with a single ‘N’ with base quality 0. This option may be combined with --interleaved / --interleaved-output, to write a single, interleaved file in paired-end mode.


3.6 Different quality score encodings

By default, AdapterRemoval expects the quality scores in FASTQ reads to be Phred+33 encoded, meaning that the error probabilities are encoded as (char)('!' - 10 * log10(p)). Most data will be encoded using Phred+33, but Phred+64 and ‘Solexa’ encoded quality scores are also supported. These are selected by specifying the --qualitybase command-line option (specifying either ‘33’, ‘64’, or ‘solexa’):

AdapterRemoval --qualitybase 64 --file1 reads_q64.fq --basename output_phred_64

By default, reads are written using the same encoding as the input. If a different encoding is desired, this may be accomplished using the --qualitybase-output option:

AdapterRemoval --qualitybase 64 --qualitybase-output 33 --file1 reads_q64.fq --basename output_phred_33

Note furthermore that AdapterRemoval by default only expects quality scores in the range 0 to 41 (or -5 to 41 in the case of Solexa encoded scores). If input data using a different maximum quality score is to be processed, or if the desired maximum quality score of collapsed reads is greater than 41, then this limit may be increased using the --qualitymax option:

AdapterRemoval --qualitymax 50 --file1 reads_1.fq --file2 reads_2.fq --collapse --basename output_collapsed_q50

For a detailed overview of Phred encoding schemes currently and previously in use, see e.g. the Wikipedia article on the subject: https://en.wikipedia.org/wiki/FASTQ_format#Encoding
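The arithmetic behind these encodings can be checked with a short Python sketch (helper names hypothetical). It follows the Phred+33 formula given above, treats Phred+64 as the same scheme with a different ASCII offset, and converts Solexa scores with the standard relation Q_phred = 10 * log10(10^(Q_solexa / 10) + 1); rounding to the nearest integer is an assumption. Note that the --qualitymax limits quoted in the manpage (93 for Phred+33, 62 for Phred+64) coincide with the last printable ASCII character, ‘~’ (code 126).

```python
import math

PHRED33 = 33  # ASCII code of '!'
PHRED64 = 64  # ASCII code of '@'; Solexa scores use the same offset


def phred_from_error(p: float) -> int:
    """Phred score for an error probability p, rounded to the nearest integer."""
    return round(-10 * math.log10(p))


def encode(q: int, offset: int = PHRED33) -> str:
    return chr(q + offset)


def decode(char: str, offset: int = PHRED33) -> int:
    return ord(char) - offset


def recode(qualities: str, in_offset: int, out_offset: int) -> str:
    """Re-encode a quality string, e.g. from Phred+64 to Phred+33."""
    return "".join(encode(decode(c, in_offset), out_offset) for c in qualities)


def max_quality(offset: int) -> int:
    """Highest score that still maps to a printable ASCII character ('~')."""
    return ord("~") - offset


def solexa_to_phred(q_solexa: float) -> int:
    """Convert a Solexa score, -10*log10(p/(1-p)), to the nearest Phred score."""
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


print(encode(phred_from_error(0.001)))              # ? (Q30 in Phred+33)
print(recode("hh", PHRED64, PHRED33))               # II (Q40)
print(max_quality(PHRED33), max_quality(PHRED64))   # 93 62
print(solexa_to_phred(-5), solexa_to_phred(0), solexa_to_phred(10))  # 1 3 10
```

Because Solexa and Phred scores diverge at low qualities, a round trip through the Solexa scale is not lossless, consistent with the warning under --qualitybase-output in the manpage.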

3.7 Trimming paired-end reads with multiple adapter pairs

It is possible to trim data that contains multiple adapter pairs, by providing a one- or two-column table containing possible adapter combinations (for single-end and paired-end trimming, respectively; see e.g. examples/adapters.txt):

cat adapters.txt
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACCTAATCTCGTATGCCGTCTTCTGCTTG AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
AAACTTGCTCTGTGCCCGCTCCGTATGTCACAACAGTGCGTGTATCACCTCAATGCAGGACTCA GATCGGGAGTAATTTGGAGGCAGTAGTTCGTCGAAACTCGGAGCGTCTTTAGCAGGAG
CTAATTTGCCGTAGCGACGTACTTCAGCCTCCAGGAATTGGACCCTTACGCACACGCATTCATG TACCGTGAAAGGTGCGCTTAGTGGCATATGCGTTAAGAGCTAGGTAACGGTCTGGAGG
GTTCATACGACGACGACCAATGGCACACTTATCCGGTACTTGCGTTTCAATGCGCATGCCCCAT TAAGAAACTCGGAGTTTGGCCTGCGAGGTAGCTTGGGTGTTATGAAGAACGGCATGCG
CCATGCCCCGAAGATTCCTATACCCTTAAGGTCGCAATTGTTCGAGTAAGCTGTACGCGCCCAT GTTGCATTGACCCGAAGGGCTCGATGTTTAGGGAGGTCAGAAGTTGAGCGGGTTCAAA

This table is then specified using the --adapter-list option:

AdapterRemoval --file1 reads_1.fq --file2 reads_2.fq --basename output_multi --trimns --trimqualities --collapse --adapter-list adapters.txt

The resulting .summary file contains an overview of how frequently each adapter (pair) was used.

Note that in the case of paired-end adapters, AdapterRemoval considers only the combinations of adapters specified in the table, one combination per row. For single-end trimming, only the first column of the table file is required, and the list may therefore take the form of a file containing one sequence per line.
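The expected layout of the adapter table can be sketched as a small Python parser (hypothetical, shown with shortened adapter sequences for brevity). Skipping blank lines, upper-casing the sequences, and requiring exactly two columns in paired-end mode are assumptions of this sketch, not documented AdapterRemoval rules.

```python
from typing import List, Tuple

VALID_BASES = set("ACGTN")


def parse_adapter_table(text: str, paired: bool = True) -> List[Tuple[str, ...]]:
    """Parse a whitespace-separated adapter table.

    Paired-end: each row holds (adapter 1, adapter 2); only these
    combinations are considered. Single-end: only the first column is used.
    """
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if paired and len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, found {len(fields)}")
        row = tuple(fields) if paired else (fields[0],)
        for seq in row:
            if not set(seq.upper()) <= VALID_BASES:
                raise ValueError(f"line {lineno}: invalid adapter sequence {seq!r}")
        rows.append(tuple(seq.upper() for seq in row))
    return rows


table = "AGATCGGAAG AGATCGGAAGAGCG\nAAACTTGCTC GATCGGGAGT\n"
print(parse_adapter_table(table))
print(parse_adapter_table(table, paired=False))
```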


3.8 Identifying adapter sequences from paired-ended reads

If we did not know the adapter sequences for the reads_*.fq files, AdapterRemoval may be used to generate a consensus adapter sequence based on fragments identified as belonging to the adapters through pairwise alignments of the reads, provided that the data set contains only a single adapter sequence (not counting differences in index sequences).

In the following example, the identified adapters correspond to the default adapter sequences with a poly-A tail resulting from sequencing past the end of the insert + templates. It is not necessary to specify this tail when using the --adapter1 or --adapter2 command-line options. The characters shown under each of the consensus sequences represent the Phred-encoded fraction of bases identical to the consensus base, with adapter 1 containing the index CACCTA:

AdapterRemoval --identify-adapters --file1 reads_1.fq --file2 reads_2.fq

Attemping to identify adapter sequences ...
Processed a total of 1,000 reads in 0.0s; 129,000 reads per second on average ...
   Found 394 overlapping pairs ...
   Of which 119 contained adapter sequence(s) ...

Printing adapter sequences, including poly-A tails:
  --adapter1:  AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
               ||||||||||||||||||||||||||||||||||******||||||||||||||||||||||||
   Consensus:  AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACCTAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAAAAAAAAAAAAAA
     Quality:  55200522544444/4411330333330222222/1.1.1.1111100-00000///..+....--*-)),,+++++++**(('%%%$

    Top 5 most common 9-bp 5'-kmers:
            1: AGATCGGAA = 96.00% (96)
            2: AGATGGGAA = 1.00% (1)
            3: AGCTCGGAA = 1.00% (1)
            4: AGAGCGAAA = 1.00% (1)
            5: AGATCGGGA = 1.00% (1)

  --adapter2:  AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
   Consensus:  AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
     Quality:  525555555144141441430333303.2/22-2/-1..11111110--00000///..+....--*-),,,+++++++**(%'%%%$

    Top 5 most common 9-bp 5'-kmers:
            1: AGATCGGAA = 100.00% (100)

No files are generated from running the adapter identification step.

The consensus sequences inferred are compared to those specified using the --adapter1 and --adapter2 command-line options, or with the default values for these if no values have been given (as in this case). Pipes (|) indicate matches between the provided sequences and the consensus sequence, and asterisks (*) indicate the presence of unspecified bases (Ns).
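The match line printed under each provided adapter can be reproduced with the following Python sketch (hypothetical helper). Pipes and asterisks follow the description above; showing a mismatch as a blank space is an assumption.

```python
def comparison_line(provided: str, consensus: str) -> str:
    """'|' where the bases agree, '*' where the provided adapter has an N,
    and ' ' for a mismatch (the mismatch symbol is an assumption)."""
    symbols = []
    for expected, observed in zip(provided, consensus):
        if expected == "N":
            symbols.append("*")
        elif expected == observed:
            symbols.append("|")
        else:
            symbols.append(" ")
    return "".join(symbols)


ADAPTER1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG"
CONSENSUS1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCACCTAATCTCGTATGCCGTCTTCTGCTTG" + "A" * 24
print(ADAPTER1)
print(comparison_line(ADAPTER1, CONSENSUS1))
```

Because `zip` stops at the shorter sequence, the poly-A tail of the consensus is not compared, matching the output above.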

3.9 Demultiplexing and adapter-trimming

As of version 2.1, AdapterRemoval supports simultaneous demultiplexing and adapter trimming; demultiplexing is carried out using a simple comparison between the specified barcode (a sequence of A, C, G, and T) and the first N bases of the mate 1 read, where N is the length of the barcode. Demultiplexing of double-indexed sequences is also supported, in which case two barcodes must be specified for each sample. The first barcode is then compared to the first N_1 bases of the mate 1 read, and the second barcode is compared to the first N_2 bases of the mate 2 read. By default, this comparison requires a perfect match. Reads identified as containing specific barcode(s) are then trimmed using adapter sequences that include the barcode(s) as necessary. Reads for which no barcode (or pair of barcodes) matched are written to a separate file or pair of files (for paired-end reads).
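The exact-match comparison described above can be sketched in Python as a prefix test on each mate (hypothetical helper; the barcodes are those from examples/barcodes.txt). Treating a read pair that matches more than one sample as unidentified is an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple

# Sample name -> (mate 1 barcode, mate 2 barcode, or "" for single-indexed data).
Barcodes = Dict[str, Tuple[str, str]]


def assign_sample(mate1: str, mate2: str, barcodes: Barcodes) -> Optional[str]:
    """Return the single sample whose barcode(s) exactly prefix the reads,
    or None if no sample (or more than one) matches."""
    hits = [
        name
        for name, (barcode1, barcode2) in barcodes.items()
        if mate1.startswith(barcode1) and (not barcode2 or mate2.startswith(barcode2))
    ]
    return hits[0] if len(hits) == 1 else None


BARCODES = {
    "sample_1": ("ATGCGGA", "TGAATCT"),
    "sample_2": ("ATGGATT", "ATAGTGA"),
    "sample_7": ("CAAAACT", "TCGCTGC"),
}
print(assign_sample("ATGGATTCCGTA", "ATAGTGAGGCA", BARCODES))  # sample_2
print(assign_sample("ATGGATTCCGTA", "TGAATCTGGCA", BARCODES))  # None (unidentified)
```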

Demultiplexing is enabled by creating a table of barcodes, the first column of which specifies the sample name (using characters a-z, A-Z, 0-9, or _), and the second and (optional) third columns of which specify the barcode sequences expected at the 5’ termini of mate 1 and mate 2 reads, respectively.

For example, a table of barcodes from a double-indexed run might be as follows (see examples/barcodes.txt):

cat barcodes.txt
sample_1 ATGCGGA TGAATCT
sample_2 ATGGATT ATAGTGA
sample_7 CAAAACT TCGCTGC

In the case of single-end reads, only the first two columns are required. AdapterRemoval is invoked with the --barcode-list option, specifying the path to this table:

AdapterRemoval --file1 demux_1.fq --file2 demux_2.fq --basename output_demux --barcode-list barcodes.txt
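A barcode table of this form could be validated with a Python sketch like the following (hypothetical parser, not part of AdapterRemoval). The sample-name and barcode character sets come from the description above; ignoring blank lines and rejecting duplicate sample names are assumptions.

```python
import re
from typing import Dict, Tuple

SAMPLE_NAME = re.compile(r"[A-Za-z0-9_]+")
BARCODE = re.compile(r"[ACGT]+")


def parse_barcode_table(text: str) -> Dict[str, Tuple[str, str]]:
    """Parse 'name barcode1 [barcode2]' rows; barcode2 is '' when absent."""
    table: Dict[str, Tuple[str, str]] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) not in (2, 3):
            raise ValueError(f"line {lineno}: expected 2 or 3 columns")
        name, *codes = fields
        if not SAMPLE_NAME.fullmatch(name):
            raise ValueError(f"line {lineno}: invalid sample name {name!r}")
        if not all(BARCODE.fullmatch(code) for code in codes):
            raise ValueError(f"line {lineno}: barcodes must consist of A, C, G, and T")
        if name in table:
            raise ValueError(f"line {lineno}: duplicate sample name {name!r}")
        table[name] = (codes[0], codes[1] if len(codes) == 2 else "")
    return table


example = """sample_1 ATGCGGA TGAATCT
sample_2 ATGGATT ATAGTGA
sample_7 CAAAACT TCGCTGC
"""
print(parse_barcode_table(example)["sample_7"])  # ('CAAAACT', 'TCGCTGC')
```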

This generates a set of output files for each sample specified in the barcode table, using the basename (--basename) as the prefix, followed by a dot and the sample name, followed by a dot and the default name for a given file type. For example, the output files for sample_2 would be:

• output_demux.sample_2.discarded

• output_demux.sample_2.pair1.truncated

• output_demux.sample_2.pair2.truncated

• output_demux.sample_2.settings

• output_demux.sample_2.singleton.truncated
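The naming scheme can be expressed as a small Python function (hypothetical) that reproduces the list above for a paired-end run without --collapse. The optional ‘.gz’ suffix for FASTQ outputs follows the --gzip description in the manpage, and leaving the settings file without a suffix is consistent with the single-end example, where --gzip produced output_single.settings.

```python
from typing import List


def demux_output_names(basename: str, sample: str, gzip: bool = False) -> List[str]:
    """Per-sample output files for a paired-end run without --collapse."""
    kinds = ["discarded", "pair1.truncated", "pair2.truncated", "settings", "singleton.truncated"]
    names = []
    for kind in kinds:
        name = f"{basename}.{sample}.{kind}"
        if gzip and kind != "settings":  # only FASTQ files are compressed
            name += ".gz"
        names.append(name)
    return names


for name in demux_output_names("output_demux", "sample_2"):
    print(name)
```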

The settings file generated for each sample summarizes the reads for that sample only; in addition, a basename.settings file is generated, which summarizes the number and proportion of reads identified as belonging to each sample.

The maximum number of mismatches allowed when comparing barcodes is controlled using the options --barcode-mm, --barcode-mm-r1, and --barcode-mm-r2, which specify the maximum number of mismatches in total, and the maximum number of mismatches for the mate 1 and mate 2 barcodes, respectively. Thus, if mm_1(i) and mm_2(i) represent the number of mismatches observed for barcode pair i for a given pair of reads, these options require that

1. mm_1(i) <= --barcode-mm-r1

2. mm_2(i) <= --barcode-mm-r2

3. mm_1(i) + mm_2(i) <= --barcode-mm
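The three conditions can be written directly as a Python predicate (hypothetical names mirroring the options). Counting any differing character, including N, as a mismatch, and counting barcode positions past the end of a short read as mismatches, are assumptions of this sketch.

```python
def count_mismatches(read: str, barcode: str) -> int:
    """Mismatches between a barcode and the first len(barcode) bases of a read."""
    prefix = read[: len(barcode)]
    return sum(a != b for a, b in zip(prefix, barcode)) + len(barcode) - len(prefix)


def barcodes_match(
    mate1: str,
    mate2: str,
    barcode1: str,
    barcode2: str,
    barcode_mm: int,
    barcode_mm_r1: int,
    barcode_mm_r2: int,
) -> bool:
    """Apply conditions 1 to 3 above to one barcode pair."""
    mm_1 = count_mismatches(mate1, barcode1)
    mm_2 = count_mismatches(mate2, barcode2) if barcode2 else 0
    return mm_1 <= barcode_mm_r1 and mm_2 <= barcode_mm_r2 and mm_1 + mm_2 <= barcode_mm


# One mismatch in each barcode of sample_1 (ATGCGGA / TGAATCT).
mate1, mate2 = "ATGCGTACC", "TGAATCAGG"
print(barcodes_match(mate1, mate2, "ATGCGGA", "TGAATCT", 2, 1, 1))  # True
print(barcodes_match(mate1, mate2, "ATGCGGA", "TGAATCT", 1, 1, 1))  # False: total exceeds 1
```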

3.10 Demultiplexing mode

As of version 2.2, AdapterRemoval can furthermore be used to demultiplex reads without also carrying out adapter trimming. This is accomplished by specifying the --demultiplex-only option:

AdapterRemoval --file1 demux_1.fq --file2 demux_2.fq --basename output_only_demux --barcode-list barcodes.txt --demultiplex-only


Options listed under “TRIMMING SETTINGS” (see AdapterRemoval --help) do not apply to this mode, but compression (--gzip, --bzip2), multi-threading (--threads), interleaving (--interleaved, etc.) and other such options may be used in conjunction with --demultiplex-only.

AdapterRemoval will generate a .settings file for each sample listed in the --barcode-list file, along with the adapter sequences that should be used when trimming reads for a given sample. These adapters correspond to the adapters that were specified when running AdapterRemoval in demultiplexing mode, with the barcode prefixed as appropriate. An underscore is used to demarcate the location at which the barcode ends and the adapter begins.

It is important to use these updated adapter sequences when trimming the demultiplexed reads, to avoid the inclusion of barcode sequences in reads extending past the 3’ termini of the DNA template sequence.


CHAPTER 4

AdapterRemoval manpage

4.1 Synopsis

AdapterRemoval [options...] --file1 <filenames> [--file2 <filenames>]

4.2 Description

AdapterRemoval removes residual adapter sequences from single-end (SE) or paired-end (PE) FASTQ reads, optionally trimming Ns and low-quality bases and/or collapsing overlapping paired-end mates into one read. Low-quality reads are filtered based on the resulting length and the number of ambiguous nucleotides (‘N’) present following trimming. These operations may be combined with simultaneous demultiplexing using 5’ barcode sequences. Alternatively, AdapterRemoval may attempt to reconstruct consensus adapter sequences from paired-end data, in order to allow the identification of the adapter sequences originally used.

If you use this program, please cite the paper:

Schubert, Lindgreen, and Orlando (2016). AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Research Notes, 12;9(1):88

http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-016-1900-2

For detailed documentation, please see

http://adapterremoval.readthedocs.io/en/v2.2.3/

4.3 Options

--help
    Display summary of command-line options.

--version
    Print the version string.


--file1 filename [filenames...]
    Read FASTQ reads from one or more files, either uncompressed, bzip2-compressed, or gzip-compressed. These files contain either the single-end (SE) reads or, if paired-end, the mate 1 reads. If running in paired-end mode, both --file1 and --file2 must be set. See the primary documentation for a list of supported formats.

--file2 filename [filenames...]
    Read one or more FASTQ files containing mate 2 reads for a paired-end run. If specified, --file1 must also be set.

--identify-adapters
    Attempt to build a consensus adapter sequence from fully overlapping pairs of paired-end reads. The minimum overlap is controlled by --minalignmentlength. The result will be compared with the values set using --adapter1 and --adapter2. No trimming is performed in this mode. Default is off.

--threads n
    Maximum number of threads. Defaults to 1.

4.3.1 FASTQ options

--qualitybase base
    The Phred quality score encoding used in input reads: either ‘64’ for Phred+64 (Illumina 1.3+ and 1.5+) or ‘33’ for Phred+33 (Illumina 1.8+). In addition, the value ‘solexa’ may be used to specify reads with Solexa encoded scores. Default is 33.

--qualitybase-output base
    The base of the quality score for reads written by AdapterRemoval: either ‘64’ for Phred+64 (i.e., Illumina 1.3+ and 1.5+) or ‘33’ for Phred+33 (Illumina 1.8+). In addition, the value ‘solexa’ may be used to specify reads with Solexa encoded scores. However, note that quality scores are represented using Phred scores internally, and conversion to and from Solexa scores therefore results in a loss of information. The default corresponds to the value given for --qualitybase.

--qualitymax base
    Specifies the maximum Phred score expected in input files, and used when writing output files. Possible values are 0 to 93 for Phred+33 encoded files, and 0 to 62 for Phred+64 encoded files. Defaults to 41.

--mate-separator separator
    Character separating the mate number (1 or 2) from the read name in FASTQ records. Defaults to ‘/’.

--interleaved
    Enables --interleaved-input and --interleaved-output.

--interleaved-input
    If set, input is expected to be an interleaved FASTQ file specified using --file1, in which pairs of reads are written one after the other (e.g. read1/1, read1/2, read2/1, read2/2, etc.).

--interleaved-output
    Write paired-end reads to a single file, interleaving mate 1 and mate 2 reads. By default, this file is named basename.paired.truncated, but this may be changed using the --output1 option.

--combined-output
    Write all reads into the files specified by --output1 and --output2. The sequences of reads discarded due to quality filters or read merging are replaced with a single ‘N’ with Phred score 0. This option can be combined with --interleaved-output to write PE reads to a single output file specified with --output1.


4.3.2 Output file options

--basename filename
Prefix used for naming output files, unless these names have been overridden using the corresponding command-line option (see below).

--settings file
Output file containing information on the parameters used in the run, as well as overall statistics on the reads after trimming. Default filename is ‘basename.settings’.

--output1 file
Output file containing trimmed mate 1 reads. Default filename is ‘basename.pair1.truncated’ for paired-end reads, ‘basename.truncated’ for single-end reads, and ‘basename.paired.truncated’ for interleaved paired-end reads.

--output2 file
Output file containing trimmed mate 2 reads when --interleaved-output is not enabled. Default filename is ‘basename.pair2.truncated’ in paired-end mode.

--singleton file
Output file containing paired reads for which the mate has been discarded. Default filename is ‘basename.singleton.truncated’.

--outputcollapsed file
If --collapse is set, contains overlapping mate-pairs which have been merged into a single read (PE mode), or reads for which the adapter was identified by a minimum overlap, indicating that the entire template molecule is present (SE mode). This does not include reads which have subsequently been trimmed due to low-quality or ambiguous nucleotides. Default filename is ‘basename.collapsed’.

--outputcollapsedtruncated file
Collapsed reads (see --outputcollapsed) which were trimmed due to the presence of low-quality or ambiguous nucleotides. Default filename is ‘basename.collapsed.truncated’.

--discarded file
Contains reads discarded due to the --minlength, --maxlength, or --maxns options. Default filename is ‘basename.discarded’.
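The default filenames listed above can be summarized as a function of the basename and the run mode. The following Python sketch is only a summary of this section, not AdapterRemoval code. The function name is hypothetical. It also assumes that the singleton file applies only to paired-end runs and that the collapsed files are only relevant when --collapse is set. Compression extensions are not included.

```python
from typing import Dict


def default_output_names(basename: str, paired: bool,
                         interleaved_output: bool = False,
                         collapse: bool = False) -> Dict[str, str]:
    """Map output options to their documented default filenames."""
    names = {
        "--settings": f"{basename}.settings",
        "--discarded": f"{basename}.discarded",
    }
    if not paired:
        names["--output1"] = f"{basename}.truncated"
    elif interleaved_output:
        names["--output1"] = f"{basename}.paired.truncated"
    else:
        names["--output1"] = f"{basename}.pair1.truncated"
        names["--output2"] = f"{basename}.pair2.truncated"
    if paired:  # assumption: singletons only arise in paired-end mode
        names["--singleton"] = f"{basename}.singleton.truncated"
    if collapse:  # assumption: collapsed outputs only matter with --collapse
        names["--outputcollapsed"] = f"{basename}.collapsed"
        names["--outputcollapsedtruncated"] = f"{basename}.collapsed.truncated"
    return names


if __name__ == "__main__":
    for option, filename in default_output_names("sample", paired=True).items():
        print(option, filename)
```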

4.3.3 Output compression options

--gzip
If set, all FASTQ files written by AdapterRemoval will be gzip compressed using the compression level specified using --gzip-level. The extension “.gz” is added to files for which no filename was given on the command-line. Defaults to off.

--gzip-level level
Determines the compression level used when gzip’ing FASTQ files. Must be a value in the range 0 to 9, with 0 disabling compression and 9 being the best compression. Defaults to 6.

--bzip2
If set, all FASTQ files written by AdapterRemoval will be bzip2 compressed using the compression level specified using --bzip2-level. The extension “.bz2” is added to files for which no filename was given on the command-line. Defaults to off.

--bzip2-level level
Determines the compression level used when bzip2’ing FASTQ files. Must be a value in the range 1 to 9, with 9 being the best compression. Defaults to 9.
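Downstream scripts often need to read these outputs whether or not they are compressed. The following Python sketch uses the standard gzip and bz2 modules and chooses the decompressor from the “.gz” or “.bz2” extension. The helper name open_fastq is hypothetical, and a file given an explicit name without one of these extensions would be opened as plain text.

```python
import bz2
import gzip
import os
import tempfile
from typing import IO


def open_fastq(path: str) -> IO[str]:
    """Open a (possibly compressed) FASTQ file for reading as text.

    The decompressor is chosen from the file extension only.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


if __name__ == "__main__":
    # Round-trip a tiny record through a gzip file at level 6 (the --gzip-level default).
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.truncated.gz")
        with gzip.open(path, "wt", compresslevel=6) as handle:
            handle.write("@read1\nACGT\n+\nIIII\n")
        with open_fastq(path) as handle:
            print(handle.read().splitlines()[1])
```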

4.3.4 FASTQ trimming options

--adapter1 adapter
Adapter sequence expected to be found in mate 1 reads, specified in read direction. For a detailed description of how to provide the appropriate adapter sequences, see the “Adapters” section of the online documentation. Default is AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG.

--adapter2 adapter
Adapter sequence expected to be found in mate 2 reads, specified in read direction. For a detailed description of how to provide the appropriate adapter sequences, see the “Adapters” section of the online documentation. Default is AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT.

--adapter-list filename
Read one or more adapter sequences from a table. The first two columns (separated by whitespace) of each line in the file are expected to correspond to values passed to --adapter1 and --adapter2. In single-end mode, only column one is required. Lines starting with ‘#’ are ignored. When multiple rows are found in the table, AdapterRemoval will try each adapter (pair) and select the best-aligning adapters for each FASTQ read processed.
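The table format described above can be parsed with a short Python sketch. This is a hypothetical reader for illustration, not AdapterRemoval's parser. Skipping blank lines and ignoring any columns after the first two are assumptions, because the text above only specifies the first two columns and ‘#’ comment lines.

```python
import io
from typing import List, Optional, TextIO, Tuple


def parse_adapter_list(handle: TextIO, paired: bool = True) -> List[Tuple[str, Optional[str]]]:
    """Parse an adapter table into (adapter1, adapter2) tuples.

    In single-end mode only the first column is used and adapter2 is None.
    """
    adapters: List[Tuple[str, Optional[str]]] = []
    for lineno, line in enumerate(handle, start=1):
        if line.startswith("#") or not line.strip():  # assumption: blank lines skipped
            continue
        fields = line.split()  # any whitespace separates columns
        if paired:
            if len(fields) < 2:
                raise ValueError(f"line {lineno}: expected two adapter columns")
            adapters.append((fields[0], fields[1]))
        else:
            adapters.append((fields[0], None))
    return adapters


if __name__ == "__main__":
    table = io.StringIO("# adapter1 adapter2\nAGATCGGAAGAGC AGATCGGAAGAGC\n")
    print(parse_adapter_list(table))
```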

--minadapteroverlap length
In single-end mode, reads are only trimmed if the overlap between the read and the adapter is at least this many bases long, not counting ambiguous nucleotides (N). This is independent of --minalignmentlength when using --collapse, allowing a conservative selection of putative complete inserts in single-end mode while ensuring that all possible adapter contamination is trimmed. The default is 0.
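The overlap rule can be illustrated on an already-aligned pair of segments. The following Python sketch is a simplified assumption, not AdapterRemoval's alignment code. It counts an aligned position only when neither the read nor the adapter has an ‘N’ there. This matters for adapters such as the default --adapter1, which contains a run of Ns.

```python
def informative_overlap(read_part: str, adapter_part: str) -> int:
    """Count aligned positions where neither sequence has an 'N'.

    Assumption: Ns in either the read or the adapter are excluded.
    """
    if len(read_part) != len(adapter_part):
        raise ValueError("aligned segments must have equal length")
    return sum(1 for r, a in zip(read_part, adapter_part) if r != "N" and a != "N")


def passes_min_overlap(read_part: str, adapter_part: str, minadapteroverlap: int = 0) -> bool:
    """Return True if the informative overlap reaches --minadapteroverlap."""
    return informative_overlap(read_part, adapter_part) >= minadapteroverlap


if __name__ == "__main__":
    print(informative_overlap("AGNTC", "AGATN"))
```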

--mm mismatchrate
The fraction of mismatches allowed in the aligned region. If the value is less than 1, it is used directly. If mismatchrate is greater than 1, the rate is set to 1 / mismatchrate. The default setting is 3 when trimming adapters, corresponding to a maximum mismatch rate of 1/3, and 10 when using --identify-adapters.
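The rule for interpreting --mm can be written as a small function. This Python sketch covers only the documented conversion. The function name is hypothetical, and the text above does not say how a value of exactly 1 is handled. This sketch passes it through unchanged as an assumption.

```python
def effective_mismatch_rate(mm: float) -> float:
    """Convert a --mm value into a mismatch fraction.

    Values above 1 are treated as a denominator (3 -> 1/3); values below 1
    are used as-is. Exactly 1 is passed through here (an assumption).
    """
    if mm < 0:
        raise ValueError("mismatch rate must be non-negative")
    if mm > 1:
        return 1.0 / mm
    return mm


if __name__ == "__main__":
    for value in (3, 10, 0.25):
        print(value, "->", effective_mismatch_rate(value))
```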

--shift n
To allow for missing bases at the 5’ end of the read, the program can let the alignment slip up to --shift bases at the 5’ end. This corresponds to starting the alignment at most --shift nucleotides into mate 2 (for paired-end) or into the adapter (for single-end). The default is 2.

--trim5p n [n]
Trim the 5’ end of reads by a fixed amount after removing adapters, but before carrying out quality-based trimming. Specify one value to trim mate 1 and mate 2 reads by the same amount, or two values separated by a space to trim each mate by a different amount. Off by default.

--trim3p n [n]
Trim the 3’ end of reads by a fixed amount. See --trim5p.
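The one-or-two-value form of --trim5p and --trim3p and the fixed trimming itself can be sketched in Python as follows. The helper names are hypothetical. The sketch assumes that a read is trimmed to an empty sequence when the requested amounts exceed its length. Following the description above, this step would run after adapter removal and before quality trimming.

```python
from typing import Sequence, Tuple


def per_mate_amounts(values: Sequence[int]) -> Tuple[int, int]:
    """Expand 'n' or 'n n' into (mate 1 amount, mate 2 amount)."""
    if len(values) == 1:
        return values[0], values[0]
    if len(values) == 2:
        return values[0], values[1]
    raise ValueError("expected one or two values")


def fixed_trim(seq: str, qual: str, trim5p: int = 0, trim3p: int = 0) -> Tuple[str, str]:
    """Remove trim5p bases from the 5' end and trim3p bases from the 3' end."""
    end = max(trim5p, len(seq) - trim3p)  # never let the end fall before the start
    return seq[trim5p:end], qual[trim5p:end]


if __name__ == "__main__":
    five1, five2 = per_mate_amounts([2, 4])
    print(five1, five2)
    print(fixed_trim("ACGTACGT", "IIIIIIII", trim5p=2, trim3p=3))
```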

--trimns
Trim consecutive Ns from the 5’ and 3’ termini. If quality trimming is also enabled (--trimqualities), then stretches of mixed low-quality bases and/or Ns are trimmed.

--maxns n
Discard reads containing more than --maxns ambiguous bases (‘N’) after trimming. Default is 1000.

--trimqualities
Trim consecutive stretches of low-quality bases (threshold set by --minquality) from the 5’ and 3’ termini. If trimming of Ns is also enabled (--trimns), then stretches of mixed low-quality bases and Ns are trimmed.

--trimwindows window_size
Trim low-quality bases using a sliding-window approach inspired by sickle, with the given window size. See the “Window based quality trimming” section of the manual page for a description of this algorithm.

--minquality minimum
Set the threshold for trimming low-quality bases using --trimqualities and --trimwindows. Default is 2.

--preserve5p
If set, bases at the 5’ end will not be trimmed by --trimns, --trimqualities, and --trimwindows. Collapsed reads will not be quality trimmed when this option is enabled.
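The interaction of --trimns, --trimqualities, --minquality, and --preserve5p can be sketched as trimming from both termini while a base is trimmable. This Python sketch is an approximation, not AdapterRemoval's implementation. It assumes that a base counts as low quality when its score is less than or equal to --minquality, which is consistent with the window-based description where kept bases must exceed --minquality. The function returns half-open (start, end) coordinates.

```python
from typing import List, Tuple


def trim_termini(seq: str, quals: List[int], trimns: bool = False,
                 trimqualities: bool = False, minquality: int = 2,
                 preserve5p: bool = False) -> Tuple[int, int]:
    """Return (start, end) of the read after trimming Ns and/or low-quality bases."""

    def trimmable(i: int) -> bool:
        # Assumption: quality <= minquality counts as low quality.
        return (trimns and seq[i] == "N") or (trimqualities and quals[i] <= minquality)

    start, end = 0, len(seq)
    if not preserve5p:
        while start < end and trimmable(start):
            start += 1
    while end > start and trimmable(end - 1):
        end -= 1
    return start, end


if __name__ == "__main__":
    seq, quals = "NNACGTNA", [30, 30, 30, 30, 30, 30, 30, 1]
    print(trim_termini(seq, quals, trimns=True, trimqualities=True))
```

When both options are enabled, a single loop per terminus naturally removes mixed stretches of Ns and low-quality bases, matching the description above.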

--minlength length
Reads shorter than this length are discarded following trimming. Defaults to 15.

--maxlength length
Reads longer than this length are discarded following trimming. Defaults to 4294967295.
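The post-trimming filters (--minlength, --maxlength, and --maxns) can be expressed as a single check. The following Python sketch is hypothetical. The order in which the checks are applied is an assumption, and it reports which option caused a read to be sent to the --discarded file. The default --maxlength, 4294967295, equals 2**32 - 1.

```python
from typing import Optional

MAX_LENGTH_DEFAULT = 4294967295  # the documented --maxlength default, 2**32 - 1


def discard_reason(seq: str, minlength: int = 15,
                   maxlength: int = MAX_LENGTH_DEFAULT,
                   maxns: int = 1000) -> Optional[str]:
    """Return the option that rejects a trimmed read, or None if it is kept."""
    if len(seq) < minlength:
        return "--minlength"
    if len(seq) > maxlength:
        return "--maxlength"
    if seq.count("N") > maxns:
        return "--maxns"
    return None


if __name__ == "__main__":
    for read in ("ACGT", "A" * 20, "N" * 20):
        print(read, discard_reason(read, maxns=5))
```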

4.3.5 FASTQ merging options

--collapse
In paired-end mode, merge overlapping mates into a single read and recalculate the quality scores. In single-end mode, attempt to identify templates for which the entire sequence is available. In both cases, complete “collapsed” reads are written with an ‘M_’ name prefix, and “collapsed” reads which are trimmed due to quality settings are written with an ‘MT_’ name prefix. The overlap needs to be at least --minalignmentlength nucleotides, with a maximum number of mismatches determined by --mm.

--minalignmentlength length
The minimum overlap between mate 1 and mate 2 before the reads are collapsed into one when collapsing paired-end reads, or when attempting to identify complete template sequences in single-end mode. Default is 11.

--seed seed
When collapsing reads at positions where the two reads differ and the quality of the bases is identical, AdapterRemoval will select a random base. This option specifies the seed used for the random number generator used by AdapterRemoval. This value is also written to the settings file. Note that setting the seed is not reliable in multithreaded mode, since the order of operations is non-deterministic.

--deterministic
Enable deterministic mode. Currently this only affects --collapse: overlapping bases that differ but have equal quality are set to N with quality 0, instead of being randomly sampled.
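The effect of --seed and --deterministic on a single mismatching position in the overlap can be sketched in Python. This is a simplified model, not AdapterRemoval's merging code. Choosing the higher-quality base when the qualities differ is an assumption. The merged quality score is also not modeled (returned as None), because the recalculation AdapterRemoval performs is not described here. Only the equal-quality behavior follows the text above.

```python
import random
from typing import Optional, Tuple


def resolve_mismatch(base1: str, qual1: int, base2: str, qual2: int,
                     rng: random.Random,
                     deterministic: bool = False) -> Tuple[str, Optional[int]]:
    """Pick the merged base where two overlapping mates disagree.

    Returns (base, quality); quality is None where this sketch does not model
    AdapterRemoval's recalculated score.
    """
    if base1 == base2:
        raise ValueError("bases agree; nothing to resolve")
    if qual1 > qual2:  # assumption: the higher-quality base wins
        return base1, None
    if qual2 > qual1:
        return base2, None
    if deterministic:  # documented: equal quality -> N with quality 0
        return "N", 0
    return rng.choice((base1, base2)), None  # documented: random choice, seeded


if __name__ == "__main__":
    rng = random.Random(42)  # plays the role of --seed in this sketch
    print(resolve_mismatch("A", 30, "C", 30, rng))
    print(resolve_mismatch("A", 30, "C", 30, rng, deterministic=True))
```

Using a seeded random.Random instance makes single-threaded runs reproducible, which mirrors the note above that the seed is not a reliable guarantee once work is split across threads.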

4.3.6 FASTQ demultiplexing options

--barcode-list filename
Perform demultiplexing using a table of one or two fixed-length barcodes for SE or PE reads. The table is expected to contain 2 or 3 columns, the first of which represents the name of a given sample, and the second and third of which represent the mate 1 and (optionally) the mate 2 barcode sequences. For a detailed description, see the “Demultiplexing” section of the online documentation.

--barcode-mm n
Maximum number of mismatches allowed when counting mismatches in both the mate 1 and the mate 2 barcode for paired reads.

--barcode-mm-r1 n
Maximum number of mismatches allowed for the mate 1 barcode. If not set, this value is equal to the --barcode-mm value; it cannot be higher than the --barcode-mm value.

--barcode-mm-r2 n
Maximum number of mismatches allowed for the mate 2 barcode. If not set, this value is equal to the --barcode-mm value; it cannot be higher than the --barcode-mm value.
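To show how the three mismatch limits relate, the following Python sketch assigns a read pair to a sample by Hamming distance. It is a hypothetical illustration, not AdapterRemoval's demultiplexer. It assumes that barcodes sit at the 5’ start of each mate, that the best unique match wins, and that ties or misses leave the read unassigned (None).

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read1: str, read2: Optional[str],
                  barcodes: Dict[str, Tuple[str, Optional[str]]],
                  barcode_mm: int = 0, mm_r1: Optional[int] = None,
                  mm_r2: Optional[int] = None) -> Optional[str]:
    """Return the sample whose barcode(s) match within the mismatch limits."""
    mm_r1 = barcode_mm if mm_r1 is None else mm_r1  # defaults to --barcode-mm
    mm_r2 = barcode_mm if mm_r2 is None else mm_r2
    if mm_r1 > barcode_mm or mm_r2 > barcode_mm:
        raise ValueError("per-mate limits cannot exceed --barcode-mm")
    hits: List[Tuple[int, str]] = []
    for sample, (bc1, bc2) in barcodes.items():
        if len(read1) < len(bc1):
            continue
        d1 = hamming(read1[:len(bc1)], bc1)  # assumption: barcode at 5' start
        d2 = 0
        if bc2 is not None:
            if read2 is None or len(read2) < len(bc2):
                continue
            d2 = hamming(read2[:len(bc2)], bc2)
        if d1 <= mm_r1 and d2 <= mm_r2 and d1 + d2 <= barcode_mm:
            hits.append((d1 + d2, sample))
    if not hits:
        return None
    hits.sort()
    if len(hits) > 1 and hits[0][0] == hits[1][0]:
        return None  # assumption: ambiguous matches are left unassigned
    return hits[0][1]


if __name__ == "__main__":
    table = {"s1": ("ACGT", "TTAA"), "s2": ("GGCC", "CCGG")}
    print(assign_sample("ACGAAAAA", "TTAAGGGG", table, barcode_mm=1))
```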

--demultiplex-only
Only carry out demultiplexing using the list of barcodes supplied with --barcode-list. No other processing is done.

4.4 Window based quality trimming

As of v2.2.2, AdapterRemoval implements a sliding-window approach to quality-based trimming, inspired by sickle. If window_size is greater than or equal to 1, that number is used as the window size for all reads. If window_size is greater than or equal to 0 and less than 1, it is multiplied by the length of each read to determine the window size. If the window length is zero or greater than the current read length, the read length is used instead.

Reads are trimmed as follows for a given window size:

1. The new 5’ end is determined by locating the first window in which both the average quality and the quality of the first base in the window are greater than --minquality.

2. The new 3’ end is located by sliding the first window to the right until the average quality becomes less than or equal to --minquality. The new 3’ end is placed at the last base in that window whose quality is greater than or equal to --minquality.

3. If no 5’ position could be determined, the read is discarded.
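The three steps above can be followed in a short Python sketch. This sketch is based only on this description, not on AdapterRemoval's source. Several choices here are assumptions: fractional window sizes are truncated to an integer, the result is returned as half-open (start, end) coordinates, and if no base in the stopping window reaches --minquality, the read is cut just before that window.

```python
from typing import List, Optional, Tuple


def window_size_for(read_len: int, window_size: float) -> int:
    """Resolve --trimwindows into a window length for one read."""
    if window_size >= 1:
        w = int(window_size)
    elif window_size >= 0:
        w = int(window_size * read_len)  # assumption: fractional sizes truncated
    else:
        raise ValueError("window_size must be >= 0")
    if w == 0 or w > read_len:
        w = read_len
    return w


def trim_windows(quals: List[int], window_size: float,
                 minquality: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the kept region, or None if the read is discarded."""
    n = len(quals)
    if n == 0:
        return None
    w = window_size_for(n, window_size)

    def mean(i: int) -> float:
        return sum(quals[i:i + w]) / w

    # Step 1: first window whose mean and first base both exceed minquality.
    start = None
    for i in range(n - w + 1):
        if mean(i) > minquality and quals[i] > minquality:
            start = i
            break
    if start is None:
        return None  # Step 3: no 5' position found, discard the read.

    # Step 2: slide right until the mean drops to minquality or below.
    end = n
    for j in range(start, n - w + 1):
        if mean(j) <= minquality:
            good = [k for k in range(j, j + w) if quals[k] >= minquality]
            end = good[-1] + 1 if good else j  # assumption for the empty case
            break
    return start, end


if __name__ == "__main__":
    print(trim_windows([0, 0, 30, 30, 30, 30, 0, 0], 2, 10))
```

For the example qualities with a window of 2 and --minquality 10, the kept region covers the four bases of quality 30.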

4.5 Exit status

AdapterRemoval exits with status 0 if the program ran successfully, and with a non-zero exit code if any errors were encountered. Do not use the output from AdapterRemoval if the program returned a non-zero exit code!
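When AdapterRemoval is run from a script, the exit status should be checked before any output is used. The following Python sketch uses the standard subprocess module with check=True, so a non-zero exit raises an exception instead of being silently ignored. The wrapper name run_checked and the file names in the comment are hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError if it exits with a non-zero status."""
    subprocess.run(list(cmd), check=True)


# Example (not executed here; file names are hypothetical):
#   try:
#       run_checked(["AdapterRemoval", "--file1", "reads.fq", "--basename", "sample"])
#   except subprocess.CalledProcessError as err:
#       sys.exit(f"AdapterRemoval failed with exit code {err.returncode}")
```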

4.6 Reporting bugs

Please report any bugs using the AdapterRemoval issue-tracker:

https://github.com/MikkelSchubert/adapterremoval/issues

4.7 License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or at your option any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.

CHAPTER 5

Miscellaneous

5.1 Window-based quality trimming

As of v2.2.2, AdapterRemoval implements a sliding-window approach to quality-based trimming, inspired by sickle. If window_size is greater than or equal to 1, that number is used as the window size for all reads. If window_size is greater than or equal to 0 and less than 1, it is multiplied by the length of each read to determine the window size. If the window length is zero or greater than the current read length, the read length is used instead.

Reads are trimmed as follows for a given window size:

1. The new 5’ end is determined by locating the first window in which both the average quality and the quality of the first base in the window are greater than --minquality.

2. The new 3’ end is located by sliding the first window to the right until the average quality becomes less than or equal to --minquality. The new 3’ end is placed at the last base in that window whose quality is greater than or equal to --minquality.

3. If no 5’ position could be determined, the read is discarded.

5.2 Migrating from AdapterRemoval v1.x

Command-line options mostly behave the same in AdapterRemoval v1 and AdapterRemoval v2, and scripts written with AdapterRemoval v1.x in mind should work with AdapterRemoval v2.x. Notable exceptions are the --pcr1 and --pcr2 options, which have been replaced by the --adapter1 and --adapter2 options described above. While the --pcr options are still supported for backwards compatibility, they should not be used going forward.

The difference between these options is that --adapter2 expects the mate 2 adapter sequence to be specified in the read orientation, as described above, while --pcr2 expects the sequence to be in the same orientation as the mate 1 sequence, that is, the reverse complement of the sequence observed in the mate 2 reads.

Using the common 13 bp Illumina adapter sequence (AGATCGGAAGAGC) as an example, this is how the options would be used in AdapterRemoval v2.x:

AdapterRemoval --adapter1 AGATCGGAAGAGC --adapter2 AGATCGGAAGAGC ...

And in AdapterRemoval v1.x:

AdapterRemoval --pcr1 AGATCGGAAGAGC --pcr2 GCTCTTCCGATCT ...
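Because the v1.x --pcr2 value is the reverse complement of the v2.x --adapter2 value, converting an old command line only requires reverse-complementing that sequence. The following Python sketch shows this conversion. The helper name is hypothetical, and it assumes the sequence contains only A, C, G, T, and N (in either case).

```python
# Complement table for A/C/G/T/N in upper and lower case (assumed alphabet).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # --adapter2 value (v2.x) -> equivalent --pcr2 value (v1.x), and back.
    adapter2 = "AGATCGGAAGAGC"
    pcr2 = reverse_complement(adapter2)
    print(pcr2, reverse_complement(pcr2))
```

Applying the function to AGATCGGAAGAGC yields GCTCTTCCGATCT, the --pcr2 value used in the v1.x example, and applying it twice returns the original sequence.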
