Read mapping

Map

Map single or paired-end short reads to one or multiple genomes in the pangenome. One SAM or BAM file is generated for each genome included in the analysis.

Parameters

<databaseDirectory>

Path to the database root directory.

<genomeNumbers>

A text file listing the genome numbers to map reads against, one number per line.

<shortReadFiles>

One or two short-read archives in FASTQ format, which can be gz/bz2 compressed.
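To inspect such an archive before mapping, a short Python sketch can choose the opener from the file extension. The helper name open_reads is hypothetical and not part of PanTools, and the sketch assumes that compressed files end in .gz or .bz2.

```python
import bz2
import gzip


def open_reads(path):
    """Open a FASTQ file in text mode, whether plain, gzip or bzip2 compressed.

    Hypothetical helper: the compression is guessed from the file extension only.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")
```

The returned handle can be iterated line by line in the same way for all three cases.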

Options

--threads/-t

Number of threads to use (default: the number of cores or 8, whichever is lower).

--output/-o

Path to the output files (default is the database path).

--best-hits = none|all|random

If several hits tie for the “best” score, return none of them, all of them, or one chosen at random (default: random).

--all-hits

Return all hits rather than only the best.

--competitive

Find the best mapping location in the complete pangenome
(default: find the best location for each genome).

--previous-run

The mapping_summary.txt file from a previous mapping run in random-best competitive mode, used to improve the coverage estimate in a metagenomic setting.

--out-format = SAM|BAM|none

Write the alignment files in SAM or BAM format, or write no alignment files at all (default: SAM).

--gap-open

Gap open penalty (range: [-50..-1], default: -20).

--gap-extension

Gap extension penalty (range: [-5..-1], default: -3).

--interleaved

Process the FASTQ file as an interleaved paired-end archive.
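The following Python sketch shows one way to build an interleaved archive from two paired FASTQ files. It assumes the common convention that the two mates of a pair alternate (read 1, then read 2); the layout is not spelled out here, and the function names read_records and interleave are hypothetical.

```python
from itertools import islice, zip_longest


def read_records(handle):
    """Yield FASTQ records as lists of four lines without trailing newlines."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(forward, reverse, out):
    """Write the mates of each pair one after the other (assumed layout)."""
    pairs = zip_longest(read_records(forward), read_records(reverse))
    for first, second in pairs:
        if first is None or second is None:
            raise ValueError("the two files hold different numbers of reads")
        for record in (first, second):
            out.write("\n".join(record) + "\n")
```

All three arguments are open text handles, so the same function works for plain files and for decompressed streams.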

--unmapped

Check unmapped genomes.

Options that influence the mapping sensitivity

--sensitivity/-s = very-fast|fast|sensitive|very-sensitive

Four settings that automatically set the parameters controlling the sensitivity, ranging from least to most sensitive.

--min-identity

The minimum acceptable identity of the alignment
(default: 0.5, range: [0,1]).

--alignment-band

The width of the band used for banded alignment
(default: 5, range: [1..100]).

--min-hit-length

The minimum acceptable length of alignment after soft-clipping
(default: 13, range: [10..100]).

--max-num-locations

The maximum number of locations of candidate hits to examine
(default: 15, range: [1..100]).

--max-alignment-length

The maximum acceptable length of an alignment
(default: 2000, range: [50..5000]).

--max-fragment-length

The maximum acceptable length of fragment
(default: 4998, range: [50..5000]).

--num-kmer-samples

The number of k-mers sampled from each read
(default: 15, range: [1..r-k+1], where r is the read length and k the k-mer size).
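The upper bound r-k+1 is the number of k-mer start positions in a read. A minimal Python sketch of that arithmetic, assuming r is the read length and k the k-mer size (the function name and the values in the usage note are illustrative only, not PanTools settings):

```python
def max_kmer_samples(read_length, k):
    """Return the number of k-mer start positions in a read, r - k + 1.

    Returns 0 when the read is shorter than k, since no k-mer fits.
    """
    return max(read_length - k + 1, 0)
```

For example, max_kmer_samples(100, 13) gives 88, so a 100 bp read with an assumed k of 13 could supply at most 88 samples.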

--clipping-stringency

The stringency of soft-clipping (default: 1).
0 : no soft clipping
1 : low
2 : medium
3 : high

Example input files

FASTQ file

@SRR13153715.1 1/1
TGGTCATACAGCAAAGCATAATTGTCACCATTACTATGGCAATCAAGCCAGCTATAAAACCTAGCCAAATGTACCATGGCCATTTTATATACTGCTCATACTTTCCAAGTTCTTGGAGATCGAT
+
EEEEEEEEEEEEEEEAEEEE/EEEEE/AEEEEEEEEEEEEEE/EE/EEE/<EEEEEEE/EEEEEEEEEEEEEAEEEEEAEEEEEAEEAEEEEEEA<AAAEEAEEA<EE/EEEEAEAEA/EEAA/
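A record like this one has four lines: a header starting with @, the sequence, a separator starting with +, and one quality character per base. The Python sketch below checks that structure and decodes the qualities. It assumes Phred+33 encoding, which is usual for current Illumina data but is not stated here; parse_record is a hypothetical helper.

```python
def parse_record(lines):
    """Validate one four-line FASTQ record and decode its quality string.

    Returns (read name, sequence, list of Phred scores).
    Assumes Phred+33 encoded qualities.
    """
    header, sequence, separator, qualities = (line.rstrip("\n") for line in lines)
    if not header.startswith("@"):
        raise ValueError("header line must start with '@'")
    if not separator.startswith("+"):
        raise ValueError("separator line must start with '+'")
    if len(sequence) != len(qualities):
        raise ValueError("sequence and quality lengths differ")
    scores = [ord(char) - 33 for char in qualities]
    return header[1:], sequence, scores
```

Under this encoding the character E decodes to a score of 36 and / to 14.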

Genome numbers file

1
2
5
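Such a file can be written by hand or generated. A minimal Python sketch, assuming genome numbers are positive integers (the helper name write_genome_numbers is hypothetical):

```python
def write_genome_numbers(path, genome_numbers):
    """Write one genome number per line, sorted and without duplicates.

    Assumes genome numbers are positive integers.
    """
    numbers = sorted(set(genome_numbers))
    if any(number < 1 for number in numbers):
        raise ValueError("genome numbers are assumed to be positive")
    with open(path, "w") as handle:
        for number in numbers:
            handle.write(f"{number}\n")
```

Calling write_genome_numbers("genome_numbers.txt", [1, 2, 5]) reproduces the three-line file shown above.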

Example commands

$ pantools map arabidopsis_DB genome_numbers.txt ERR031564_1.fastq
$ pantools map --include=1-5 --sensitivity=sensitive arabidopsis_DB genome_numbers.txt ERR031564_1.fastq
$ pantools map --competitive --best-hits=all arabidopsis_DB genome_numbers.txt ERR031564_1.fastq
$ pantools map --interleaved arabidopsis_DB genome_numbers.txt interleaved_reads.fastq
$ pantools map arabidopsis_DB genome_numbers.txt ERR031564_1.fastq ERR031564_2.fastq
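When many samples are mapped from a script, the command line can be assembled programmatically. The Python sketch below only builds the argument list; it uses the options documented above in the --option=value form of the examples, and assumes that pantools is on the PATH. The function build_map_command is hypothetical and not part of PanTools.

```python
def build_map_command(database, genome_numbers, reads, sensitivity=None,
                      best_hits=None, threads=None, competitive=False,
                      interleaved=False):
    """Build the argument list for a `pantools map` call (illustrative only)."""
    if not 1 <= len(reads) <= 2:
        raise ValueError("expected one or two short-read files")
    command = ["pantools", "map"]
    if threads is not None:
        command.append(f"--threads={threads}")
    if sensitivity is not None:
        command.append(f"--sensitivity={sensitivity}")
    if best_hits is not None:
        command.append(f"--best-hits={best_hits}")
    if competitive:
        command.append("--competitive")
    if interleaved:
        command.append("--interleaved")
    # Positional arguments follow the options, as in the example commands.
    command += [database, genome_numbers, *reads]
    return command
```

The resulting list can be passed to subprocess.run(command, check=True), which raises an error if the mapping run exits with a non-zero status.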

Output files

  • mapping_summary.txt: the number of mapped and unmapped reads per genome.

  • One SAM or BAM file is generated for each genome included in the analysis.
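The counts can be cross-checked against a SAM file directly. The Python sketch below relies only on the SAM format itself: header lines start with @, the second column is the FLAG field, and bit 0x4 marks an unmapped read. Secondary (0x100) and supplementary (0x800) records are skipped so that each read is counted once. The function name is hypothetical, and the layout of mapping_summary.txt is not assumed.

```python
def count_mapped(sam_lines):
    """Return (mapped, unmapped) counts of primary records in SAM text lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped
```

With paired-end data each mate is a separate record, so the totals count reads rather than pairs.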