Read mapping
Map
Map single or paired-end short reads to one or multiple genomes in the pangenome. One SAM or BAM file is generated for each genome included in the analysis.
- Parameters
<databaseDirectory>
Path to the database root directory.
<shortReadFiles>
One or two short-read archives in FASTQ format, which can be gz/bz2 compressed.
- Options
--threads/-t
Number of threads to use (default: the number of cores or 8, whichever is lower).
--output/-o
Path to the output files (default is the database path).
--include/-i
Only include a selection of genomes.
--exclude/-e
Exclude a selection of genomes.
--best-hits=none|all|random
In case of multiple “best” hits, return none, all best hits, or a random best hit (default: random).
--all-hits
Return all hits rather than only the best.
--competitive
Find the best mapping location in the complete pangenome (default: find the best location for each genome).
--previous-run
The mapping_summary.txt file from a previous mapping run (random-best competitive mode), used for a better estimation of coverage in a metagenomic setting.
--out-format=SAM|BAM|none
Write the alignment files in SAM or BAM format, or do not write any alignment files (default: SAM).
--gap-open
Gap open penalty (range: [-50..-1], default: -20).
--gap-extension
Gap extension penalty (range: [-5..-1], default: -3).
--interleaved
Process the FASTQ file as an interleaved paired-end archive.
--unmapped
Check unmapped genomes.
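To illustrate the input that --interleaved expects, here is a minimal Python sketch. It assumes the common interleaved layout, in which the two mates of each read pair appear as consecutive four-line FASTQ records. The function names read_records and split_interleaved are hypothetical and not part of PanTools, and this is not how PanTools itself reads the file.

```python
from itertools import islice


def read_records(lines):
    """Yield FASTQ records as 4-tuples (header, sequence, separator, quality)."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def split_interleaved(lines):
    """Pair consecutive records as (mate 1, mate 2).

    Assumption: mates are stored back to back, mate 1 first.
    """
    records = list(read_records(lines))
    if len(records) % 2 != 0:
        raise ValueError("an interleaved archive must hold an even number of records")
    return list(zip(records[0::2], records[1::2]))


# Toy data, not taken from a real sequencing run.
example = [
    "@read1/1", "ACGT", "+", "IIII",
    "@read1/2", "TGCA", "+", "IIII",
]
pairs = split_interleaved(example)
```

Each element of pairs holds both mates of one fragment, which is the pairing a two-file run (reads_1.fastq and reads_2.fastq) would provide by file position instead.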
- Options that influence the mapping sensitivity
--sensitivity/-s=very-fast|fast|sensitive|very-sensitive
Four settings that automatically set the parameters controlling the sensitivity, ranging from least to most sensitive.
--min-identity
The minimum acceptable identity of the alignment (default: 0.5, range: [0,1]).
--alignment-band
The length of the bound of the banded alignment (default: 5, range: [1..100]).
--min-hit-length
The minimum acceptable length of the alignment after soft-clipping (default: 13, range: [10..100]).
--max-num-locations
The maximum number of locations of candidate hits to examine (default: 15, range: [1..100]).
--max-alignment-length
The maximum acceptable length of the alignment (default: 2000, range: [50..5000]).
--max-fragment-length
The maximum acceptable length of the fragment (default: 4998, range: [50..5000]).
--num-kmer-samples
The number of k-mers sampled from each read (default: 15, range: [1..r-k+1], where r is the read length and k is the k-mer size).
--clipping-stringency
The stringency of soft-clipping (default: 1).
0 : no soft clipping
1 : low
2 : medium
3 : high
- Example input files
FASTQ file
@SRR13153715.1 1/1
TGGTCATACAGCAAAGCATAATTGTCACCATTACTATGGCAATCAAGCCAGCTATAAAACCTAGCCAAATGTACCATGGCCATTTTATATACTGCTCATACTTTCCAAGTTCTTGGAGATCGAT
+
EEEEEEEEEEEEEEEAEEEE/EEEEE/AEEEEEEEEEEEEEE/EE/EEE/<EEEEEEE/EEEEEEEEEEEEEAEEEEEAEEEEEAEEAEEEEEEA<AAAEEAEEA<EE/EEEEAEAEA/EEAA/
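A FASTQ record consists of four lines: a header starting with "@", the sequence, a separator starting with "+", and one quality character per base. The Python sketch below checks that structure for a single record. The function name parse_fastq_record is hypothetical, the toy record is invented for illustration, and the quality decoding assumes the Phred+33 encoding, which the source does not state.

```python
def parse_fastq_record(lines):
    """Validate one four-line FASTQ record and return its parts."""
    if len(lines) != 4:
        raise ValueError("a FASTQ record has exactly four lines")
    header, sequence, separator, quality = [line.strip() for line in lines]
    if not header.startswith("@"):
        raise ValueError("header line must start with '@'")
    if not separator.startswith("+"):
        raise ValueError("separator line must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    return {
        "id": header[1:].split()[0],
        "sequence": sequence,
        # Assumption: Phred+33 encoding (quality = ASCII code - 33).
        "phred": [ord(char) - 33 for char in quality],
    }


# Toy record, not the example shown above.
record = parse_fastq_record(["@read1 1/1", "ACGTAC", "+", "EEEAEE"])
```

A check like this can catch truncated or misaligned records before a long mapping run is started.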
- Example commands
$ pantools map arabidopsis_DB ERR031564_1.fastq
$ pantools map --include=1-5 --sensitivity=sensitive arabidopsis_DB ERR031564_1.fastq
$ pantools map --competitive --best-hits=all arabidopsis_DB ERR031564_1.fastq
$ pantools map --interleaved arabidopsis_DB interleaved_reads.fastq
$ pantools map arabidopsis_DB ERR031564_1.fastq ERR031564_2.fastq
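When the command is launched from a script, the documented ranges can be checked before PanTools starts. The following Python sketch builds the argument list for such a call. The helper build_map_command is hypothetical, the ranges are copied from the option descriptions in this section, and the --option=value spelling for numeric options is an assumption based on how --include and --sensitivity are written in the commands above.

```python
import shlex

# Ranges copied from the option descriptions in this section.
RANGES = {
    "--gap-open": (-50, -1),
    "--gap-extension": (-5, -1),
    "--min-identity": (0, 1),
    "--alignment-band": (1, 100),
    "--min-hit-length": (10, 100),
    "--max-num-locations": (1, 100),
}


def build_map_command(database, reads, **options):
    """Return the argument list for a 'pantools map' call (hypothetical helper)."""
    if not 1 <= len(reads) <= 2:
        raise ValueError("expected one or two short-read files")
    args = ["pantools", "map"]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if flag in RANGES:
            low, high = RANGES[flag]
            if not low <= value <= high:
                raise ValueError(f"{flag}={value} is outside [{low}..{high}]")
        # Assumption: options are passed in the --option=value form.
        args.append(f"{flag}={value}")
    args.append(database)
    args.extend(reads)
    return args


command = build_map_command(
    "arabidopsis_DB", ["ERR031564_1.fastq"], gap_open=-20, min_identity=0.5
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run, which avoids shell quoting issues with file names.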
- Output
mapping_summary.txt, number of mapped and unmapped reads per genome
One SAM or BAM file is generated for each genome included in the analysis.
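The per-genome SAM files can be cross-checked against mapping_summary.txt. The Python sketch below counts mapped and unmapped records in SAM text using the FLAG column defined by the SAM specification (bit 0x4 marks an unmapped read, bits 0x100 and 0x800 mark secondary and supplementary alignments). The function name count_mapped is hypothetical, the toy lines are invented, and this is not necessarily how PanTools computes its summary.

```python
def count_mapped(sam_lines):
    """Count primary mapped and unmapped records in SAM-formatted lines."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        if flag & 0x4:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


# Toy SAM content, not real PanTools output.
sam_lines = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r1\t256\tchr1\t500\t0\t4M\t*\t0\t0\t*\t*",
]
mapped, unmapped = count_mapped(sam_lines)
```

For paired-end data each mate is a separate record, so these counts are per record rather than per fragment.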