Read mapping

Map

Map single or paired-end short reads to one or multiple genomes in the pangenome. One SAM or BAM file is generated for each genome included in the analysis.

Parameters

<databaseDirectory>	Path to the database root directory.
<genomeNumbers>	A text file listing the genomes to map reads against, one genome number per line.
<shortReadFiles>	One or two short-read archives in FASTQ format, which can be gz/bz2 compressed.

Options

`--threads`/`-t`	Number of parallel working threads (default: the number of cores or 8, whichever is lower).
`--output`/`-o`	Path to the output files (default is the database path).
`--best-hits` = `none|all|random`	In case of multiple “best” hits, return none, all best hits, or one random best hit (default: random).
`--all-hits`	Return all hits rather than only the best.
`--competitive`	Find the best mapping location in the complete pangenome (default: find the best location for each genome).
`--previous-run`	The mapping_summary.txt file from a previous mapping run (random-best competitive mode) for a better estimation of coverage in a metagenomic setting.
`--out-format` = `SAM|BAM|none`	Write the alignment files in SAM or BAM format, or write no alignment files at all (default: SAM).
`--gap-open`	Gap open penalty (range: [-50..-1], default: -20).
`--gap-extension`	Gap extension penalty (range: [-5..-1], default: -3).
`--interleaved`	Process the fastq file as an interleaved paired-end archive.
`--unmapped`	Check unmapped genomes.

Options that influence the mapping sensitivity

`--sensitivity`/`-s` = `very-fast|fast|sensitive|very-sensitive`	Four presets that automatically set the parameters controlling the sensitivity, ranging from least to most sensitive.
`--min-identity`	The minimum acceptable identity of the alignment (default: 0.5, range: [0,1]).
`--alignment-band`	The width of the band used in banded alignment (default: 5, range: [1..100]).
`--min-hit-length`	The minimum acceptable length of alignment after soft-clipping (default: 13, range: [10..100]).
`--max-num-locations`	The maximum number of locations of candidate hits to examine (default: 15, range: [1..100]).
`--max-alignment-length`	The maximum acceptable length of alignment (default: 2000, range: [50..5000]).
`--max-fragment-length`	The maximum acceptable length of fragment (default: 4998, range: [50..5000]).
`--num-kmer-samples`	The number of k-mers sampled from each read (default: 15, range: [1..r-k+1]).
`--clipping-stringency`	The stringency of soft-clipping (default: 1). 0: no soft clipping, 1: low, 2: medium, 3: high.
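The upper bound `r-k+1` in the `--num-kmer-samples` range is not defined in the table. A plausible reading, assumed here, is that `r` is the read length and `k` is the k-mer size, in which case `r-k+1` is simply the number of k-mers a read contains. The sketch below computes that bound; the function name and the example values are illustrative only.

```shell
# Number of k-mers in a read of length r: one k-mer starts at each
# position from 1 to r-k+1.
max_kmer_samples() {
    r=$1   # read length in bases (assumed meaning of r)
    k=$2   # k-mer size (assumed meaning of k)
    echo $((r - k + 1))
}

# Hypothetical values: a 100 bp read and a k-mer size of 13.
max_kmer_samples 100 13   # prints 88
```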

Example input files

FASTQ file

@SRR13153715.1 1/1
TGGTCATACAGCAAAGCATAATTGTCACCATTACTATGGCAATCAAGCCAGCTATAAAACCTAGCCAAATGTACCATGGCCATTTTATATACTGCTCATACTTTCCAAGTTCTTGGAGATCGAT
+
EEEEEEEEEEEEEEEAEEEE/EEEEE/AEEEEEEEEEEEEEE/EE/EEE/<EEEEEEE/EEEEEEEEEEEEEAEEEEEAEEEEEAEEAEEEEEEA<AAAEEAEEA<EE/EEEEAEAEA/EEAA/
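Each FASTQ record spans four lines: a header starting with `@`, the sequence, a `+` separator, and a quality string as long as the sequence. Before mapping, it can be worth checking that a file follows this layout. The following is a minimal sketch with `awk`; the file `example.fastq` and both function names are made up for illustration, and the check assumes uncompressed, unwrapped (four lines per record) input.

```shell
# Small illustrative FASTQ file with two records (not real data).
cat > example.fastq <<'EOF'
@read1/1
ACGTACGTAC
+
IIIIIIIIII
@read2/1
TTGACCA
+
IIIIIII
EOF

# Number of reads: four lines per record.
fastq_count() {
    awk 'END { print NR / 4 }' "$1"
}

# Number of records whose quality string length differs from the sequence length.
fastq_length_mismatches() {
    awk 'NR % 4 == 2 { n = length($0) }
         NR % 4 == 0 && length($0) != n { bad++ }
         END { print bad + 0 }' "$1"
}

fastq_count example.fastq
fastq_length_mismatches example.fastq
```

A non-integer read count or a non-zero mismatch count suggests a truncated or malformed file.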

Genome numbers file

1
2
5
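A file like this can be written by hand or generated from the shell. The sketch below creates the file shown above and checks that every line holds a single integer; the file name `genome_numbers.txt` matches the example commands, while the check itself is only a suggestion and not something PanTools requires.

```shell
# Write one genome number per line (genomes 1, 2 and 5).
printf '%s\n' 1 2 5 > genome_numbers.txt

# Sanity check: warn if any line is not a plain positive integer.
if grep -qvE '^[0-9]+$' genome_numbers.txt; then
    echo "genome_numbers.txt contains a non-numeric line" >&2
fi
```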

Example commands

$ pantools map arabidopsis_DB genome_numbers.txt ERR031564_1.fastq
$ pantools map --include=1-5 --sensitivity=sensitive arabidopsis_DB genome_numbers.txt ERR031564_1.fastq
$ pantools map --competitive --best-hits=all arabidopsis_DB genome_numbers.txt ERR031564_1.fastq
$ pantools map --interleaved arabidopsis_DB genome_numbers.txt interleaved_reads.fastq
$ pantools map arabidopsis_DB genome_numbers.txt ERR031564_1.fastq ERR031564_2.fastq
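If paired-end reads are stored in two files but an interleaved archive is wanted for `--interleaved`, the two files can be merged first. The sketch below assumes the conventional interleaved layout, in which each forward read is immediately followed by its mate, and assumes both inputs are uncompressed, list the reads in the same order, and contain no tab characters. The function name `interleave_fastq` is hypothetical and not part of PanTools.

```shell
# Merge two paired FASTQ files into one interleaved stream on stdout.
# $1 = forward reads, $2 = reverse reads.
interleave_fastq() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    # Join each 4-line record onto a single tab-separated line.
    paste - - - - < "$1" > "$tmp1"
    paste - - - - < "$2" > "$tmp2"
    # Put mates side by side, then turn the tabs back into line breaks.
    paste "$tmp1" "$tmp2" | tr '\t' '\n'
    rm -f "$tmp1" "$tmp2"
}
```

For example, `interleave_fastq ERR031564_1.fastq ERR031564_2.fastq > interleaved_reads.fastq` would produce a file that can be passed to the `--interleaved` command shown above.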

Output files

mapping_summary.txt: the number of mapped and unmapped reads per genome.
One SAM or BAM file is generated for each genome included in the analysis.