Read mapping

Map

Map single-end or paired-end short reads to one or more genomes in the pangenome. One SAM or BAM file is generated for each genome included in the analysis.

Required arguments

--database_path/-dp Path to the pangenome database.
-1 The first short-read archive in FASTQ format, which can be gz/bz2 compressed. This file can be processed as an interleaved archive with the -il option.
--genome-numbers/-gn A text file listing the genome numbers to map reads against, one per line.

Optional arguments

-2 The second short-read archive in FASTQ format, which can be gz/bz2 compressed.
--out-format/-of (SAM, BAM or none) : Write the alignment files in SAM or BAM format, or do not write any alignment files.
--output-path/-op (default value: Database path determined by -dp) : Path to the output files.
--threads/-tn (default value: 1) : The number of parallel working threads.
--interleaved/-il Process the fastq file as an interleaved paired-end archive.
--raw-abundance-file/-raf The mapping_summary.txt file from a previous mapping run (random-best competitive mode) for a better estimation of coverage in a metagenomic setting.
--alignment-mode/-am The alignment mode:
-1 : Competitive, none-bests
-2 : Competitive, random-best
-3 : Competitive, all-bests
1 : Normal, none-bests
2 : Normal, random-best (default)
3 : Normal, all-bests
0 : Normal, all-hits
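
The mode numbers listed above follow a regular pattern: the sign selects competitive (negative) or normal (positive) mapping, the absolute value selects the hit-reporting strategy, and 0 is the separate all-hits mode. A minimal Python sketch of that reading is shown here; the function name describe_alignment_mode is hypothetical and not part of PanTools.

```python
def describe_alignment_mode(mode: int) -> str:
    """Return the label documented for an --alignment-mode value."""
    if mode == 0:
        return "Normal, all-hits"
    # Absolute value -> hit-reporting strategy, as listed in the option table.
    strategies = {1: "none-bests", 2: "random-best", 3: "all-bests"}
    if abs(mode) not in strategies:
        raise ValueError(f"unsupported alignment mode: {mode}")
    # Negative values are the competitive variants, positive the normal ones.
    family = "Competitive" if mode < 0 else "Normal"
    return f"{family}, {strategies[abs(mode)]}"


if __name__ == "__main__":
    for mode in (-3, -2, -1, 0, 1, 2, 3):
        print(mode, describe_alignment_mode(mode))
```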

Optional arguments that influence the mapping sensitivity

--very-fast/--fast/--sensitive/--very-sensitive Four settings that automatically set the parameters controlling the sensitivity, ranging from least to most sensitive.
--min-mapping-identity/-mmi (default value: 0.5, valid range: [0..1)) : The minimum acceptable identity of the alignment.
--num-kmer-samples/-nks (default value: 15, valid range: [1..r-k+1]) : The number of k-mers sampled from each read.
--min-hit-length/-mhl (default value: 13, valid range: [10..100]) : The minimum acceptable length of alignment after soft-clipping.
--max-alignment-length/-mal (default value: 1000, valid range: [50..5000]) : The maximum acceptable length of alignment.
--max-fragment-length/-mfl (default value: 2000, valid range: [50..5000]) : The maximum acceptable length of fragment.
--max-num-locations/-mnl (default value: 15, valid range: [1..100]) : The maximum number of candidate hit locations to examine.
--alignment-band/-ab (default value: 5, valid range: [1..100]) : The width of the band used in banded alignment.
--clipping-stringency/-ci (default value: 1) : The stringency of soft-clipping.
0 : no soft clipping
1 : low
2 : medium
3 : high
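
The ranges above can be checked before a run is started. The following Python sketch does so under two assumptions that the text does not state explicitly: in the range [1..r-k+1], r is taken to be the read length and k the k-mer size of the pangenome, and the clipping stringency is treated as an integer from 0 to 3. The helper names and the values read_length=100 and k=13 are illustrative only.

```python
# Closed integer ranges copied from the option list above.
INTEGER_RANGES = {
    "min-hit-length": (10, 100),
    "max-alignment-length": (50, 5000),
    "max-fragment-length": (50, 5000),
    "max-num-locations": (1, 100),
    "alignment-band": (1, 100),
    "clipping-stringency": (0, 3),
}


def max_kmer_samples(read_length: int, k: int) -> int:
    """Number of k-mers in a read of length r (assumed meaning of r-k+1)."""
    if k < 1 or read_length < k:
        raise ValueError("the read must contain at least one k-mer")
    return read_length - k + 1


def out_of_range(settings: dict, read_length: int, k: int) -> list:
    """Return the names of settings that fall outside their documented range."""
    bad = []
    for name, value in settings.items():
        if name == "min-mapping-identity":
            ok = 0 <= value < 1  # half-open range [0..1)
        elif name == "num-kmer-samples":
            ok = 1 <= value <= max_kmer_samples(read_length, k)
        elif name in INTEGER_RANGES:
            low, high = INTEGER_RANGES[name]
            ok = low <= value <= high
        else:
            raise KeyError(f"unknown setting: {name}")
        if not ok:
            bad.append(name)
    return bad


if __name__ == "__main__":
    defaults = {
        "min-mapping-identity": 0.5,
        "num-kmer-samples": 15,
        "min-hit-length": 13,
        "max-alignment-length": 1000,
        "max-fragment-length": 2000,
        "max-num-locations": 15,
        "alignment-band": 5,
        "clipping-stringency": 1,
    }
    # Illustrative read length and k-mer size, not values taken from PanTools.
    print(out_of_range(defaults, read_length=100, k=13))
```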

Example input files

FASTQ file

@SRR13153715.1 1/1
TGGTCATACAGCAAAGCATAATTGTCACCATTACTATGGCAATCAAGCCAGCTATAAAACCTAGCCAAATGTACCATGGCCATTTTATATACTGCTCATACTTTCCAAGTTCTTGGAGATCGAT
+
EEEEEEEEEEEEEEEAEEEE/EEEEE/AEEEEEEEEEEEEEE/EE/EEE/<EEEEEEE/EEEEEEEEEEEEEAEEEEEAEEEEEAEEAEEEEEEA<AAAEEAEEA<EE/EEEEAEAEA/EEAA/
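
Each FASTQ record spans four lines: a header starting with @, the sequence, a separator starting with +, and the quality string. The Python sketch below reads such records and opens gz/bz2 compressed files by file extension. It is an illustration of the input format only, not the parser PanTools uses; read_fastq and open_fastq are hypothetical names, and the toy reads are made up.

```python
import bz2
import gzip
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str
    sequence: str
    quality: str


def open_fastq(path: str) -> TextIO:
    """Open a plain, gz or bz2 compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "rt")


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per four-line block, skipping blank lines between blocks."""
    while True:
        line = handle.readline()
        if not line:
            return  # end of file
        header = line.rstrip("\r\n")
        if not header:
            continue
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


if __name__ == "__main__":
    toy = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nFFFF\n")
    for record in read_fastq(toy):
        print(record.header, len(record.sequence))
```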

Genome numbers file

1
2
5
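
A genome numbers file like the one above can be generated rather than typed by hand. The Python sketch below writes one number per line. It assumes genome numbers are positive integers, and it sorts and de-duplicates them as a convenience; neither behaviour is stated as a requirement in this manual, and write_genome_numbers is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def write_genome_numbers(path, genome_numbers):
    """Write the given genome numbers to path, one per line, and return them."""
    numbers = sorted(set(genome_numbers))  # convenience, not a documented rule
    if any(number < 1 for number in numbers):
        raise ValueError("genome numbers are assumed to be positive integers")
    Path(path).write_text("".join(f"{number}\n" for number in numbers))
    return numbers


if __name__ == "__main__":
    # Write into a temporary directory so the demo leaves no files behind.
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "genome_numbers.txt")
        write_genome_numbers(target, [1, 2, 5])
        print(Path(target).read_text(), end="")
```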

Example commands

$ pantools map -dp arabidopsis_DB -1 ERR031564_1.fastq --reference 1-5
$ pantools map -dp arabidopsis_DB -1 ERR031564_1.fastq -gn genome_numbers.txt
$ pantools map -dp arabidopsis_DB -1 interleaved_reads.fastq --interleaved -gn genome_numbers.txt
$ pantools map -dp arabidopsis_DB -1 ERR031564_1.fastq -2 ERR031564_2.fastq -gn genome_numbers.txt
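
When many samples are mapped, the commands above can be assembled programmatically. The Python sketch below builds an argument list from options documented in this section only (-dp, -1, -2, --interleaved, -gn, --threads) and prints it without running anything. The function build_map_command is hypothetical, and rejecting -2 together with --interleaved is an assumption of this sketch rather than documented PanTools behaviour.

```python
import shlex


def build_map_command(database, reads_1, genome_numbers_file,
                      reads_2=None, interleaved=False, threads=None):
    """Return the argument list for a 'pantools map' call."""
    if reads_2 is not None and interleaved:
        # Assumption: an interleaved archive already holds both mates.
        raise ValueError("use either a second FASTQ file or --interleaved")
    command = ["pantools", "map", "-dp", database, "-1", reads_1]
    if reads_2 is not None:
        command += ["-2", reads_2]
    if interleaved:
        command.append("--interleaved")
    command += ["-gn", genome_numbers_file]
    if threads is not None:
        command += ["--threads", str(threads)]
    return command


if __name__ == "__main__":
    command = build_map_command(
        "arabidopsis_DB", "ERR031564_1.fastq", "genome_numbers.txt",
        reads_2="ERR031564_2.fastq",
    )
    # shlex.join quotes arguments where needed, so the line can be pasted into a shell.
    print(shlex.join(command))
```

The returned list can be passed to subprocess.run(command, check=True) once the printed command line looks right.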

Output files

  • mapping_summary.txt: the number of mapped and unmapped reads per genome.

  • One SAM or BAM file is generated for each genome included in the analysis.