BLAST

Warning

This is a novel function and has not yet undergone testing by external users. Please report any bugs or issues to the PanTools team so we can improve it.

BLAST your sequences against the pangenome database. If a matching region is present, the gene node identifier and relevant information are returned. If no BLAST program is specified via --mode, BLASTN or BLASTP is used, depending on the input sequences.
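The choice between BLASTN and BLASTP when --mode is omitted can be pictured with the minimal sketch below. It is not the PanTools implementation: the function name guess_blast_mode, the nucleotide alphabet, and the 90% cutoff are all assumptions made for illustration.

```python
def guess_blast_mode(sequences):
    """Return "BLASTN" if the sequences look like nucleotides, else "BLASTP".

    Hypothetical helper: PanTools may decide this differently.
    """
    nucleotide_letters = set("ACGTUN")
    residues = "".join(sequences).upper()
    if not residues:
        raise ValueError("no sequence data")
    nucleotide_fraction = sum(ch in nucleotide_letters for ch in residues) / len(residues)
    # Assumed cutoff: at least 90% nucleotide letters means a nucleotide query.
    return "BLASTN" if nucleotide_fraction >= 0.9 else "BLASTP"


print(guess_blast_mode(["ACGTTGCAAGGCTTAN"]))
print(guess_blast_mode(["FQTWEEFSRAAEKLYLADPMKVRVVLKYRH"]))
```

A fraction-based cutoff is used here because protein sequences also contain the letters A, C, G and T, so a simple membership test on single characters would not be enough.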

BLAST is run in the traditional way. A BLAST database is created from all genomes in the pangenome, and the provided input sequences are searched against it. The database is created the first time the BLAST functionality is used. If a new genome is added to the pangenome, the BLAST database is updated automatically. Once BLAST has finished, the pangenome is searched using the coordinates of each BLAST hit, and any gene found in that region is reported.
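The last step, looking up genes from the coordinates of a hit, amounts to an interval overlap test. The sketch below illustrates the idea on an in-memory list; the Gene type and the function genes_in_hit_region are hypothetical names, and PanTools itself queries the pangenome database rather than a Python list.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical stand-in for a gene node in the pangenome."""
    identifier: str
    sequence_id: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_in_hit_region(genes, sequence_id, sstart, send):
    """Return the genes on sequence_id that overlap the hit coordinates."""
    # BLAST reports sstart > send for hits on the minus strand, so order them first.
    region_start, region_end = sorted((sstart, send))
    return [
        gene for gene in genes
        if gene.sequence_id == sequence_id
        and gene.start <= region_end
        and gene.end >= region_start
    ]
```

Any overlap counts as a match in this sketch; whether PanTools requires partial or full containment of the gene is not stated here.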

Required software

BLAST suite (BLASTN, BLASTP, BLASTX, TBLASTN, makeblastdb)

Parameters

<databaseDirectory>

Path to the database root directory.

<fastaFile>

A (multi-)FASTA file with nucleotide or protein sequences.

Options

--include/-i

Only include a selection of genomes.

--exclude/-e

Exclude a selection of genomes.

--selection-file

Text file with rules to use a specific set of genomes and sequences.

--mode

BLAST mode (BLASTN, BLASTP, BLASTX, TBLASTX or TBLASTN).

--minimum-identity

Minimum required sequence identity. Range 1-100.

--alignment-threshold

Minimum required alignment length, as a percentage of the input query length. Range 1-100.

--rebuild

Rebuild the BLAST databases.
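The two thresholds, --minimum-identity and --alignment-threshold, can be read as a filter on each BLAST hit. The sketch below is one plausible interpretation, not the PanTools source: the function name passes_thresholds is hypothetical, and it assumes that both limits are inclusive and that the alignment length is taken as reported by BLAST.

```python
def passes_thresholds(pident, alignment_length, query_length,
                      minimum_identity, alignment_threshold):
    """Return True if a hit meets both the identity and the length threshold.

    pident is the percent identity reported by BLAST (0-100).
    The alignment length is compared to the query length as a percentage.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    alignment_percentage = 100.0 * alignment_length / query_length
    return pident >= minimum_identity and alignment_percentage >= alignment_threshold


# A 62-residue query aligned over 40 residues at 80% identity:
# the identity passes a threshold of 55, and so does the alignment length (about 64.5%).
print(passes_thresholds(80.0, 40, 62, minimum_identity=55, alignment_threshold=55))
```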

Example commands

$ pantools blast tomato_DB sequences.fasta
$ pantools blast tomato_DB sequences.fasta --mode BLASTP -t 10 --minimum-identity 55 --alignment-threshold 55

Example input file

>sequence1
FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
>sequence2
KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMR
LMELKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM

Output

Output files are written to the BLAST directory.

  • blast_results.tsv: regular BLAST tabular output (format 6) with the 12 standard columns, followed by additional columns with information about the region in PanTools.
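As a starting point for downstream analysis, the file can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the first 12 columns are taken to follow the default BLAST format 6 order, the file is assumed to have no header line, and the PanTools-specific columns are kept as an unnamed list because their names are not given here. The function name read_blast_results and the example line are made up for illustration.

```python
import csv
import io

# Default column order of BLAST tabular output (format 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def read_blast_results(handle):
    """Parse tab-separated BLAST results into a list of dictionaries."""
    records = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) < len(BLAST6_COLUMNS):
            continue  # skip blank or incomplete lines
        record = dict(zip(BLAST6_COLUMNS, fields))
        record["pident"] = float(record["pident"])
        record["length"] = int(record["length"])
        # Columns beyond the standard 12 are kept unnamed and in file order.
        record["extra"] = fields[len(BLAST6_COLUMNS):]
        records.append(record)
    return records


# Made-up example line; real values and extra columns will differ.
example = "sequence1\t1_1\t98.5\t62\t1\t0\t1\t62\t1000\t1185\t1e-30\t120\textra_value\n"
records = read_blast_results(io.StringIO(example))
print(records[0]["qseqid"], records[0]["pident"], records[0]["extra"])
```

To read the real file, pass an open file handle instead of the io.StringIO object, for example open("blast_results.tsv", newline="") with the path adjusted to the BLAST directory of the database.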