Sequence visualization

Warning

This is a novel function and has not yet undergone testing by external users. Please report any bugs or issues to the PanTools team so we can improve it.

Generate a visualization of multiple sequences with annotation bars.

Annotation bar types

Gene and repeat coverage. Percentage of repeat and gene coverage calculated within a sliding window on the sequence. A coverage of 100% means every nucleotide of the window is covered.
Found in other chromosome. The gene region is only visible if the gene is also found on another chromosome.
Gene classification. Genes are coloured according to their category: core, accessory or unique.
Haplotype presence. Genes are coloured by the number of haplotypes (phases) they are found in.
Synteny. Syntenic blocks are drawn between two sequences.
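
The coverage bars described above can be illustrated with a small sketch. This is not the PanTools implementation: the function name `window_coverage`, the 1-based inclusive coordinates and the choice of consecutive, non-overlapping windows are assumptions made for illustration only. It shows how overlapping annotations are counted once and how a fully annotated window reaches 100%.

```python
def window_coverage(seq_length, intervals, window_size):
    """Return the percentage of covered positions for each consecutive window.

    Assumption: intervals are 1-based, inclusive (start, end) pairs and the
    windows do not overlap (the step equals the window size).
    """
    # Mark every annotated nucleotide once, so overlapping features
    # are not counted twice.
    covered = [False] * seq_length
    for start, end in intervals:
        for pos in range(max(start, 1), min(end, seq_length) + 1):
            covered[pos - 1] = True

    percentages = []
    for window_start in range(0, seq_length, window_size):
        window = covered[window_start:window_start + window_size]
        # The last window may be shorter than window_size.
        percentages.append(100.0 * sum(window) / len(window))
    return percentages


# Hypothetical gene coordinates on a 40 bp sequence; two genes overlap.
genes = [(1, 10), (16, 25), (21, 30)]
print(window_coverage(40, genes, 10))  # [100.0, 50.0, 100.0, 0.0]
```

In this example the second window (positions 11 to 20) is half covered, so a smaller window gives a more detailed but noisier bar, and a larger window smooths it.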

../_images/seq_vis_two_haplotypes_numbers.png — Fig. 18 Annotation plot for two sequences with all possible annotation bars.

Input requirements

Each bar type has specific requirements before it can be included. The requirements are checked upon execution, and bar types that fail to meet them are excluded.

1. Gene and repeat coverage are calculated by this function. To include repeat coverage, repeat annotations must first be added via add_repeats. The size of the window over which the coverage is calculated is controlled with --window-size.
2. Found in other chromosome. Requires phasing information to be added via add_phasing.
3. Gene classification uses the output of the previous gene_classification run.
4. Haplotype presence uses the output of gene_classification run with the --phasing argument.
5. Synteny information must be incorporated in the database. Use calculate_synteny followed by add_synteny. Remember to include the --sequence argument in the synteny calculation, as by default synteny is only calculated between genomes.

Sequence selection

The visualization can be created with or without synteny information. Plots that include synteny are limited to eight sequences, whereas plots without synteny can visualize all sequences of a single genome.

Up to eight sequences
When phasing information was included through add_phasing, sequences belonging to the same chromosome (number) are combined in one plot. The sequences are ordered alphabetically. To use a custom order, use option 2.
Without phasing information, a sequence selection is required.

Whole genome visualization
A visualization of all sequences of a genome. Available bar types are gene classification and haplotype presence (Fig. 21, Fig. 22).
The Rscripts that generate these visualizations are only created when phasing information (add_phasing) is present in the pangenome.

Parameters

Path to the database root directory.

Options

`--include`/`-i`	Only include a selection of genomes. This automatically lowers the threshold for core genes.
`--exclude`/`-e`	Exclude a selection of genomes. This automatically lowers the threshold for core genes.
`--selection-file`	Text file with rules to use a specific set of genomes and sequences. This automatically lowers the threshold for core genes.
`--rules`	Text file with a set of rules that determine which bar types and which sequences (in which specific order) should be visualized.
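
Why a genome selection lowers the threshold for core genes can be shown with a small sketch. It assumes the common pangenome definitions (core: present in every selected genome, unique: present in exactly one, accessory: everything in between). The function name `classify_gene_groups` and the example data are hypothetical, and the actual gene_classification function may apply different or configurable thresholds.

```python
def classify_gene_groups(presence, selected_genomes):
    """Classify groups as core, accessory or unique within a genome selection.

    presence maps a group name to the set of genome numbers that contain it.
    Assumption: core means present in all selected genomes.
    """
    selected = set(selected_genomes)
    categories = {}
    for group, genomes in presence.items():
        found_in = genomes & selected
        if not found_in:
            continue  # the group does not occur in the selection
        if found_in == selected:
            categories[group] = "core"
        elif len(found_in) == 1:
            categories[group] = "unique"
        else:
            categories[group] = "accessory"
    return categories


# Hypothetical presence of three gene groups across genomes 1, 2 and 3.
presence = {"g1": {1, 2, 3}, "g2": {1, 2}, "g3": {3}}

print(classify_gene_groups(presence, [1, 2, 3]))
# {'g1': 'core', 'g2': 'accessory', 'g3': 'unique'}

# Excluding genome 3: a group now only needs two genomes to be core.
print(classify_gene_groups(presence, [1, 2]))
# {'g1': 'core', 'g2': 'core'}
```

With genome 3 excluded, group g2 moves from accessory to core, which is the effect the options above describe.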

Example commands

$ pantools sequence_visualization tomato_DB
$ pantools sequence_visualization tomato_DB --rules rules.txt

Example input

The rules in the --rules file determine which bar types are visualized, in which order, and for which sequences.

Include all possible visualizations for all sequences (with phasing information).

gene_classification
haplotype_presence
other_chromosomes
repeat_coverage
gene_coverage

Visualize the haplotype counts for all sequences with phasing information.

haplotype_presence

Create one plot that visualizes the haplotype counts for the four sequences (with phasing information).

sequence 1_2,1_3,1_1,1_2
haplotype_presence
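
The rules files above are plain text and can also be generated from a script. The following is a minimal sketch, not part of PanTools: the helper `build_rules` is hypothetical, and its list of accepted keywords contains only the bar types that appear in the examples on this page (the keyword for synteny is not shown here, so it is left out).

```python
# Bar type keywords taken from the example rules files on this page.
KNOWN_BAR_TYPES = (
    "gene_classification",
    "haplotype_presence",
    "other_chromosomes",
    "repeat_coverage",
    "gene_coverage",
)


def build_rules(bar_types, sequences=None):
    """Return the text of a rules file.

    bar_types: keywords in the order they should appear in the file.
    sequences: optional sequence identifiers such as "1_2", in plot order.
    """
    unknown = [name for name in bar_types if name not in KNOWN_BAR_TYPES]
    if unknown:
        raise ValueError("Unknown bar type(s): " + ", ".join(unknown))

    lines = []
    if sequences:
        # One 'sequence' line holding a comma-separated selection.
        lines.append("sequence " + ",".join(sequences))
    lines.extend(bar_types)
    return "\n".join(lines) + "\n"


# Haplotype counts for three selected sequences, in a custom order.
print(build_rules(["haplotype_presence"], sequences=["1_2", "1_3", "1_1"]), end="")
```

The returned text can be saved as rules.txt and passed to the function with --rules.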

Output

Output files are written to the sequence_visualization directory in the database.

plot_sequences.R, Rscript to visualize the annotation bars. This script is created when a sequence selection was made using the sequence rule.
run_visualization_scripts.sh, shell script to execute all Rscripts. This script is only generated when there is phasing information (add_phasing) and no sequence selection was made.

../_images/seq_vis_two_haplotypes.png — Fig. 19 Sequence plot for two sequences with all possible annotation bars.

../_images/seq_vis_four_haplotypes.png — Fig. 20 Sequence plot of four sequences with haplotype copies bars.

../_images/seq_vis_genome_CAU.png — Fig. 21 Genome plot with core, accessory and unique bars. Diploid apple genome.

../_images/seq_vis_genome_haplotype.png — Fig. 22 Genome plot with haplotype copies bars. Tetraploid potato genome.