Sequence visualization
^^^^^^^^^^^^^^^^^^^^^^

.. warning::
   This is a new function that has not yet been tested by external users.
   Please report any bugs or issues to the PanTools team so we can improve it.

Generate a visualization of multiple sequences with annotation bars.

**Annotation bar types**

1. **Gene and repeat coverage.** Percentage of repeat and gene coverage,
   calculated within a sliding window along the sequence. A coverage of 100%
   means that every nucleotide in the window is covered.
2. **Found in another chromosome.** A gene region is only shown if the gene is
   also found on another chromosome.
3. **Gene classification.** Genes are coloured according to their category:
   core, accessory or unique.
4. **Haplotype presence.** Genes are coloured by the number of haplotypes
   (phases) they are found in.
5. **Synteny.** Syntenic blocks are drawn between two sequences.

.. _sequence visualization numbers:

.. figure:: ../figures/seq_vis_two_haplotypes_numbers.png
   :width: 600
   :align: center

   Annotation plot for two sequences with all possible annotation bars.

| **Input requirements**
| Each bar type has specific requirements that must be met before it can be
  included. The requirements are checked at execution, and bar types that do
  not meet them are excluded.
| 1. Gene and repeat coverage are calculated by this function. To include
  repeat coverage, repeat annotations must first be added via
  :ref:`add_repeats `. The window size over which the coverage is calculated
  is controlled with ``--window-size``.
| 2. Found in another chromosome. Requires phasing information, added via
  :ref:`add_phasing `.
| 3. Gene classification uses the output of a previous
  :ref:`gene_classification ` run.
| 4. Haplotype presence uses the output of a :ref:`gene_classification ` run
  with the ``--phasing`` argument.
| 5. Synteny information must be incorporated in the database. Use
  :ref:`calculate_synteny ` followed by :ref:`add_synteny `.
  Include the ``--sequence`` argument for the synteny calculation, because by
  default synteny is only calculated between genomes.

| **Sequence selection**
| The visualization can be created with or without synteny information. Plots
  that include synteny are limited to eight sequences, whereas plots without
  synteny can show all sequences of a single genome.

| **Up to eight sequences**
| When phasing information was added through :ref:`add_phasing `, sequences
  belonging to the same chromosome (number) are combined in one plot and
  ordered alphabetically. To use a custom order, select the sequences in the
  desired order with the ``sequence`` rule of the ``--rules`` file.
| Without phasing information, a sequence selection is required.

| **Whole genome visualization**
| A visualization of all sequences of a genome. The available bar types are
  gene classification and haplotype presence
  (:numref:`sequence visualization genome CAU`,
  :numref:`sequence visualization genome haplotypes`).
| The Rscripts for these visualizations are only generated when phasing
  information (:ref:`add_phasing `) is present in the pangenome.

**Parameters**

.. list-table::
   :widths: 30 70

   * -
     - Path to the database root directory.

**Options**

.. list-table::
   :widths: 30 70

   * - ``--include``/``-i``
     - Only include a selection of genomes. This automatically lowers the
       threshold for core genes.
   * - ``--exclude``/``-e``
     - Exclude a selection of genomes. This automatically lowers the threshold
       for core genes.
   * - ``--selection-file``
     - Text file with rules to use a specific set of genomes and sequences.
       This automatically lowers the threshold for core genes.
   * - ``--rules``
     - Text file with a set of rules that determine which bar types and which
       sequences (in which order) are visualized.

**Example commands**

.. code:: bash

   $ pantools sequence_visualization tomato_DB
   $ pantools sequence_visualization tomato_DB --rules rules.txt

**Example input**

The rules in the ``--rules`` file determine which bar types are visualized, in
which order, and for which sequences.

Include all possible bar types for all sequences with phasing information.

.. code:: text

   gene_classification
   haplotype_presence
   other_chromosomes
   repeat_coverage
   gene_coverage

Visualize the haplotype counts for all sequences with phasing information.

.. code:: text

   haplotype_presence

Create one plot that visualizes the haplotype counts for four sequences (with
phasing information).

.. code:: text

   sequence 1_2,1_3,1_1,1_2
   haplotype_presence

**Output**

Output files are written to the **sequence_visualization** directory in the
database.

* **plot_sequences.R**, Rscript to visualize the annotation bars. This script
  is created when a sequence selection was made using the **sequence** rule.
* **run_visualization_scripts.sh**, shell script that executes all Rscripts.
  This script is only generated when phasing information
  (:ref:`add_phasing `) is present and no sequence selection was made.

.. _sequence visualization all:

.. figure:: ../figures/seq_vis_two_haplotypes.png
   :width: 600
   :align: center

   Sequence plot for two sequences with all possible annotation bars.

|

.. _sequence visualization haplotypes:

.. figure:: ../figures/seq_vis_four_haplotypes.png
   :width: 600
   :align: center

   Sequence plot of four sequences with haplotype copies bars.

|

.. _sequence visualization genome CAU:

.. figure:: ../figures/seq_vis_genome_CAU.png
   :width: 600
   :align: center

   Genome plot with core, accessory and unique bars. Diploid apple genome.

|

.. _sequence visualization genome haplotypes:

.. figure:: ../figures/seq_vis_genome_haplotype.png
   :width: 600
   :align: center

   Genome plot with haplotype copies bars. Tetraploid potato genome.
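
The gene and repeat coverage bars report the percentage of nucleotides covered
within a sliding window along the sequence. The following Bash and awk sketch
illustrates that calculation; it is not the PanTools implementation. It
assumes that features are given on standard input as 1-based, inclusive
``start end`` pairs, that bases covered by overlapping features count once,
and that the step between windows defaults to the window size. The function
name ``window_coverage`` and its arguments are hypothetical.

.. code:: bash

   #!/usr/bin/env bash
   # Hypothetical sketch, not PanTools code: percentage of bases covered by
   # features (genes or repeats) in sliding windows along one sequence.
   #   $1 = sequence length, $2 = window size, $3 = step (defaults to $2)
   #   stdin: one feature per line as "start end" (1-based, inclusive)
   window_coverage() {
     local seq_len=$1 window=$2 step=${3:-$2}
     awk -v len="$seq_len" -v w="$window" -v st="$step" '
       {
         # Mark each base of the feature; overlapping features count once.
         for (i = $1; i <= $2 && i <= len; i++) covered[i] = 1
       }
       END {
         for (s = 1; s <= len; s += st) {
           e = s + w - 1
           if (e > len) e = len   # the last window may be shorter
           n = 0
           for (i = s; i <= e; i++) if (i in covered) n++
           # Output: window start, window end, percentage covered.
           printf "%d\t%d\t%.1f\n", s, e, 100 * n / (e - s + 1)
           if (e == len) break    # stop once a window reaches the end
         }
       }'
   }

   # Two overlapping features on a 200 bp sequence, 100 bp windows.
   printf '1 50\n40 120\n' | window_coverage 200 100

The example prints two tab-separated lines, ``1 100 100.0`` and
``101 200 20.0``: bases 1 to 120 are covered, so the first window is fully
covered and the second has 20 of 100 bases covered. Marking every base is
simple but uses memory proportional to the sequence length; merging sorted
intervals per window would scale better for whole chromosomes.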
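
A ``--rules`` file, as in the example inputs, contains one rule per line. The
following Bash sketch writes the example that requests every bar type and
then runs a hypothetical sanity check, ``check_rules`` (not part of PanTools),
that reports any line that is neither a ``sequence`` rule nor one of the five
bar-type keywords used in the examples. The keyword list is an assumption
taken only from those examples; a synteny rule is not included because its
keyword is not shown.

.. code:: bash

   #!/usr/bin/env bash
   # Write a rules file that requests every bar type shown in the examples.
   cat > rules.txt <<'EOF'
   gene_classification
   haplotype_presence
   other_chromosomes
   repeat_coverage
   gene_coverage
   EOF

   # Hypothetical pre-flight check, not part of PanTools: report lines that
   # are neither a "sequence" rule nor a bar-type keyword from the examples.
   check_rules() {
     local file=$1 line status=0
     while IFS= read -r line || [ -n "$line" ]; do
       case $line in
         gene_classification|haplotype_presence|other_chromosomes|repeat_coverage|gene_coverage) ;;
         "sequence "*) ;;
         "") ;;  # this sketch ignores blank lines
         *) echo "unknown rule: $line" >&2; status=1 ;;
       esac
     done < "$file"
     return "$status"
   }

   check_rules rules.txt && echo "rules.txt passed the check"

If the check passes, the file is passed to PanTools as in the example
commands: ``pantools sequence_visualization tomato_DB --rules rules.txt``.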
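
Which output script to run depends on how the plots were requested. The
following Bash sketch prints the command for whichever generated script is
present in a **sequence_visualization** directory. It assumes that R
(``Rscript``) and the packages loaded by the generated scripts are installed,
and that the scripts can be run from any working directory. The function name
``visualization_command`` is hypothetical, and preferring
``run_visualization_scripts.sh`` when both files exist is an arbitrary choice
of this sketch.

.. code:: bash

   #!/usr/bin/env bash
   # Hypothetical helper: print (but do not run) the command that renders the
   # plots found in a sequence_visualization output directory.
   visualization_command() {
     local dir=$1
     if [ -f "$dir/run_visualization_scripts.sh" ]; then
       # Phasing information and no sequence selection: one shell script
       # executes all generated Rscripts.
       printf 'bash %q\n' "$dir/run_visualization_scripts.sh"
     elif [ -f "$dir/plot_sequences.R" ]; then
       # A sequence selection was made with the "sequence" rule.
       printf 'Rscript %q\n' "$dir/plot_sequences.R"
     else
       echo "no visualization scripts found in $dir" >&2
       return 1
     fi
   }

For example, if only ``plot_sequences.R`` is present,
``visualization_command tomato_DB/sequence_visualization`` prints
``Rscript tomato_DB/sequence_visualization/plot_sequences.R``. Printing
instead of executing lets you inspect the command before running it.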