Sequence visualization
^^^^^^^^^^^^^^^^^^^^^^

.. warning::
 This is a novel function and has not yet undergone testing by external users.
 Please report any bugs or issues to the PanTools team so we can improve it.

Generate a visualization of multiple sequences with annotation bars.

**Annotation bar types**

1. **Gene and repeat coverage.** Percentage of repeat and gene coverage
   calculated within a sliding window on the sequence. A coverage of 100%
   means every nucleotide of the window is covered.
2. **Found in another chromosome.** The gene region is only visible if the gene
   is also found on another chromosome.
3. **Gene classification.** Genes are coloured according to their category:
   core, accessory or unique.
4. **Haplotype presence.** Genes are coloured by the number of haplotypes
   (phases) they are found in.
5. **Synteny.** Syntenic blocks are drawn between two sequences.
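
For bar type 1, the coverage of a single window can be written as below. This
is a sketch based on the description above, assuming each nucleotide is counted
once even when annotations overlap. The symbols :math:`c_w` and :math:`L_w` are
introduced here for illustration only.

.. math::

   \text{coverage}_w = 100 \times \frac{c_w}{L_w}

Here :math:`c_w` is the number of nucleotides in window :math:`w` that are
covered by at least one gene (or repeat) annotation, and :math:`L_w` is the
window length. For example, a window of 10,000 bp that contains 2,500 annotated
nucleotides has a coverage of 25%.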

.. _sequence visualization numbers:

.. figure:: ../figures/seq_vis_two_haplotypes_numbers.png
   :width: 600
   :align: center

   Annotation plot for two sequences with all possible annotation bars.

| **Input requirements**
| Each bar type has specific requirements before it can be included. Input
  requirements are checked upon execution, and bar types that fail to meet
  them are excluded.

| 1. Gene and repeat coverage are calculated by this function. To allow repeat
     coverage, repeat annotations must be added via :ref:`add_repeats
     <construction/annotate:Add repeats>`. The window size for which
     the coverage is calculated is controlled with ``--window-size``.
| 2. Found in another chromosome. Requires phasing information to be added via
     :ref:`add_phasing <construction/annotate:add phasing>`.
| 3. Gene classification uses the output from the previous
     :ref:`gene_classification <analysis/classification:gene classification>`
     run.
| 4. Haplotype presence uses :ref:`gene_classification
     <analysis/classification:gene classification>` with the ``--phasing``
     argument.
| 5. Synteny information must be incorporated in the database.
     Use :ref:`calculate_synteny <construction/synteny:Calculate synteny>`
     followed by :ref:`add_synteny <construction/synteny:Add synteny>`. Do not
     forget to include the ``--sequence`` argument for the synteny calculation
     as the default is only between genomes.

| **Sequence selection**
| The visualization can be created with or without synteny information.
  Plots that include synteny are limited to eight sequences, whereas plots
  without synteny can visualize all sequences of a single genome.

| **Up to eight sequences**
| When phasing information was included through
  :ref:`add_phasing <construction/annotate:add phasing>`, sequences
  belonging to the same chromosome (number) are combined in one plot and
  ordered alphabetically. To set a custom order, use option 2.
| Without phasing information a sequence selection is required.

| **Whole genome visualization**
| A visualization of all sequences for a genome. Available bar types are gene
  classification and haplotype presence (:numref:`sequence visualization genome
  CAU`, :numref:`sequence visualization genome haplotypes`).
| The Rscripts for these visualizations are only generated when the pangenome
  contains phasing information
  (:ref:`add_phasing <construction/annotate:add phasing>`).

**Parameters**
  .. list-table::
     :widths: 30 70

     * - <databaseDirectory>
       - Path to the database root directory.

**Options**
  .. list-table::
     :widths: 30 70

     * - ``--include``/``-i``
       - Only include a selection of genomes. This automatically lowers the
         threshold for core genes.
     * - ``--exclude``/``-e``
       - Exclude a selection of genomes. This automatically lowers the threshold
         for core genes.
     * - ``--selection-file``
       - Text file with rules to use a specific set of genomes and sequences.
         This automatically lowers the threshold for core genes.
     * - ``--rules``
       - Text file with a set of rules that determine which bar types and which
         sequences (in which specific order) should be visualized.

**Example commands**
  .. code:: bash

    $ pantools sequence_visualization tomato_DB
    $ pantools sequence_visualization tomato_DB --rules rules.txt

**Example input**
  The ``--rules`` file determines which bar types are visualized, for which
  sequences, and in which order.

  Include all possible visualizations for all sequences (with phasing
  information).

  .. code:: text

      gene_classification
      haplotype_presence
      other_chromosomes
      repeat_coverage
      gene_coverage

  Visualize the haplotype counts for all sequences with phasing information.

  .. code:: text

      haplotype_presence

  Create one plot that visualizes the haplotype counts for the four sequences
  (with phasing information).

  .. code:: text

      sequence 1_2,1_3,1_1,1_2
      haplotype_presence
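
  The rules file is plain text, so it can also be written from the shell just
  before the run. The sketch below is illustrative only: the sequence
  identifiers ``1_1`` and ``1_2`` and the combination of a ``sequence`` rule
  with two bar types are assumptions modelled on the examples above, not a
  tested configuration.

  .. code:: bash

      # Write a rules file: one sequence selection followed by two bar types.
      # The identifiers 1_1 and 1_2 are placeholders for this sketch.
      cat > rules.txt <<'EOF'
      sequence 1_1,1_2
      gene_classification
      haplotype_presence
      EOF

      # Run the visualization only if PanTools is available on the PATH.
      # tomato_DB is the example database used in the commands above.
      if command -v pantools >/dev/null 2>&1; then
          pantools sequence_visualization tomato_DB --rules rules.txt
      fi

  Keeping the rules in a file next to the command makes it easy to rerun the
  same selection later.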

**Output**
  Output files are written to the **sequence_visualization** directory in the
  database.

  * **plot_sequences.R**, Rscript to visualize the annotation bars. This script
    is created when a sequence selection was made using the **sequence** rule.
  * **run_visualization_scripts.sh**, shell script to execute all Rscripts.
    This script is only generated when there is phasing information
    (:ref:`add_phasing <construction/annotate:add phasing>`) and no
    sequence selection was made.
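
  The generated scripts are not run automatically, so the plots are produced
  by running them afterwards. The following is a minimal sketch, assuming the
  database directory is ``tomato_DB`` (as in the example commands), that R
  provides the ``Rscript`` command on the PATH, and that the scripts can be
  started from the current working directory. None of these assumptions is
  confirmed by this page.

  .. code:: bash

      # Assumption: database directory taken from the example commands.
      db=tomato_DB
      out="$db/sequence_visualization"

      if [ -f "$out/plot_sequences.R" ]; then
          # A sequence selection was made: run the single plotting script.
          Rscript "$out/plot_sequences.R"
      elif [ -f "$out/run_visualization_scripts.sh" ]; then
          # Phasing information and no selection: run all generated Rscripts.
          bash "$out/run_visualization_scripts.sh"
      else
          echo "No visualization scripts found in $out" >&2
      fi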

  .. _sequence visualization all:

  .. figure:: ../figures/seq_vis_two_haplotypes.png
     :width: 600
     :align: center

     Sequence plot for two sequences with all possible annotation bars.

  |

  .. _sequence visualization haplotypes:

  .. figure:: ../figures/seq_vis_four_haplotypes.png
     :width: 600
     :align: center

     Sequence plot of four sequences with haplotype copies bars.

  |

  .. _sequence visualization genome CAU:

  .. figure:: ../figures/seq_vis_genome_CAU.png
     :width: 600
     :align: center

     Genome plot with core, accessory and unique bars. Diploid apple genome.

  |

  .. _sequence visualization genome haplotypes:

  .. figure:: ../figures/seq_vis_genome_haplotype.png
     :width: 600
     :align: center

     Genome plot with haplotype copies bars. Tetraploid potato genome.