Workflows for pangenomics
=========================

Since PanTools has many subcommands, we have created a number of workflows
to help you get started.

Finding core, accessory and unique genes
----------------------------------------

One of the most common tasks in a pangenome analysis is to find the core,
accessory and unique genes in a set of genomes. To do this, you first need to
calculate homology groups and then classify them with ``gene_classification``.
Homology grouping can be done with the ``group`` command if you already have
a set of parameters for the homology search. If not, the ``optimal_grouping``
command can be used to find the optimal parameters for a given set of
proteins. This core, accessory and unique analysis can be performed for both
pangenomes and panproteomes.
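
As a rough illustration of the two routes described above, the following
Python sketch lists the subcommands in the order suggested by the graphs
below. The function name ``classification_workflow`` and its arguments are
hypothetical and not part of PanTools; the sketch only returns subcommand
names and does not run anything.

.. code-block:: python

    def classification_workflow(database: str, have_parameters: bool) -> list[str]:
        """Return subcommand names, in order, for a core/accessory/unique analysis.

        Hypothetical helper: it mirrors the workflow graphs and calls nothing.
        """
        if database == "pangenome":
            steps = ["build_pangenome", "add_annotations"]
        elif database == "panproteome":
            steps = ["build_panproteome"]
        else:
            raise ValueError(f"unknown database type: {database!r}")

        if have_parameters:
            # Homology search parameters are already known.
            steps.append("group")
        else:
            # Let PanTools search for suitable parameters first.
            steps += ["busco_protein", "optimal_grouping", "change_grouping"]

        steps.append("gene_classification")
        return steps


    if __name__ == "__main__":
        print(" -> ".join(classification_workflow("pangenome", have_parameters=False)))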

Pangenome analysis
^^^^^^^^^^^^^^^^^^

.. graphviz::

    digraph G {
        "build_pangenome" -> "add_annotations";
        "add_annotations" -> "group";
        "add_annotations" -> "busco_protein";
        "busco_protein" -> "optimal_grouping";
        "optimal_grouping" -> "change_grouping";
        "group" -> "gene_classification";
        "change_grouping" -> "gene_classification";
    }

Panproteome analysis
^^^^^^^^^^^^^^^^^^^^

.. graphviz::

    digraph P {
        "build_panproteome" -> "group";
        "build_panproteome" -> "busco_protein";
        "busco_protein" -> "optimal_grouping";
        "optimal_grouping" -> "change_grouping";
        "group" -> "gene_classification";
        "change_grouping" -> "gene_classification";
    }

Creating phylogenetic trees
---------------------------

PanTools has six different commands for creating phylogenetic trees. However,
some methods are specific to a pangenome since they work on nucleotide
sequences. Optionally, you can also add phenotype information to the PanTools
database and use this information to color the tree.
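
To make the pangenome graph in this section easier to navigate, the sketch
below copies its solid (required) edges into Python, lists the end points of
the graph and looks up one shortest route to a chosen end point. The names
``REQUIRED_EDGES``, ``end_points`` and ``shortest_route`` are hypothetical
helpers for illustration only. The optional ``add_phenotype`` edges are left
out, and where the graph offers alternatives the sketch simply returns the
route with the fewest steps.

.. code-block:: python

    from collections import deque
    from typing import Optional

    # Solid edges of the pangenome tree graph (dashed add_phenotype edges omitted).
    REQUIRED_EDGES = [
        ("build_pangenome", "add_annotations"),
        ("add_annotations", "group"),
        ("add_annotations", "busco_protein"),
        ("busco_protein", "optimal_grouping"),
        ("optimal_grouping", "change_grouping"),
        ("group", "gene_classification"),
        ("change_grouping", "gene_classification"),
        ("gene_classification", "gene_distance_tree.R"),
        ("build_pangenome", "kmer_classification"),
        ("build_pangenome", "ani"),
        ("kmer_classification", "genome_kmer_distance_tree.R"),
        ("gene_classification", "core_phylogeny"),
        ("gene_classification", "mlsa_find_genes"),
        ("mlsa_find_genes", "mlsa_concatenate"),
        ("mlsa_concatenate", "mlsa"),
        ("gene_classification", "consensus_tree"),
    ]


    def end_points(edges: list[tuple[str, str]]) -> list[str]:
        """Return the steps that no other step depends on, sorted by name."""
        sources = {src for src, _ in edges}
        targets = {dst for _, dst in edges}
        return sorted(targets - sources)


    def shortest_route(edges: list[tuple[str, str]], start: str, goal: str) -> Optional[list[str]]:
        """Breadth-first search for one shortest chain of steps from start to goal."""
        successors: dict[str, list[str]] = {}
        for src, dst in edges:
            successors.setdefault(src, []).append(dst)

        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for nxt in successors.get(path[-1], []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None  # goal is not reachable from start


    if __name__ == "__main__":
        print(end_points(REQUIRED_EDGES))
        print(shortest_route(REQUIRED_EDGES, "build_pangenome", "mlsa"))

With these edges the graph has six end points, and the route to ``mlsa``
passes through ``group`` because that is shorter than the
``optimal_grouping`` chain.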

Pangenome analysis
^^^^^^^^^^^^^^^^^^

.. graphviz::

    digraph G {
        "build_pangenome" -> "add_annotations";
        "build_pangenome" -> "add_phenotype" [style=dashed];
        "add_annotations" -> "group";
        "add_annotations" -> "busco_protein";
        "busco_protein" -> "optimal_grouping";
        "optimal_grouping" -> "change_grouping";
        "group" -> "gene_classification";
        "change_grouping" -> "gene_classification";
        "add_phenotype" -> "gene_classification" [style=dashed];
        "gene_classification" -> "gene_distance_tree.R";
        "add_phenotype" -> "kmer_classification" [style=dashed];
        "build_pangenome" -> "kmer_classification";
        "add_phenotype" -> "ani" [style=dashed];
        "build_pangenome" -> "ani";
        "kmer_classification" -> "genome_kmer_distance_tree.R";
        "gene_classification" -> "core_phylogeny";
        "gene_classification" -> "mlsa_find_genes";
        "mlsa_find_genes" -> "mlsa_concatenate";
        "mlsa_concatenate" -> "mlsa";
        "gene_classification" -> "consensus_tree";
    }

Panproteome analysis
^^^^^^^^^^^^^^^^^^^^

.. graphviz::

    digraph P {
        "build_panproteome" -> "add_phenotype" [style=dashed];
        "build_panproteome" -> "group";
        "build_panproteome" -> "busco_protein";
        "busco_protein" -> "optimal_grouping";
        "optimal_grouping" -> "change_grouping";
        "group" -> "gene_classification";
        "change_grouping" -> "gene_classification";
        "add_phenotype" -> "gene_classification" [style=dashed];
        "gene_classification" -> "gene_distance_tree.R";
        "add_phenotype" -> "kmer_classification" [style=dashed];
        "build_panproteome" -> "kmer_classification";
        "add_phenotype" -> "ani" [style=dashed];
        "build_panproteome" -> "ani";
        "kmer_classification" -> "genome_kmer_distance_tree.R";
        "gene_classification" -> "core_phylogeny";
        "gene_classification" -> "mlsa_find_genes";
        "mlsa_find_genes" -> "mlsa_concatenate";
        "mlsa_concatenate" -> "mlsa";
        "gene_classification" -> "consensus_tree";
    }

Mapping reads
-------------

PanTools has a ``map`` subcommand for mapping whole-genome sequencing (WGS)
reads to a pangenome. It works on pangenomes only, not on panproteomes.

.. graphviz::

    digraph G {
        "build_pangenome" -> "map";
    }

Adding variants
---------------

Two types of variation data can be added: variants from VCF files and a
presence/absence variation (PAV) table. VCF files can only be used for
pangenomes, while PAV tables can be used for both pangenomes and
panproteomes. Homology groups are not needed to add either type of variation,
but they are needed for the downstream analyses. As an alternative to
``group``, which is used in the graphs below, you can use the
``busco_protein``, ``optimal_grouping`` and ``change_grouping`` chain of
commands.
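
The compatibility rules above can be summarised in a small lookup table. The
Python sketch below is only an illustration: ``VARIATION_SUPPORT`` and
``variant_workflow`` are hypothetical names, and the subcommand names and
downstream analyses are taken from the graphs in this section.

.. code-block:: python

    # Which subcommand adds each variation type, where it is allowed,
    # and which analyses follow it in the graphs of this section.
    VARIATION_SUPPORT = {
        "VCF": {
            "command": "add_variants",
            "databases": {"pangenome"},
            "downstream": ["msa", "core_phylogeny", "consensus_tree"],
        },
        "PAV": {
            "command": "add_pavs",
            "databases": {"pangenome", "panproteome"},
            "downstream": ["gene_classification", "pangenome_structure"],
        },
    }


    def variant_workflow(variation: str, database: str) -> list[str]:
        """Return the preparation steps needed before the downstream analyses.

        Hypothetical helper: it only returns subcommand names.
        """
        info = VARIATION_SUPPORT[variation]
        if database not in info["databases"]:
            raise ValueError(f"{variation} cannot be added to a {database}")

        if database == "pangenome":
            steps = ["build_pangenome", "add_annotations"]
        else:
            steps = ["build_panproteome"]

        # Adding variation and grouping are independent branches in the graphs;
        # the order of these two steps here is arbitrary.
        steps += [info["command"], "group"]
        return steps


    if __name__ == "__main__":
        print(variant_workflow("PAV", "panproteome"))
        print(VARIATION_SUPPORT["PAV"]["downstream"])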

Pangenome analysis (VCF)
^^^^^^^^^^^^^^^^^^^^^^^^

.. graphviz::

    digraph G {
        "build_pangenome" -> "add_annotations";
        "add_annotations" -> "add_variants";
        "add_annotations" -> "group";
        "group" -> "msa";
        "add_variants" -> "msa";
        "group" -> "core_phylogeny";
        "add_variants" -> "core_phylogeny";
        "group" -> "consensus_tree";
        "add_variants" -> "consensus_tree";
    }

Pangenome analysis (PAV)
^^^^^^^^^^^^^^^^^^^^^^^^

.. graphviz::

    digraph G {
        "build_pangenome" -> "add_annotations";
        "add_annotations" -> "add_pavs";
        "add_annotations" -> "group";
        "group" -> "gene_classification";
        "add_pavs" -> "gene_classification";
        "group" -> "pangenome_structure";
        "add_pavs" -> "pangenome_structure";
    }

Panproteome analysis (PAV)
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. graphviz::

    digraph P {
        "build_panproteome" -> "group";
        "build_panproteome" -> "add_pavs";
        "group" -> "gene_classification";
        "add_pavs" -> "gene_classification";
        "group" -> "pangenome_structure";
        "add_pavs" -> "pangenome_structure";
    }