Workflows for pangenomics ================================= Since PanTools has many subcommands, we have created a number of workflows to help you get started. Finding core, accessory and unique genes ---------------------------------------- One of the most common tasks for a pangenome analysis is to find the core, accessory and unique genes in a set of genomes. For this, one needs to calculate homology groups and then find the core, accessory and unique genes. Homology grouping can be done using the ``group`` command if one already has a set of parameters for the homology search. If not, the ``optimal_grouping`` command can be used to find the optimal parameters for a given set of proteins. This core, accessory and unique analysis can be performed for both pangenomes and panproteomes. Pangenome analysis ^^^^^^^^^^^^^^^^^^ .. graphviz:: digraph G { "build_pangenome" -> "add_annotations"; "add_annotations" -> "group"; "add_annotations" -> "busco_protein"; "busco_protein" -> "optimal_grouping"; "optimal_grouping" -> "change_grouping"; "group" -> "gene_classification"; "change_grouping" -> "gene_classification"; } Panproteome analysis ^^^^^^^^^^^^^^^^^^^^ .. graphviz:: digraph P { "build_panproteome" -> "group"; "build_panproteome" -> "busco_protein"; "busco_protein" -> "optimal_grouping"; "optimal_grouping" -> "change_grouping"; "group" -> "gene_classification"; "change_grouping" -> "gene_classification"; } Creating phylogenetic trees --------------------------- PanTools has six different commands for creating phylogenetic trees. However, some methods are specific to a pangenome since they work on nucleotide sequences. Optionally, one can also add phenotype information to the PanTools database and use this information to color the tree. Pangenome analysis ^^^^^^^^^^^^^^^^^^ .. graphviz:: digraph G { "build_pangenome" -> "add_annotations"; "build_pangenome" -> "add_phenotype" [style=dashed]; "add_annotations" -> "group"; "add_annotations" -> "busco_protein"; "busco_protein" -> "optimal_grouping"; "optimal_grouping" -> "change_grouping"; "group" -> "gene_classification"; "change_grouping" -> "gene_classification"; "add_phenotype" -> "gene_classification" [style=dashed]; "gene_classification" -> "gene_distance_tree.R"; "add_phenotype" -> "kmer_classification" [style=dashed]; "build_pangenome" -> "kmer_classification"; "add_phenotype" -> "ani" [style=dashed]; "build_pangenome" -> "ani"; "kmer_classification" -> "genome_kmer_distance_tree.R"; "gene_classification" -> "core_phylogeny"; "gene_classification" -> "mlsa_find_genes"; "mlsa_find_genes" -> "mlsa_concatenate"; "mlsa_concatenate" -> "mlsa"; "gene_classification" -> "consensus_tree"; } Panproteome analysis ^^^^^^^^^^^^^^^^^^^^ .. graphviz:: digraph P { "build_panproteome" -> "add_phenotype" [style=dashed]; "build_panproteome" -> "group"; "build_panproteome" -> "busco_protein"; "busco_protein" -> "optimal_grouping"; "optimal_grouping" -> "change_grouping"; "group" -> "gene_classification"; "change_grouping" -> "gene_classification"; "add_phenotype" -> "gene_classification" [style=dashed]; "gene_classification" -> "gene_distance_tree.R"; "add_phenotype" -> "kmer_classification" [style=dashed]; "build_panproteome" -> "kmer_classification"; "add_phenotype" -> "ani" [style=dashed]; "build_panproteome" -> "ani"; "kmer_classification" -> "genome_kmer_distance_tree.R"; "gene_classification" -> "core_phylogeny"; "gene_classification" -> "mlsa_find_genes"; "mlsa_find_genes" -> "mlsa_concatenate"; "mlsa_concatenate" -> "mlsa"; "gene_classification" -> "consensus_tree"; } Mapping reads ---------------------------- PanTools has a ``map`` subcommand for mapping WGS reads to a pangenome. This subcommand can be used to map reads to a pangenome only. .. graphviz:: digraph G { "build_pangenome" -> "map"; }