Metrics

Generates metrics for the pangenome as a whole and for its individual genomes and sequences.

  • On the pangenome level: the number of genomes, sequences, annotations, genes, proteins, homology groups, k-mers, and database nodes and edges.

  • On the genome and sequence level: assembly statistics and metrics on functional elements. The assembly statistics consist of genome size, N25-N95, L25-L95, BUSCO scores and GC content. The functional elements are summarized per genome (and per sequence) from the functional annotations, reporting the shortest, longest and average length, and the density per Mb, of features such as genes, exons and CDS.
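The assembly and feature statistics listed above follow common definitions, sketched below in Python. This is an illustration only, not PanTools code; the function names are hypothetical. It assumes that Nx/Lx are computed on sequences sorted from longest to shortest, that GC content is taken over A/C/G/T bases only (ambiguous bases such as N are ignored), and that "density per Mb" means feature count divided by genome size in megabases. PanTools may handle these details differently.

```python
def nx_lx(lengths, x):
    """Return (Nx, Lx) for a list of sequence lengths (hypothetical helper).

    Sequences are sorted from longest to shortest; Nx is the length of the
    sequence at which the running total first reaches x% of the assembly
    size, and Lx is how many sequences were needed to get there.
    """
    threshold = sum(lengths) * x / 100
    cumulative = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        cumulative += length
        if cumulative >= threshold:
            return length, count
    raise ValueError("lengths must be a non-empty list of positive integers")


def gc_content(seq):
    """Percentage of G+C among A/C/G/T bases (assumption: N is ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100 * gc / acgt if acgt else 0.0


def feature_stats(feature_lengths, genome_size):
    """Shortest, longest, average length and density per Mb of one feature type."""
    n = len(feature_lengths)
    return {
        "count": n,
        "shortest": min(feature_lengths),
        "longest": max(feature_lengths),
        "average": sum(feature_lengths) / n,
        "density_per_mb": n / (genome_size / 1_000_000),
    }


contigs = [100, 80, 60, 40, 20]  # toy assembly of 300 bp
print(nx_lx(contigs, 50))  # (80, 2): 100 + 80 = 180 >= 150
print(nx_lx(contigs, 90))  # (40, 4): 100 + 80 + 60 + 40 = 280 >= 270
```

Repeating `nx_lx` for x = 25, 30, ..., 95 would give the N25-N95 and L25-L95 series; BUSCO scores come from a separate tool and are not sketched here.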

Parameters

<databaseDirectory>

Path to the database root directory.

Options

--include/-i

Only include a selection of genomes.

--exclude/-e

Exclude a selection of genomes.

--annotations-file/-A

A text file with the identifiers of the annotations to use. For genomes without an identifier in this file, the most recent annotation is selected.

Example commands
$ pantools metrics tomato_DB
$ pantools metrics --exclude=1,2,5 tomato_DB
Output

Output files are written to the metrics directory in the database. Note: the percentage of a genome or sequence covered by genes, repeats, etc. does not (currently) account for overlap between features!

  • metrics.txt, overview of the metrics calculated on the pangenome and genome level.

  • metrics_per_genome.csv, summary of the metrics that are calculated on a genome level. The output is formatted as a table.

  • metrics_per_sequence.csv, summary of the metrics that are calculated on a sequence (contig/scaffold) level. The output is formatted as a table. This file is not created when using a panproteome.
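To illustrate why the note on overlap matters for the reported coverage percentages, the Python sketch below compares simply summing feature lengths with merging overlapping intervals first. It is not PanTools code; the function names are hypothetical and features are assumed to be half-open [start, end) intervals on a single sequence.

```python
def naive_coverage_pct(intervals, genome_size):
    """Sum feature lengths directly; overlapping bases are counted twice."""
    covered = sum(end - start for start, end in intervals)
    return 100 * covered / genome_size


def merged_coverage_pct(intervals, genome_size):
    """Merge overlapping or adjacent intervals so each base is counted once."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current block: close it and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps the current block: extend it if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return 100 * covered / genome_size


features = [(0, 300), (200, 500), (800, 900)]
print(naive_coverage_pct(features, 1000))   # 70.0 (bases 200-299 counted twice)
print(merged_coverage_pct(features, 1000))  # 60.0
```

With heavily overlapping features (for example nested repeats), the naive sum can overestimate coverage and, in extreme cases, even exceed 100%.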