Synonymous and non-synonymous substitutions

Calculate dN/dS

Warning

This is a novel function and has not yet undergone testing by external users. Please report any bugs or issues to the PanTools team so we can improve it.

This function calculates synonymous (dS) and non-synonymous (dN) substitutions between homologous and syntenic genes. Non-synonymous substitutions change the encoded protein and are frequently under selection. Synonymous substitutions do not lead to amino acid changes and are, in general, not under selection. The dN/dS ratio is a metric indicating the evolutionary rate of gene sequences and is commonly used as an indicator of selection pressure.
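
To make the distinction concrete, the following minimal sketch classifies a single codon change as synonymous or non-synonymous. It assumes the standard genetic code (NCBI translation table 1); the function name `classify_substitution` is hypothetical and is not part of PanTools, which relies on codeml for the actual estimates.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1).
# Codons are enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_a: str, codon_b: str) -> str:
    """Compare two codons and report whether the encoded amino acid changes."""
    a, b = codon_a.upper(), codon_b.upper()
    if a == b:
        return "identical"
    if CODON_TABLE[a] == CODON_TABLE[b]:
        return "synonymous"
    return "non-synonymous"


print(classify_substitution("CTT", "CTC"))  # Leu -> Leu: synonymous
print(classify_substitution("CTT", "CCT"))  # Leu -> Pro: non-synonymous
```

Counting such changes per codon is only an illustration; dN and dS as reported by codeml are estimated under a codon substitution model and normalized by the number of available sites.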

Method:
Three steps are required to calculate dN/dS ratios:
- Align the protein sequences (pairwise or multiple sequence alignment).
- Convert the protein alignment into a codon alignment with PAL2NAL.
- Use codeml to calculate dS, dN and dN/dS from the codon alignment.
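
The second step can be illustrated with a small sketch of the idea behind a protein-guided codon alignment: each aligned residue is replaced by its codon and each gap by three gap characters. This is a simplified stand-in and not PAL2NAL itself; the function name `protein_to_codon_alignment` is hypothetical, and the sketch assumes the coding sequence has no stop codon and matches the protein residue by residue.

```python
def protein_to_codon_alignment(aligned_protein: str, cds: str) -> str:
    """Thread the codons of an unaligned CDS onto a gapped protein sequence."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = aligned_protein.replace("-", "")
    if len(cds) % 3 != 0 or len(codons) != len(residues):
        raise ValueError("CDS length does not match the protein sequence")

    codon_iter = iter(codons)
    # A gap in the protein alignment becomes a three-nucleotide gap.
    return "".join(
        "---" if residue == "-" else next(codon_iter)
        for residue in aligned_protein
    )


print(protein_to_codon_alignment("M-K", "ATGAAA"))  # ATG---AAA
```

Keeping gaps in multiples of three preserves the reading frame, which is what allows a codon-based program to compare the sequences codon by codon.
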
Pairwise and multiple sequence alignment
When the genome selection contains only one or two genomes, sequences can be aligned pairwise. Otherwise, multiple sequence alignments are created for (a selection of) homology groups. Existing pairwise and MSA output files are recognized by this function, and those alignments are skipped in the next run.

Homology groups are aligned. Synteny block information in the pangenome is used to identify syntenic gene pairs, so that only the values for these pairs are extracted from the output.
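
As a rough illustration of restricting results to syntenic gene pairs, the sketch below filters a table of pairwise values by a set of known syntenic pairs. The data structures, gene identifiers and the function name `keep_syntenic_pairs` are assumptions made for this example and do not reflect how PanTools stores synteny blocks or dN/dS output.

```python
def keep_syntenic_pairs(dnds_by_pair, syntenic_pairs):
    """Keep only the entries whose gene pair is listed as syntenic.

    Pairs are compared without regard to order, so (a, b) matches (b, a).
    """
    syntenic = {frozenset(pair) for pair in syntenic_pairs}
    return {
        pair: value
        for pair, value in dnds_by_pair.items()
        if frozenset(pair) in syntenic
    }


# Hypothetical gene identifiers and dN/dS values.
results = {
    ("geneA_1", "geneA_2"): 0.21,
    ("geneA_1", "geneB_2"): 0.85,
}
syntelogs = [("geneA_2", "geneA_1")]
print(keep_syntenic_pairs(results, syntelogs))
```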

How to interpret dN/dS ratios
dN/dS = 1: Neutral evolution. Amino acid changing and non-changing substitutions occur at the same rate.
dN/dS > 1: Positive (Darwinian) selection. More amino acid changing substitutions than non-changing ones.
dN/dS < 1: Negative (purifying) selection. Fewer amino acid changing substitutions than non-changing ones. The protein is under constraint.
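
The interpretation rules above can be written as a small helper. This is a sketch, not PanTools code: the function name `interpret_dnds` and the `tolerance` parameter are assumptions. The tolerance is added because estimated ratios are rarely exactly 1, and a dS of zero is reported as undefined because the ratio cannot be computed.

```python
import math


def interpret_dnds(dn: float, ds: float, tolerance: float = 0.05) -> str:
    """Translate a dN and dS estimate into a selection category."""
    if ds == 0:
        # Division by zero: the ratio carries no information here.
        return "undefined"
    ratio = dn / ds
    if math.isclose(ratio, 1.0, abs_tol=tolerance):
        return "neutral"
    return "positive" if ratio > 1 else "negative"


print(interpret_dnds(0.02, 0.2))  # negative (purifying) selection
print(interpret_dnds(0.3, 0.1))   # positive (Darwinian) selection
```

The choice of tolerance is arbitrary in this sketch; in practice a ratio close to 1 is usually assessed with a statistical test rather than a fixed cutoff.
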
Required software
Parameters

<databaseDirectory>

Path to the database root directory.

Options

--include/-i

Only include a selection of genomes.

--exclude/-e

Exclude a selection of genomes.

--selection-file

Text file with rules to use a specific set of genomes and sequences.

--threads/-t

Number of parallel threads.

--syntelogs

Calculate substitutions between syntenic genes (default: all homologous genes).

--phasing

Analyze phased genomes.

--scoring-matrix

The scoring matrix to use (default: BLOSUM62).

--homology-file/-H

Text file with homology_group node identifiers.

--allow-zeros

Allow zeros for dN or dS values.

Example commands
$ pantools calculate_dn_ds tomato_DB
$ pantools calculate_dn_ds tomato_DB --include 1,2
$ Rscript tomato_DB/dnds/rscript/1_2/plot_dnds.R
Output

Output scripts are available in dn_ds/scripts:

  • homologs_plot_dnds.R and homologs_plot_log10_dnds.R, R scripts to visualize dN/dS substitutions.