Synteny
Calculate synteny
Warning
This is a novel function and has not yet undergone testing by external users. Please report any bugs or issues to the PanTools team so we can improve it.
Estimate synteny between sequences of the pangenome using MCScanX. PanTools generates MCScanX's required input GFF and .homology file for every pairwise comparison. The GFF holds the gene identifiers and genomic coordinates. The .homology file is a two-column, tab-separated file that states which genes are homologous to one another. Two genes are considered homologous when they are part of the same homology group and have a protein similarity greater than the threshold set when running group.
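To make the two-column layout of the .homology file concrete, here is a minimal sketch that writes and reads such a table. The gene identifiers and the helper names (write_homology, read_homology) are hypothetical and are not part of PanTools; the real files are generated by calculate_synteny.

```python
import csv
import io


def write_homology(pairs, handle):
    """Write gene pairs as a two-column, tab-separated table."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for gene_a, gene_b in pairs:
        writer.writerow([gene_a, gene_b])


def read_homology(handle):
    """Read a two-column, tab-separated table back into a list of pairs."""
    return [tuple(row) for row in csv.reader(handle, delimiter="\t") if row]


# Hypothetical gene identifiers, for illustration only.
pairs = [("geneA1", "geneB1"), ("geneA2", "geneB7")]

buffer = io.StringIO()
write_homology(pairs, buffer)
homology_text = buffer.getvalue()

# Reading the text back yields the same pairs.
round_trip = read_homology(io.StringIO(homology_text))
```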
MCScanX is only run when the --run argument is included. Pairwise comparisons are divided over the number of --threads provided by the user. Please consider the number of sequences in your analysis and create a subset by using --selection-file. The output of each comparison is written to a separate folder. Once the threads are finished, every output (.collinearity) file is collected and combined into a single file.
--genome or 676 sequences using --sequence. PanTools will try to give each genome a unique first letter but this is only possible with 26 or fewer genomes and 26 sequences per genome.
- Required software
MCScanX must be manually installed and set to your $PATH, since it is unavailable on conda.
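Because the number of pairwise comparisons grows quadratically with the number of sequences, it can help to estimate the workload before choosing --threads or a --selection-file. The sketch below is only an illustration under two assumptions: that every unordered pair is compared once without self-comparisons, and that jobs are dealt round-robin over the threads. How PanTools actually enumerates and schedules comparisons is not stated here, and the sequence identifiers are hypothetical.

```python
from itertools import combinations


def pairwise_comparisons(identifiers):
    """Return every unordered pair of identifiers (assumes no self-comparisons)."""
    return list(combinations(identifiers, 2))


def split_over_threads(jobs, threads):
    """Deal jobs round-robin into `threads` batches (assumed scheduling)."""
    return [jobs[i::threads] for i in range(threads)]


# Hypothetical sequence identifiers in genome_sequence form.
sequences = ["1_1", "1_2", "2_1", "2_2", "3_1"]

jobs = pairwise_comparisons(sequences)
batches = split_over_threads(jobs, threads=4)

print(len(jobs))                         # 10
print([len(batch) for batch in batches])  # [3, 3, 2, 2]
```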
- Parameters
<databaseDirectory>
Path to the database root directory.
- Options
--include/-i
Only include a selection of genomes. This automatically lowers the threshold for core genes.
--exclude/-e
Exclude a selection of genomes. This automatically lowers the threshold for core genes.
--selection-file
Text file with rules to use a specific set of genomes and sequences. This automatically lowers the threshold for core genes.
--run
Run MCScanX_h (default: false)
--threads/-t
Number of parallel threads to be used.
--sequence
Calculate synteny between sequences (from the same genome) instead of genomes.
- Example commands
$ pantools calculate_synteny tomato_DB
$ pantools calculate_synteny tomato_DB --sequence
$ pantools calculate_synteny tomato_DB --sequence --run -t=24
- Output
Output files are written to the synteny directory in the database.
mcscanx.gff, the genomic coordinates of all genes included in the analysis.
mcscanx.homology, all pairwise homology relationships in the analysis.
synteny_identifiers.csv, table with the original and synteny identifiers.
When --run is included:
mcscanx.collinearity, main output file of MCScanX_h. Contains synteny blocks that consist of pairwise collinear gene pairs. Usable by Synvisio and Accusyn in combination with mcscanx.gff. Can be added to the pangenome with add_synteny.
A .gff, .homology, and .collinearity file for every sequence combination. Files are written to a folder named after the combination of the two sequence identifiers. This folder also holds two .html files that visualize collinear blocks and duplication depth between the two sequences.
Add synteny
Warning
This is a novel function and has not yet undergone testing by external users. Please report any bugs or issues to the PanTools team so we can improve it.
Include synteny information into the pangenome.
- Parameters
<databaseDirectory>
Path to the database root directory.
<collinearityFile>
An MCScanX .collinearity file.
- Example commands
$ pantools add_synteny tomato_DB tomato_DB/synteny/mcscanx.collinearity
- Example input
Required input is a .collinearity file generated by MCScanX. The example below shows the first three syntenic blocks calculated within an apple genome (sequence 2_3 to 2_28).
############### Parameters ###############
# MATCH_SCORE: 50
# MATCH_SIZE: 5
# GAP_PENALTY: -1
# OVERLAP_WINDOW: 5
# E_VALUE: 1e-05
# MAX GAPS: 25
############### Statistics ###############
# Number of collinear genes: 217123, Percentage: 80.16
# Number of all genes: 270847
##########################################
## Alignment 0: score=251.0 e_value=1.1e-10 N=6 bk1&cj1 plus
0- 0: 2_3#Mdg_11A016020_mRNA1 2_28#Mdg_06B006370_mRNA1 0
0- 1: 2_3#Mdg_11A016050_mRNA1 2_28#Mdg_06B006500_mRNA1 0
0- 2: 2_3#Mdg_11A016230_mRNA1 2_28#Mdg_06B006510_mRNA1 0
0- 3: 2_3#Mdg_11A016330_mRNA1 2_28#Mdg_06B006570_mRNA1 0
0- 4: 2_3#Mdg_11A016390_mRNA1 2_28#Mdg_06B006590_mRNA1 0
0- 5: 2_3#Mdg_11A016400_mRNA1 2_28#Mdg_06B006660_mRNA1 0
## Alignment 1: score=258.0 e_value=4.1e-11 N=6 bk1&cj1 minus
1- 0: 2_3#Mdg_11A015930_mRNA1 2_28#Mdg_06B006850_mRNA1 0
1- 1: 2_3#Mdg_11A016020_mRNA1 2_28#Mdg_06B006770_mRNA1 0
1- 2: 2_3#Mdg_11A016050_mRNA1 2_28#Mdg_06B006760_mRNA1 0
1- 3: 2_3#Mdg_11A016230_mRNA1 2_28#Mdg_06B006570_mRNA1 0
1- 4: 2_3#Mdg_11A016330_mRNA1 2_28#Mdg_06B006510_mRNA1 0
1- 5: 2_3#Mdg_11A016390_mRNA1 2_28#Mdg_06B006500_mRNA1 0
## Alignment 2: score=252.0 e_value=3.1e-12 N=6 bk1&cj1 minus
2- 0: 2_3#Mdg_11A015930_mRNA1 2_28#Mdg_06B006880_mRNA1 0
2- 1: 2_3#Mdg_11A016020_mRNA1 2_28#Mdg_06B006870_mRNA1 0
2- 2: 2_3#Mdg_11A016050_mRNA1 2_28#Mdg_06B006860_mRNA1 0
2- 3: 2_3#Mdg_11A016230_mRNA1 2_28#Mdg_06B006770_mRNA1 0
2- 4: 2_3#Mdg_11A016390_mRNA1 2_28#Mdg_06B006660_mRNA1 0
2- 5: 2_3#Mdg_11A016400_mRNA1 2_28#Mdg_06B006590_mRNA1 0
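For readers who want to inspect a .collinearity file before adding it to the pangenome, the following is a minimal parsing sketch based only on the layout of the example above. It assumes that each block starts with a "## Alignment" header and that each gene-pair line has the form "block- index: gene1 gene2 value". The function name parse_collinearity is hypothetical and is not part of PanTools or MCScanX.

```python
import re

# Header layout taken from the example, e.g.
# "## Alignment 0: score=251.0 e_value=1.1e-10 N=6 bk1&cj1 plus"
HEADER = re.compile(
    r"^## Alignment (\d+): score=(\S+) e_value=(\S+) N=(\d+) (\S+) (plus|minus)"
)


def parse_collinearity(lines):
    """Collect synteny blocks and their gene pairs from .collinearity lines."""
    blocks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            blocks.append({
                "id": int(match.group(1)),
                "score": float(match.group(2)),
                "e_value": float(match.group(3)),
                "size": int(match.group(4)),
                "sequences": match.group(5),
                "orientation": match.group(6),
                "pairs": [],
            })
        elif line.startswith("#"):
            continue  # parameter and statistics comment lines
        elif blocks:
            # Gene-pair line: drop the "block- index:" prefix, then split fields.
            _, _, rest = line.partition(":")
            fields = rest.split()
            if len(fields) >= 2:
                blocks[-1]["pairs"].append((fields[0], fields[1]))
    return blocks


# Shortened excerpt of the example above (blocks truncated for brevity,
# so fewer pairs are listed than the N value in each header).
example = """\
############### Parameters ###############
# MATCH_SCORE: 50
## Alignment 0: score=251.0 e_value=1.1e-10 N=6 bk1&cj1 plus
0- 0: 2_3#Mdg_11A016020_mRNA1 2_28#Mdg_06B006370_mRNA1 0
0- 1: 2_3#Mdg_11A016050_mRNA1 2_28#Mdg_06B006500_mRNA1 0
## Alignment 1: score=258.0 e_value=4.1e-11 N=6 bk1&cj1 minus
1- 0: 2_3#Mdg_11A015930_mRNA1 2_28#Mdg_06B006850_mRNA1 0
"""

blocks = parse_collinearity(example.splitlines())
for block in blocks:
    print(block["id"], block["orientation"], len(block["pairs"]))
```

To read a real file, the same function can be given an open file handle, since it only iterates over lines.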
Synteny overview
Warning
This is a novel function and has not yet undergone testing by external users. Please report any bugs or issues to the PanTools team so we can improve it.
- Parameters
<databaseDirectory>
Path to the database root directory.
- Example commands
$ pantools synteny_overview tomato_DB
- Output
synteny_blocks_statistics.csv, statistics about homology relationships and synteny blocks between sequences.
blocks_overview.csv, overview of all synteny blocks.
- In the synteny directory:
sequence_overlap_per_sequence.csv, statistics of overlap in genes between multiple synteny blocks with other sequences.
sequence_overlap_per_block.csv, statistics of overlap in genes between multiple synteny blocks.
- In the synteny/statistics directory:
gene_frequency.csv, overview of which genes belong to which synteny blocks.
genome_overlap.csv, overview of the synteny blocks in each genome.