Synteny
=======

Calculate synteny
^^^^^^^^^^^^^^^^^

.. warning::

   This is a novel function and has not yet undergone testing by external
   users. Please report any bugs or issues to the PanTools team so we can
   improve it.

Estimate synteny between sequences of the pangenome using MCScanX. PanTools
generates MCScanX's required input **GFF** and **.homology** file for every
pairwise comparison. The GFF holds the gene identifiers and genomic
coordinates. The .homology file is a two-column, tab-separated file that
states which genes are homologous to one another. Two genes are considered
homologous when they are part of the same homology group and have a protein
similarity greater than the threshold set during :ref:`group `.

| **Executing MCScanX**
| PanTools executes MCScanX_h with default settings when the ``--run``
  argument is included. Pairwise comparisons are divided over the number of
  ``--threads`` provided by the user. Because every pair of sequences is
  compared, consider the number of sequences in your analysis and create a
  subset with ``--selection-file``. The output of each comparison is written
  to a separate folder. Once the threads are finished, every output
  (**.collinearity**) file is collected and combined into a single file.
| To visualize synteny blocks we recommend using `Synvisio `__ or
  `Accusyn `__.

| **Sequence identifiers change**
| Both MCScanX and the online visualization tools have some issues with the
  regular identifiers (1_1, 1_2, 2_1 etc.). To be able to work with all
  tools, PanTools changes the identifiers into a two-letter combination
  followed by a number. The current limit is 676 genomes with ``--genome``
  or 676 sequences with ``--sequence``. PanTools will try to give each
  genome a unique first letter, but this is only possible with 26 or fewer
  genomes and 26 or fewer sequences per genome.

**Required software**

`MCScanX `__ must be manually :ref:`installed ` and added to your
``$PATH``, since it is unavailable on conda.

**Parameters**

.. list-table::
   :widths: 30 70

   * -
     - Path to the database root directory.

**Options**

.. list-table::
   :widths: 30 70

   * - ``--include``/``-i``
     - Only include a selection of genomes. This automatically lowers the
       threshold for core genes.
   * - ``--exclude``/``-e``
     - Exclude a selection of genomes. This automatically lowers the
       threshold for core genes.
   * - ``--selection-file``
     - Text file with rules to use a specific set of genomes and sequences.
       This automatically lowers the threshold for core genes.
   * - ``--run``
     - Run MCScanX_h (default: false).
   * - ``--threads``/``-t``
     - Number of parallel threads to be used.
   * - ``--sequence``
     - Calculate synteny between sequences (from the same genome) instead
       of genomes.

**Example commands**

.. code:: bash

   $ pantools calculate_synteny tomato_DB
   $ pantools calculate_synteny tomato_DB --sequence
   $ pantools calculate_synteny tomato_DB --sequence --run -t=24

**Output**

Output files are written to the **synteny** directory in the database.

- **mcscanx.gff**, the genomic coordinates of all genes included in the
  analysis.
- **mcscanx.homology**, all pairwise homology relationships in the analysis.
- **synteny_identifiers.csv**, table with the original and synteny
  identifiers.

When ``--run`` is included:

- **mcscanx.collinearity**, main output file of MCScanX_h. Contains synteny
  blocks that consist of pairwise collinear gene pairs. Usable by Synvisio
  and Accusyn in combination with mcscanx.gff. Can be added to the pangenome
  with :ref:`add_synteny `.
- A .gff, .homology and .collinearity file for every sequence combination.
  Files are written to a folder named after the combination of the two
  sequence identifiers. This folder also holds two .html files that
  visualize collinear blocks and duplication depth between the two
  sequences.

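
The limit of 676 mentioned under *Sequence identifiers change* is the number of two-letter combinations (26 × 26). The Python sketch below only illustrates that count. The function name ``two_letter_codes`` and the alphabetical enumeration order are assumptions made for this example; they are not PanTools' actual assignment logic, which additionally tries to give each genome a unique first letter.

```python
from itertools import product
from string import ascii_lowercase


def two_letter_codes():
    """Yield every two-letter lowercase combination in alphabetical order.

    Hypothetical helper for illustration only; PanTools may assign
    its synteny identifiers in a different order.
    """
    for first, second in product(ascii_lowercase, repeat=2):
        yield first + second


codes = list(two_letter_codes())

# 26 choices for the first letter times 26 for the second.
print(len(codes))

# Only 26 distinct first letters exist, which is why a unique first
# letter per genome is possible for at most 26 genomes.
print(len({code[0] for code in codes}))
```

Once more than 26 genomes (or more than 26 sequences within one genome) are included, first letters have to be shared, so the original identifier should be looked up in **synteny_identifiers.csv** rather than inferred from the letters.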

**Relevant literature**

- `MCScanX: a toolkit for detection and evolutionary analysis of gene
  synteny and collinearity `__
- `Interactive Exploration of Genomic Conservation (Synvisio) `__
- `Using Simulated Annealing to Declutter Genome Visualizations (Accusyn) `__

-----------------------

Add synteny
^^^^^^^^^^^

.. warning::

   This is a novel function and has not yet undergone testing by external
   users. Please report any bugs or issues to the PanTools team so we can
   improve it.

Include synteny information into the pangenome.

**Parameters**

.. list-table::
   :widths: 30 70

   * -
     - Path to the database root directory.
   * -
     - An MCScanX .collinearity file.

**Example commands**

.. code:: bash

   $ pantools add_synteny tomato_DB tomato_DB/synteny/mcscanx.collinearity

**Example input**

Required input is a .collinearity file generated by MCScanX. The example
below shows the first three syntenic blocks calculated within an apple
genome (between sequences 2_3 and 2_28).

.. code:: text

   ############### Parameters ###############
   # MATCH_SCORE: 50
   # MATCH_SIZE: 5
   # GAP_PENALTY: -1
   # OVERLAP_WINDOW: 5
   # E_VALUE: 1e-05
   # MAX GAPS: 25
   ############### Statistics ###############
   # Number of collinear genes: 217123, Percentage: 80.16
   # Number of all genes: 270847
   ##########################################
   ## Alignment 0: score=251.0 e_value=1.1e-10 N=6 bk1&cj1 plus
   0- 0: 2_3#Mdg_11A016020_mRNA1 2_28#Mdg_06B006370_mRNA1 0
   0- 1: 2_3#Mdg_11A016050_mRNA1 2_28#Mdg_06B006500_mRNA1 0
   0- 2: 2_3#Mdg_11A016230_mRNA1 2_28#Mdg_06B006510_mRNA1 0
   0- 3: 2_3#Mdg_11A016330_mRNA1 2_28#Mdg_06B006570_mRNA1 0
   0- 4: 2_3#Mdg_11A016390_mRNA1 2_28#Mdg_06B006590_mRNA1 0
   0- 5: 2_3#Mdg_11A016400_mRNA1 2_28#Mdg_06B006660_mRNA1 0
   ## Alignment 1: score=258.0 e_value=4.1e-11 N=6 bk1&cj1 minus
   1- 0: 2_3#Mdg_11A015930_mRNA1 2_28#Mdg_06B006850_mRNA1 0
   1- 1: 2_3#Mdg_11A016020_mRNA1 2_28#Mdg_06B006770_mRNA1 0
   1- 2: 2_3#Mdg_11A016050_mRNA1 2_28#Mdg_06B006760_mRNA1 0
   1- 3: 2_3#Mdg_11A016230_mRNA1 2_28#Mdg_06B006570_mRNA1 0
   1- 4: 2_3#Mdg_11A016330_mRNA1 2_28#Mdg_06B006510_mRNA1 0
   1- 5: 2_3#Mdg_11A016390_mRNA1 2_28#Mdg_06B006500_mRNA1 0
   ## Alignment 2: score=252.0 e_value=3.1e-12 N=6 bk1&cj1 minus
   2- 0: 2_3#Mdg_11A015930_mRNA1 2_28#Mdg_06B006880_mRNA1 0
   2- 1: 2_3#Mdg_11A016020_mRNA1 2_28#Mdg_06B006870_mRNA1 0
   2- 2: 2_3#Mdg_11A016050_mRNA1 2_28#Mdg_06B006860_mRNA1 0
   2- 3: 2_3#Mdg_11A016230_mRNA1 2_28#Mdg_06B006770_mRNA1 0
   2- 4: 2_3#Mdg_11A016390_mRNA1 2_28#Mdg_06B006660_mRNA1 0
   2- 5: 2_3#Mdg_11A016400_mRNA1 2_28#Mdg_06B006590_mRNA1 0

---------------------------

Synteny Overview
^^^^^^^^^^^^^^^^

.. warning::

   This is a novel function and has not yet undergone testing by external
   users. Please report any bugs or issues to the PanTools team so we can
   improve it.

**Parameters**

.. list-table::
   :widths: 30 70

   * -
     - Path to the database root directory.

**Example commands**

.. code:: bash

   $ pantools synteny_overview tomato_DB

**Output**

- **synteny_blocks_statistics.csv**, statistics about homology
  relationships and synteny blocks between sequences.
- **blocks_overview.csv**, overview of all synteny blocks.

In the **synteny** directory:

- **sequence_overlap_per_sequence.csv**, statistics of overlap in genes
  between multiple synteny blocks with other sequences.
- **sequence_overlap_per_block.csv**, statistics of overlap in genes
  between multiple synteny blocks.

In the **synteny/statistics** directory:

- **gene_frequency.csv**, overview of which genes belong to which synteny
  blocks.
- **genome_overlap.csv**, overview of the synteny blocks in each genome.
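
For a quick look at a .collinearity file outside PanTools, the block layout shown in the *Add synteny* example input can be read with a few lines of Python. This is a minimal sketch, not part of PanTools or MCScanX: the ``Block`` class and ``parse_collinearity`` function are hypothetical names, and the regular expressions assume the ``## Alignment`` header and gene-pair line layout of that example.

```python
import re
from dataclasses import dataclass, field

# Patterns modelled on the example .collinearity excerpt (an assumption,
# not an official format specification).
HEADER = re.compile(
    r"^## Alignment (\d+): score=(\S+) e_value=(\S+) N=(\d+) "
    r"(\S+)&(\S+) (plus|minus)"
)
PAIR = re.compile(r"^\s*(\d+)-\s*(\d+):\s+(\S+)\s+(\S+)\s+(\S+)")


@dataclass
class Block:
    index: int
    score: float
    e_value: float
    size: int          # N as reported in the header line
    seq_a: str         # synteny identifier of the first sequence
    seq_b: str         # synteny identifier of the second sequence
    orientation: str   # "plus" or "minus"
    pairs: list = field(default_factory=list)


def parse_collinearity(lines):
    """Group gene pairs under the alignment header that precedes them."""
    blocks = []
    for line in lines:
        header = HEADER.match(line)
        if header:
            idx, score, e_value, n, a, b, orient = header.groups()
            blocks.append(
                Block(int(idx), float(score), float(e_value), int(n), a, b, orient)
            )
            continue
        pair = PAIR.match(line)
        if pair and blocks:
            blocks[-1].pairs.append((pair.group(3), pair.group(4)))
    return blocks


# Truncated excerpt of the example input, so fewer pairs than N are present.
sample = """\
# Number of all genes: 270847
##########################################
## Alignment 0: score=251.0 e_value=1.1e-10 N=6 bk1&cj1 plus
0- 0: 2_3#Mdg_11A016020_mRNA1 2_28#Mdg_06B006370_mRNA1 0
0- 1: 2_3#Mdg_11A016050_mRNA1 2_28#Mdg_06B006500_mRNA1 0
## Alignment 1: score=258.0 e_value=4.1e-11 N=6 bk1&cj1 minus
1- 0: 2_3#Mdg_11A015930_mRNA1 2_28#Mdg_06B006850_mRNA1 0
""".splitlines()

blocks = parse_collinearity(sample)
for block in blocks:
    print(block.index, block.seq_a, block.seq_b, block.orientation, len(block.pairs))
```

To run it on a real file, pass an open file handle instead of ``sample``, for example ``parse_collinearity(open("tomato_DB/synteny/mcscanx.collinearity"))``. Comment lines starting with ``#`` match neither pattern and are skipped.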