Installing PanTools from source
===============================

For easy installation, please see :doc:`/getting_started/install` in the
user guide. On this page, we describe the full installation process needed
for developers.

1. :ref:`developer_guide/install:install dependencies`
2. :ref:`developer_guide/install:install neo4j`
3. :ref:`developer_guide/install:compile PanTools`
4. :ref:`developer_guide/install:install Sphinx`
5. :ref:`developer_guide/install:install pre-commit hooks`

--------------

Install dependencies
--------------------

Some PanTools functionalities require additional software to be installed.
Installing every dependency by hand takes a considerable amount of time,
therefore we highly recommend using mamba. Mamba efficiently manages conda
environments, allowing all required tools to be installed into a separate
environment. Instructions for creating the mamba environment or installing
the tools manually are found in the sections below.

Install dependencies using Conda
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To install all dependencies into a separate environment, run the following
commands. Please use the conda.yaml file, which can be found in the
`release `_ or in the PanTools home directory when working with the git
repository itself. For smooth dependency resolution with conda, we recommend
using strict channel priority and only the bioconda and conda-forge channels.

.. code:: bash

    $ conda config --set channel_priority strict  # recommended for smooth dependency resolution
    $ mamba env create -n pantools -f conda.yaml

--------------

Manual installation of dependencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All tools must be added to your PATH so PanTools is able to use them. The
instructions below are based on a Linux machine. Also, please note that some
required tools may be missing from the list below; you can install them
using conda or mamba.

Install KMC
"""""""""""

PanTools requires **KMC v3.1.0** or higher. The KMC3 binaries can be
downloaded from

.. code:: bash

    $ tar -xvzf KMC*  # uncompress the KMC binaries
    # Edit your ~/.bashrc to add KMC to your PATH
    $ echo "export PATH=/path/to/KMC/:\$PATH" >> ~/.bashrc  # replace /path/to with the correct path on your computer
    $ source ~/.bashrc
    $ kmc  # test if KMC is executable
    $ kmc_tools  # test if kmc_tools is executable

--------------

Install MCL
"""""""""""

The MCL (Markov clustering) algorithm is required for the homology grouping
of PanTools. The software can be found on under License & software.

.. code:: bash

    $ wget
    $ tar -xvzf mcl-*
    $ cd mcl-14-137
    $ ./configure --prefix=/path/to/mcl-14-137/shared  # replace /path/to with the correct path on your computer
    $ make install
    # Edit your ~/.bashrc to add MCL to your PATH (the binaries are installed in <prefix>/bin)
    $ echo "export PATH=/path/to/mcl-14-137/shared/bin/:\$PATH" >> ~/.bashrc  # replace /path/to with the correct path on your computer
    $ source ~/.bashrc
    $ mcl -h  # test if MCL is executable

--------------

Install BUSCO
"""""""""""""

**BUSCO v3 to v5** can be run against the pangenome to estimate annotation
completeness. These versions require different Python releases and need to
be installed in different ways. We suggest installing BUSCO v5 by following
the instructions at

--------------

Install FastTree
""""""""""""""""

**FastTree** is used to infer approximately-maximum-likelihood phylogenetic
trees from alignments of nucleotide or protein sequences extracted from the
pangenome. An executable can be found on the FastTree website:

.. code:: bash

    $ wget
    $ chmod +x FastTree
    $ ./FastTree  # test if FastTree is executable
    # Edit your ~/.bashrc to add FastTree to your PATH
    $ echo "export PATH=/path/to:\$PATH" >> ~/.bashrc  # replace /path/to with the correct path on your computer
    $ source ~/.bashrc

--------------

Install R
"""""""""

**R** and some additional R packages are required to execute the R scripts
(files with the .R extension) that create plots and construct Neighbor-Joining
phylogenies. In most cases, R is already installed on a server.
If this is not the case, install it by following the instructions on the
website, or compile it using the following steps.

.. code:: bash

    $ mkdir R
    $ mkdir R/R_LIBS
    $ cd R
    $ wget  # version number might have changed already
    $ tar -xvf R-4.0.2.tar.gz
    $ cd R-4.0.2/
    $ ./configure --prefix=/path/to/R/  # replace /path/to with the correct path on your computer
    $ make
    $ make install  # installs R into the --prefix directory
    # Edit your ~/.bashrc to add R to your PATH
    $ echo "export PATH=/path/to/R/bin/:\$PATH" >> ~/.bashrc  # replace /path/to with the correct path on your computer
    $ source ~/.bashrc
    $ R --help  # test if R is executable

When the **R_LIBS** environment variable is set, R knows the location of the
library directory and can install additional R packages into it.

.. code:: bash

    $ echo "R_LIBS=/path/to/R/R_LIBS/" >> ~/.bashrc  # replace /path/to with the correct path on your computer
    $ echo "export R_LIBS" >> ~/.bashrc
    $ source ~/.bashrc
    $ echo $R_LIBS  # validate that the path to the R libraries can be found

--------------

Install MAFFT
"""""""""""""

**MAFFT** is required for all alignment functionalities, such as aligning
homology groups and inferring the core SNP phylogeny. The full manual is
available at

.. code:: bash

    $ git clone
    $ cd mafft/core
    # Edit the first line of the Makefile to change the install location
    # from 'PREFIX = /usr/local' to 'PREFIX = /YOUR_DESIRED_PATH/mafft/'
    # Make sure the 'ENABLE_MULTITHREAD = -Denablemultithread' line is uncommented to enable multithreading
    $ make clean
    $ make
    $ make install
    # Edit your ~/.bashrc to add MAFFT to your PATH
    $ echo "export PATH=/path/to/mafft/bin/:\$PATH" >> ~/.bashrc  # replace /path/to with the correct path on your computer
    $ source ~/.bashrc
    $ mafft --help  # test if MAFFT is executable

--------------

Install IQ-tree
"""""""""""""""

We use IQ-tree to infer phylogenetic trees by maximum likelihood.
Information about the tool can be found on their webpage

.. code:: bash

    $ wget
    $ tar -xvf iqtree-1.6.12-Linux.tar.gz
    # Edit your ~/.bashrc to add IQ-tree to your PATH
    $ echo "export PATH=/path/to/iqtree-1.6.12-Linux/bin/:\$PATH" >> ~/.bashrc  # replace /path/to with the correct path on your computer
    $ source ~/.bashrc
    $ iqtree -h  # test if IQ-tree is executable

--------------

Install fastANI or MASH
"""""""""""""""""""""""

To construct a Neighbor-Joining phylogeny using ANI scores, either
**fastANI** or **MASH** is required. The manual for **fastANI** is available
at
The manual for **MASH** can be found at

.. code:: bash

    $ wget
    $ tar -xvf mash-Linux64-v2.2.tar
    $ mv mash-Linux64-v2.2/mash .
    $ wget
    $ unzip
    # Edit your ~/.bashrc to add MASH and fastANI to your PATH
    $ echo "export PATH=/path/to/:\$PATH" >> ~/.bashrc  # replace /path/to with the correct path on your computer
    $ source ~/.bashrc
    $ mash -h  # test if MASH is executable
    $ fastANI -h  # test if fastANI is executable

--------------

Install BLAST
"""""""""""""

BLAST is only required by one function, in which sequences are searched
against a database to obtain their COG category. Information about BLAST can
be found at

.. code:: bash

    $ wget
    $ tar -xvf ncbi-blast-2.10.1+-x64-linux.tar.gz
    # Edit your ~/.bashrc to add BLAST to your PATH
    $ echo "export PATH=/path/to/ncbi-blast-2.10.1+/bin/:\$PATH" >> ~/.bashrc  # replace /path/to with the correct path on your computer
    $ source ~/.bashrc
    $ blastp -help  # test if BLAST is executable

--------------

Install ASTER
"""""""""""""

ASTER is required for creating a phylogenetic tree based on both orthologs
and paralogs with astral-pro. The manual for ASTER can be found at

.. code:: bash

    $ git clone
    $ cd ASTER
    $ git checkout v1.3
    $ make
    # Edit your ~/.bashrc to add ASTER to your PATH
    $ echo "export PATH=/path/to/ASTER/bin/:\$PATH" >> ~/.bashrc  # replace /path/to with the correct path on your computer
    $ source ~/.bashrc
    $ astral-pro -h  # test if ASTER is executable

--------------

Install InterProScan
""""""""""""""""""""

InterProScan is not required by any function, but its .GFF3 output can be
read to add functional annotations to the database. The installation itself
can be quite tricky, as InterProScan uses many third-party binaries that each
have their own dependencies. Please also check the install requirements.
Installation of the Panther models is not required.

Phobius via InterProScan
""""""""""""""""""""""""

Phobius predictions can be performed during the InterProScan analysis, but
they are not part of the standard set of predictions. To allow these
predictions, place the entire directory in the InterProScan/bin/ directory
and edit the **** configuration file. More information about including
Phobius in the InterProScan analysis is found ``_.

Install eggNOGmapper
""""""""""""""""""""

eggNOG-mapper is not required by any function, but its .annotations output
can be read to add functional annotations to the database. Information about
this tool can be found on

.. code:: bash

    $ git clone

--------------

Install Neo4j
-------------

Although Neo4j is not needed for any of the PanTools functionalities, it is
required to visualize the graph database and to run Cypher queries. PanTools
versions up to 3.2 use the Neo4j 3.5.3 libraries, whereas newer releases use
Neo4j 3.5.30. Neo4j 3.5.30 is compatible with all earlier PanTools versions.
Download the Neo4j 3.5.30 community edition from the `Neo4j website `_ or
download the binaries directly from our `server `_.

.. code:: bash

    $ wget
    $ tar xvzf neo4j-community-3.5.30-unix.tar.gz
    # Edit your ~/.bashrc to add Neo4j to your PATH
    $ echo "export PATH=/path/to/neo4j-community-3.5.30/bin:\$PATH" >> ~/.bashrc  # replace /path/to with the correct path on your computer
    $ source ~/.bashrc
    $ neo4j status  # test if Neo4j is executable

Official Neo4j 3.5 manual:

--------------

Compile PanTools
----------------

PanTools is written in Java and can be compiled using Maven. The build
instructions are defined in the ``pom.xml`` file. The following commands can
be used to compile PanTools (in no particular order):

.. code:: bash

    # Compile PanTools
    mvn compile
    # Run the tests
    mvn test
    # Create a fat jar file for PanTools (including all dependencies)
    mvn package
    # Create the fat jar file without running the tests
    mvn package -DskipTests

If you have created a fat jar with the ``mvn package`` command, you can run
PanTools using the following command:

.. substitution-code-block:: bash

    java -jar target/pantools-|ProjectVersion|.jar

Please note that the version shown here is always that of the latest release,
which may differ from the jar you built. To see the exact version of the jar
file, use the following command:

.. substitution-code-block:: bash

    java -jar target/pantools-|ProjectVersion|.jar --version

Finally, for development purposes, it is possible to skip creating a fat jar
file and run PanTools directly from the compiled Java classes:

.. code:: bash

    # locally
    mvn compile
    mvn dependency:copy-dependencies
    rsync -avPhz target/{dependency,classes}
    # on the remote server
    alias pantools-dev='java -cp "/path/to/dev/pantools/target/dependency/*:/path/to/dev/pantools/target/classes" nl.wur.bif.pantools.Pantools'
    pantools-dev --version

--------------

Install Sphinx
--------------

Sphinx is required to build the documentation we host on ReadTheDocs. It is
possible to test and build the documentation locally.
The documentation is written in reStructuredText and can be found in the
``docs/`` directory. To test and build the documentation locally, you need to
install Sphinx, sphinx-rtd-theme and sphinx-lint, as well as graphviz for the
sphinx.ext.graphviz extension (used for creating graphs in the
documentation). Please make sure you use Python 3.7, as this is important for
the version of sphinx-lint (and pre-commit).

.. code:: bash

    # Install graphviz
    mamba install graphviz
    # Install sphinx, sphinx-rtd-theme and sphinx-lint
    pip install sphinx sphinx-rtd-theme sphinx-lint

The following commands can be used to test and build the documentation:

.. code:: bash

    # Test the documentation
    sphinx-lint
    sphinx-lint -e=all --max-line-length=80
    # Build the documentation
    sphinx-build -W docs/source output

--------------

Install pre-commit hooks
------------------------

We highly recommend installing the pre-commit hooks to ensure that all code
you commit to the repository is properly formatted and passes all checks.
This helps keep the code base clean and consistent.

First, install Sphinx as described above. Then install the `pre-commit `__
Python package by following the `installation instructions `__:

.. code:: bash

    pip install pre-commit

Next, inside the root directory of the repository, run:

.. code:: bash

    pre-commit install

You only need to run this step once after cloning the repository. The hooks
are installed in your local repository's configuration under
``.git/hooks/pre-commit``. After installation, only relevant files are
checked when committing changes. For example, if you change a Java file, only
the Java-related hooks are triggered.

Should any of the pre-commit hooks fail, git will not allow you to create the
commit. The output of the pre-commit hooks tells you what failed, allowing
you to fix the problems and re-add the affected files for another commit
attempt.
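
To verify that the hooks are in place, you can check for the hook script
that ``pre-commit install`` writes to ``.git/hooks/pre-commit``. The function
below is a minimal sketch of such a check; the name
``pre_commit_hook_status`` is hypothetical and not part of PanTools or
pre-commit. It assumes git 2.13 or newer (for
``git rev-parse --absolute-git-dir``) and the default hooks location, i.e.
no custom ``core.hooksPath`` is configured.

.. code:: bash

    # Hypothetical helper (not part of PanTools): report whether a
    # pre-commit hook script exists in a git repository.
    # Assumes the default hooks location (no custom core.hooksPath).
    pre_commit_hook_status() {
        local repo="${1:-.}"  # repository to check, defaults to the current directory
        local git_dir
        # Resolve the absolute .git directory; fails outside a git repository
        if ! git_dir="$(git -C "$repo" rev-parse --absolute-git-dir 2>/dev/null)"; then
            echo "not a git repository"
            return 1
        fi
        # 'pre-commit install' writes an executable script to hooks/pre-commit
        if [ -x "$git_dir/hooks/pre-commit" ]; then
            echo "installed"
        else
            echo "missing"
        fi
    }

Running ``pre_commit_hook_status`` in the repository root after
``pre-commit install`` should print ``installed``.
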

Pre-commit hooks can also be run manually on all files with:

.. code:: bash

    pre-commit run -a

--------------

End-to-end pipeline
------------------------

We include an end-to-end pipeline that automatically runs online on
However, it is also possible to run this pipeline locally. The pipeline is
written in Snakemake and can be found in the ``tests/`` directory. It should
work on both macOS and Linux. To run the pipeline locally, first create a
conda environment as follows:

.. code:: bash

    mamba create -n pantools-snakemake snakemake=7.19.1 kmc mcl=14.137 samtools=1.15 bcftools=1.15.1 mafft=7.520 fasttree=2.1.11 openjdk=8 maven graphviz aster=1.3 mash=2.3 python=3.7
    conda activate pantools-snakemake
    pip install pre-commit sphinx sphinx-rtd-theme sphinx-lint  # for commit hooks

The pipeline needs the yeast pangenome and/or panproteome test data, which we
deposited on our in-house server as ``yeast_pangenome.tar.gz`` and
``yeast_panproteome.tar.gz``. For access to these files, please contact the
PanTools developers. Next, unpack the files as follows:

.. code:: bash

    cd tests
    tar xvzf yeast_pangenome.tar.gz  # for yeast pangenome
    tar xvzf yeast_panproteome.tar.gz  # for yeast panproteome

Then, to run the pipeline, run the following in the ``tests/`` directory:

.. code:: bash

    conda activate pantools-snakemake  # if not already activated
    snakemake -rpc1 --configfile config/yeast_pangenome_yaml  # for yeast pangenome
    snakemake -rpc1 --configfile config/yeast_panproteome_yaml  # for yeast panproteome
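
Before starting a full run, it can help to let Snakemake list the jobs it
would execute without running them (``-n``, a dry run). The wrapper below is
a minimal sketch of this; the function name ``run_yeast_pipeline`` and its
``--dry-run`` argument are hypothetical and not part of the PanTools
repository. It assumes it is called from the ``tests/`` directory with the
``pantools-snakemake`` environment activated, and it reuses the config file
names from the commands above.

.. code:: bash

    # Hypothetical wrapper (not part of PanTools) around the commands above.
    # Usage: run_yeast_pipeline pangenome|panproteome [--dry-run]
    # Assumes it is called from the tests/ directory.
    run_yeast_pipeline() {
        local config
        case "${1:-}" in
            pangenome)   config="config/yeast_pangenome_yaml" ;;
            panproteome) config="config/yeast_panproteome_yaml" ;;
            *)
                echo "usage: run_yeast_pipeline pangenome|panproteome [--dry-run]" >&2
                return 2
                ;;
        esac
        # Stop early if the config file is missing (e.g. wrong working directory)
        if [ ! -f "$config" ]; then
            echo "config not found: $config (run this from tests/)" >&2
            return 1
        fi
        if [ "${2:-}" = "--dry-run" ]; then
            # -n: only show which jobs would run, without executing them
            snakemake -n -rpc1 --configfile "$config"
        else
            snakemake -rpc1 --configfile "$config"
        fi
    }

For example, ``run_yeast_pipeline pangenome --dry-run`` prints the planned
jobs; dropping ``--dry-run`` executes them.
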