Installing and configuring the required software
3. Install dependencies, either manually or through conda.
For PanTools developers:
The preferred option is to download the .jar file from https://git.wur.nl/bioinformatics/pantools/-/releases. Alternatively, follow the installation and compilation instructions from the README.md file in the desired version (e.g. https://git.wur.nl/bioinformatics/pantools/-/tree/v4.2.0).
Test if PanTools is executable (replace ‘/path/to’ with the correct directory structure):
$ java -jar /path/to/pantools-4.2.0.jar --help
If the help page does not appear, you most likely do not have a working installation of Java 8. Java is included in the PanTools conda environment, so consider installing that environment first. To download Java manually, follow the instructions at https://www.java.com/en/download.
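To see which Java is actually picked up, the version string can be inspected. The following is a minimal sketch, not part of PanTools: the function name java_major_version is hypothetical, and it assumes the usual layout of the java -version output, where the version is the quoted string on the first line that mentions "version".

```shell
# Extract the major Java version from a version string.
# Legacy scheme: "1.8.0_161" -> 8; current scheme: "11.0.2" -> 11.
java_major_version() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 ;;
    esac
}

# Report the Java found on $PATH, if any.
if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr; take the quoted version string.
    java_version_string=$(java -version 2>&1 | grep version | head -n 1 | cut -d'"' -f2)
    echo "Found Java major version: $(java_major_version "$java_version_string")"
else
    echo "java is not on \$PATH; install it or activate the PanTools conda environment"
fi
```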
Set PanTools alias
To avoid typing long command-line arguments every time, we suggest adding an alias to your profile. Set the alias in your ~/.bashrc using one of the following commands. Always include the full path to PanTools’ .jar file.
If Java is on your $PATH (meaning you can call java directly; test with java -version), you can run the following:
$ echo "alias pantools='java -Xms20g -Xmx50g -jar /path/to/pantools-4.2.0.jar'" >> ~/.bashrc
If Java is not on your $PATH, include the full path to the java executable in the alias. Again, replace ‘/path/to’ with the correct directory structure.
$ echo "alias pantools='/path/to/jdk1.8.0_161/bin/java -Xms20g -Xmx50g -jar /path/to/pantools-4.2.0.jar'" >> ~/.bashrc
Source your ~/.bashrc and test if the alias works.
$ source ~/.bashrc
$ pantools --version
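Note that echo ... >> ~/.bashrc appends a new line every time it is run, so repeating the command leaves duplicate entries. A minimal sketch of a helper that appends a line only when it is not yet present is given below; the name add_line_once is hypothetical and not part of PanTools.

```shell
# Append a line to a file only if that exact line is not already present.
add_line_once() {
    line=$1
    file=$2
    touch "$file"
    # -x: match the whole line, -F: treat the text literally, -q: no output
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

It could then be used in place of the echo commands, for example: add_line_once "alias pantools='java -Xms20g -Xmx50g -jar /path/to/pantools-4.2.0.jar'" ~/.bashrc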
PanTools uses picocli command line parsing. Tab autocompletion for the command line can be enabled for an alias with the following tutorial.
A standalone Neo4j installation is not needed for any PanTools functionality, but it is required to start up a database and run Cypher queries. PanTools versions up to 3.2 use the Neo4j 3.5.3 libraries, whereas newer releases use Neo4j 3.5.30. Neo4j 3.5.30 is compatible with all earlier PanTools versions.
Download the Neo4j 3.5.30 community edition from the Neo4j website or download the binaries directly from our server.
$ wget http://www.bioinformatics.nl/pangenomics/tutorial/neo4j-community-3.5.30-unix.tar.gz
$ tar -xvzf neo4j-community-*
# Edit your ~/.bashrc to include Neo4j to your $PATH
$ echo "export PATH=/path/to/neo4j-community-3.5.30/bin:\$PATH" >> ~/.bashrc #replace /path/to with the correct path on your computer
$ source ~/.bashrc
$ neo4j status # test if Neo4j is executable
Official Neo4j 3.5 manual: https://neo4j.com/docs/operations-manual/3.5/
Some PanTools functionalities require additional software. Installing every dependency by hand takes a considerable amount of time, so we highly recommend using Mamba. Mamba efficiently manages Conda environments, allowing all required tools to be installed into a separate environment. Instructions for creating the Mamba environment or installing the tools manually are found in the sections below.
Install dependencies using Conda
Instructions on how to install and use conda can be found in the conda manual page. Once conda is installed, we suggest installing Mamba into the Conda base environment to enable much faster dependency solving.
To install all dependencies into a separate environment, run the following commands. Choose the conda_linux.yml or conda_macos.yml file depending on your operating system. These files can be found in the release. The difference between the two files is that the Linux file contains BUSCO v5.2.2, which is not compatible with the other dependencies on macOS.
$ conda install mamba -n base -c conda-forge #install mamba into the base environment
$ mamba env create -n pantools -f conda_linux.yml #for Linux
$ mamba env create -n pantools -f conda_macos.yml #for macOS
$ conda activate pantools # activate the environment before using PanTools
$ conda deactivate # deactivate when you are done
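The choice between the two environment files can also be made by a script. The sketch below maps the output of uname -s to one of the two file names mentioned above; the function name pantools_env_file is hypothetical, and the sketch only prints the command rather than running it.

```shell
# Map an OS name (as printed by `uname -s`) to the matching environment file.
pantools_env_file() {
    case "$1" in
        Linux)  echo "conda_linux.yml" ;;
        Darwin) echo "conda_macos.yml" ;;
        *)      echo "unsupported operating system: $1" >&2; return 1 ;;
    esac
}

# Print the command that would create the environment on this machine.
if env_file=$(pantools_env_file "$(uname -s)"); then
    echo "mamba env create -n pantools -f $env_file"
fi
```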
Run the following commands when you do not want to install every dependency, but only specific ones for the analysis that you’re interested in.
$ conda create -n pantools python=3.6 kmc=3.0 mcl # Creates an environment that is able to construct the pangenome and cluster protein sequences
$ conda install -n pantools mafft iqtree fasttree blast mash fastani busco=5.2.2 r-ggplot2 r-ape graphviz aster=1.3 # include tools you want to install via conda
Manual installation of dependencies
All tools must be on your $PATH so that PanTools can use them from any location. The instructions below are based on a Linux machine.
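Whether the tools are reachable can be checked in one pass with command -v. The following is a minimal sketch; the function name check_tools is hypothetical, and the executable names are the ones used in the individual test commands of this section. Tools you do not need for your analysis can simply be left out of the list.

```shell
# Report for each given executable whether it is found on $PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools kmc kmc_tools mcl FastTree R mafft iqtree mash fastANI blastp astral-pro \
    || echo "At least one tool is not on \$PATH."
```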
PanTools requires KMC v2.3 or 3.0 for k-mer counting during construction of the pangenome graph. KMC v3.0 is fastest, but v2.3 should also be compatible with PanTools. The KMC3 binaries can be downloaded from https://github.com/refresh-bio/KMC/releases.
$ tar -xvzf KMC* #uncompress the KMC binaries
# Edit your ~/.bashrc to include KMC to your PATH
$ echo "export PATH=/path/to/KMC/:\$PATH" >> ~/.bashrc #replace /path/to with the correct path on your computer
$ source ~/.bashrc
$ kmc # test if KMC is executable
$ kmc_tools # test if kmc_tools is executable
The MCL (Markov clustering) algorithm is required for the homology grouping of PanTools. The software can be found on https://micans.org/mcl under License & software.
$ wget https://micans.org/mcl/src/mcl-14-137.tar.gz
$ tar -xvzf mcl-*
$ cd mcl-14-137
$ ./configure --prefix=/path/to/mcl-14-137/shared #replace /path/to with the correct path on your computer
$ make install
# Edit your ~/.bashrc to include MCL to your PATH (the binaries are installed in the bin/ directory below the --prefix directory)
$ echo "export PATH=/path/to/mcl-14-137/shared/bin/:\$PATH" >> ~/.bashrc #replace /path/to with the correct path on your computer
$ source ~/.bashrc
$ mcl -h # test if MCL is executable
BUSCO v3 to v5 can be run against the pangenome to estimate annotation completeness. Each version requires a different Python release and is installed in a different way. We suggest installing BUSCO v5; follow the instructions at https://gitlab.com/ezlab/busco/.
FastTree is used to infer approximately-maximum-likelihood phylogenetic trees from the alignments of nucleotide or protein sequences which are extracted from the pangenome. An executable can be found on the FastTree website: http://www.microbesonline.org/fasttree/.
$ wget http://www.microbesonline.org/fasttree/FastTree
$ chmod +x FastTree
$ ./FastTree # test if FastTree is executable
# Edit your ~/.bashrc to include FastTree to your PATH
$ echo "export PATH=/path/to:\$PATH" >> ~/.bashrc #replace /path/to with the correct path on your computer
$ source ~/.bashrc
R and some additional R packages are required to execute R scripts (files with .R extension) that create plots and construct Neighbor-Joining phylogenies. In most cases, R is already installed on a server. If this is not the case, install it following the instructions at https://cran.r-project.org/, or compile it using the following steps.
$ mkdir R
$ mkdir R/R_LIBS
$ cd R
$ wget https://cran.r-project.org/src/base/R-4/R-4.0.2.tar.gz #version number might have changed already
$ tar -xvf R-4.0.2.tar.gz
$ cd R-4.0.2/
$ ./configure --prefix=/path/to/R/ #replace /path/to with the correct path on your computer
$ make
$ make install # copies R into the --prefix directory, so that /path/to/R/bin/ exists
# Edit your ~/.bashrc to include R to your PATH
$ echo "export PATH=/path/to/R/bin/:\$PATH" >> ~/.bashrc #replace /path/to with the correct path on your computer
$ source ~/.bashrc
$ R --help # test if R is executable
When the R_LIBS environment variable is set, R scripts know the location of the libraries and are able to install additional R packages into the selected directory.
$ echo "R_LIBS=/path/to/R/R_LIBS/" >> ~/.bashrc #replace /path/to with the correct path on your computer
$ echo "export R_LIBS" >> ~/.bashrc
$ source ~/.bashrc
$ echo $R_LIBS # validate if the path to the R libraries can be found
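Printing the variable does not show whether the directory actually exists. A minimal sketch of a slightly stricter check is given below; the function name check_r_libs is hypothetical, and it assumes that R_LIBS holds a single directory rather than a colon-separated list.

```shell
# Return 0 if R_LIBS is set and points to an existing directory.
# Assumption: R_LIBS contains one directory, not a colon-separated list.
check_r_libs() {
    if [ -z "${R_LIBS:-}" ]; then
        echo "R_LIBS is not set"
        return 1
    elif [ ! -d "$R_LIBS" ]; then
        echo "R_LIBS points to a missing directory: $R_LIBS"
        return 1
    fi
    echo "R_LIBS is set to: $R_LIBS"
}

check_r_libs || echo "Set R_LIBS as described above and source ~/.bashrc again."
```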
MAFFT is required for all the alignment functionalities, such as the alignment of homology groups and inferring the core SNP phylogeny. The full manual is available at https://mafft.cbrc.jp/alignment/software/.
$ git clone https://github.com/GSLBiotech/mafft.git
$ cd mafft/core
# Edit the first line of Makefile to change the desired install location, from 'PREFIX = /usr/local' to 'PREFIX = /YOUR_DESIRED_PATH/mafft/'
# Make sure the 'ENABLE_MULTITHREAD = -Denablemultithread' line is uncommented, to enable multithreading
$ make
$ make install # installs MAFFT into the PREFIX directory set in the Makefile
# Edit your ~/.bashrc to include MAFFT to your $PATH
$ echo "export PATH=/path/to/mafft/bin/:\$PATH" >> ~/.bashrc #replace /path/to with the correct path on your computer
$ source ~/.bashrc
$ mafft --help # test if MAFFT is executable
We use IQ-tree to infer phylogenetic trees by maximum likelihood. Information about the tool can be found at https://github.com/Cibiv/IQ-TREE.
$ wget https://github.com/Cibiv/IQ-TREE/releases/download/v1.6.12/iqtree-1.6.12-Linux.tar.gz
$ tar -xvf iqtree-1.6.12-Linux.tar.gz
# Edit your ~/.bashrc to include IQ-tree to your $PATH
$ echo "export PATH=/path/to/iqtree-1.6.12-Linux/bin/:\$PATH" >> ~/.bashrc #replace /path/to with the correct path on your computer
$ source ~/.bashrc
$ iqtree -h # test if IQ-tree is executable
Install fastANI or MASH
To be able to construct a Neighbor-Joining phylogeny using ANI-scores, either fastANI or MASH is required. The manual for fastANI is available at https://github.com/ParBLiSS/FastANI/. The manual for MASH can be found at https://mash.readthedocs.io/en/latest/.
$ wget https://github.com/marbl/Mash/releases/download/v2.2/mash-Linux64-v2.2.tar
$ tar -xvf mash-Linux64-v2.2.tar
$ mv mash-Linux64-v2.2/mash .
$ wget https://github.com/ParBLiSS/FastANI/releases/download/v1.32/fastANI-Linux64-v1.32.zip
$ unzip fastANI-Linux64-v1.32.zip
# Edit your ~/.bashrc to include MASH and FastANI to your $PATH
$ echo "export PATH=/path/to/:\$PATH" >> ~/.bashrc #replace /path/to with the correct path on your computer
$ source ~/.bashrc
$ mash -h # test if MASH is executable
$ fastANI -h # test if FastANI is executable
BLAST is only required by one function, where the sequences are blasted against a database to obtain their COG category. Information about BLAST can be found at https://www.ncbi.nlm.nih.gov/books/NBK279690/?report=classic.
$ wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.10.1+-x64-linux.tar.gz
$ tar -xvf ncbi-blast-2.10.1+-x64-linux.tar.gz
# Edit your ~/.bashrc to include BLAST to your $PATH
$ echo "export PATH=/path/to/ncbi-blast-2.10.1+/bin/:\$PATH" >> ~/.bashrc #replace /path/to with the correct path on your computer
$ source ~/.bashrc
$ blastp -help # test if BLAST is executable
ASTER is required for creating a phylogenetic tree based on both orthologs and paralogs with astral-pro. The manual for ASTER can be found at https://github.com/chaoszhang/ASTER.
$ git clone https://github.com/chaoszhang/ASTER.git
$ cd ASTER
$ git checkout v1.3
$ make
# Edit your ~/.bashrc to include ASTER to your $PATH
$ echo "export PATH=/path/to/ASTER/bin/:\$PATH" >> ~/.bashrc #replace /path/to with the correct path on your computer
$ source ~/.bashrc
$ astral-pro -h # test if ASTER is executable
InterProScan is not required by any function, but its .GFF3 output can be read to add functional annotations to the database. The installation can be tricky because it relies on many third-party binaries, each with its own dependencies. Please check https://github.com/ebi-pf-team/interproscan/wiki/HowToDownload and take a look at the install requirements as well. Installation of the Panther models is not required.
Phobius via InterProScan
Phobius predictions can be performed during the InterProScan analysis, but they are not part of the standard set of predictions. To allow these predictions, download Phobius from https://phobius.sbc.su.se/, place the entire directory in the InterProScan/bin/ directory, and edit the interproscan.properties configuration file. More information about including Phobius in the InterProScan analysis can be found at https://interproscan-docs.readthedocs.io/en/latest/ActivatingLicensedAnalyses.html.
eggNOG-mapper is not required by any function, but its .annotations output can be read to add functional annotations to the database. Information about this tool can be found at http://eggnog-mapper.embl.de/
$ git clone https://github.com/eggnogdb/eggnog-mapper.git
Installing pre-commit hooks
First install the pre-commit Python package by following the installation instructions.
Then, inside the root directory of the repository, run:
You need to run this step only once after cloning the repository.
The hooks will be installed in your local repository’s configuration.
After installation of the hooks they will be triggered at each commit if any Java files have changed. Should any of the pre-commit hooks fail, git will not allow you to create the commit. The output of the pre-commit hooks should tell you what failed, allowing you to fix any problems and to re-add the affected files for another commit attempt.
Pre-commit hooks can be run manually as well with:
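The commands referred to in this section are not reproduced in the text. Assuming the standard pre-commit command-line interface, installing the hooks and running them manually would look roughly like the sketch below. It must be run from the root directory of the repository; the guard conditions and the hooks_status variable are only there to make the sketch safe to run anywhere and are not part of the PanTools instructions.

```shell
# Assumption: the standard pre-commit CLI; run from the repository root.
if command -v pre-commit >/dev/null 2>&1 && [ -d .git ] && [ -f .pre-commit-config.yaml ]; then
    # One-time setup after cloning: registers the hooks with the local git repository.
    pre-commit install || echo "pre-commit install failed"
    # Manual run of all configured hooks on every file in the repository.
    pre-commit run --all-files || echo "one or more hooks reported problems"
    hooks_status="installed"
else
    echo "pre-commit, .git or .pre-commit-config.yaml not found; nothing was installed"
    hooks_status="skipped"
fi
```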