Installing and configuring the required software
================================================

1. :ref:`user_guide/install:download pantools`
2. :ref:`user_guide/install:install neo4j`
3. :ref:`Install dependencies, either manually or through conda <user_guide/install:dependencies>`.

For PanTools developers:

4. :ref:`user_guide/install:installing pre-commit hooks`

Download PanTools
-----------------

The preferred option is to download the .jar file from
https://git.wur.nl/bioinformatics/pantools/-/releases and put it in a directory
named "pantools/target". Alternatively, follow the installation and compilation
instructions from the README.md file of the desired version (e.g. |PantoolsGit|).

Test if PanTools is executable:

.. substitution-code-block:: bash

   $ java -jar /YOUR_FULL_PATH/pantools/target/pantools-|ProjectVersion|.jar

If the help page does not appear, this likely means you do not have a properly
working Java 8 installation. Java is included in the PanTools conda environment,
so consider installing that environment first. To install Java manually, follow
the instructions at https://www.java.com/en/download.

Set PanTools alias
~~~~~~~~~~~~~~~~~~

To avoid typing long command line arguments every time, we suggest setting an
alias in your ~/.bashrc. Always include the **full** path to PanTools' .jar
file. If Java is set in your $PATH, use the following command.

.. substitution-code-block:: bash

   $ echo "alias pantools='java -Xms20g -Xmx50g -jar /YOUR_FULL_PATH/pantools/target/pantools-|ProjectVersion|.jar'" >> ~/.bashrc

If Java is not set in your $PATH, include the **full** path to Java in the alias
as well. Replace 'YOUR_PATH' twice with the correct directory structure.

.. substitution-code-block:: bash

   $ echo "alias pantools='/YOUR_PATH/jdk1.8.0_161/bin/java -Xms20g -Xmx50g -jar /YOUR_PATH/pantools/target/pantools-|ProjectVersion|.jar'" >> ~/.bashrc

Source your ~/.bashrc and test if the alias works.

.. code:: bash

   $ source ~/.bashrc
   $ pantools --version

PanTools uses picocli for command line parsing. Tab autocompletion for the
command line can be enabled for an alias with the following `tutorial `_.

--------------

Install Neo4j
-------------

Although Neo4j is not needed for any of the PanTools functionalities, it is
required to start up a database and run Cypher queries. PanTools versions up to
3.2 use Neo4j 3.5.3 libraries, whereas newer releases use Neo4j 3.5.30. Neo4j
version 3.5.30 is compatible with all earlier PanTools versions.

Download the Neo4j 3.5.30 community edition from the `Neo4j website `_ or
download the binaries directly from our `server `_.

.. code:: bash

   $ wget http://www.bioinformatics.nl/pangenomics/tutorial/neo4j-community-3.5.30-unix.tar.gz
   $ tar -xvzf neo4j-community-*

   # Edit your ~/.bashrc to include Neo4j in your $PATH
   $ echo "export PATH=/YOUR_PATH/neo4j-community-3.5.30/bin:\$PATH" >> ~/.bashrc #replace YOUR_PATH with the correct path on your computer
   $ source ~/.bashrc
   $ neo4j status # test if Neo4j is executable

Official Neo4j 3.5 manual: https://neo4j.com/docs/operations-manual/3.5/

--------------

Dependencies
------------

Some of PanTools' functionalities require additional software to be installed.
Installing every dependency manually takes a considerable amount of time, so we
highly recommend using Mamba. Mamba efficiently manages Conda environments,
allowing all required tools to be installed into a separate environment.
Instructions for creating the Mamba environment or installing the tools manually
are found in the sections below.

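Before creating the environment, you can quickly check whether Conda and Mamba
are already available on your system. This is only a sanity check and assumes
both tools are on your $PATH once installed.

.. code:: bash

   $ conda --version # check whether Conda is installed
   $ mamba --version # check whether Mamba is available
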
Install dependencies using Conda
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Instructions on how to install and use conda can be found in the **conda manual
page**. Once conda is installed, we suggest installing Mamba into the Conda base
environment to enable much faster dependency solving. To install all
dependencies into a separate environment, run the following commands. Choose the
conda_linux.yml or conda_macos.yml file depending on your operating system;
these files can be found in the `release `_. The difference between the two
files is that the Linux file contains BUSCO v5.2.2, which is not compatible with
the other dependencies on macOS.

.. code:: bash

   $ conda install mamba -n base -c conda-forge #install mamba into the base environment
   $ cd /YOUR_PATH/pantools #replace YOUR_PATH with the correct path on your computer
   $ mamba env create -n pantools -f conda_linux.yml #for Linux
   $ mamba env create -n pantools -f conda_macos.yml #for macOS

   $ conda activate pantools # activate the environment before using PanTools
   $ conda deactivate # deactivate when you are done

Run the following commands if you do not want to install every dependency, but
only specific ones for the analyses you are interested in.

.. code:: bash

   $ conda create -n pantools python=3.6 kmc=3.0 mcl # creates an environment able to construct the pangenome and cluster protein sequences
   $ conda install -n pantools mafft iqtree fasttree blast mash fastani busco=5.2.2 r-ggplot2 r-ape graphviz aster=1.3 # include the tools you want to install via conda

--------------

Manual installation of dependencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All tools must be set in your $PATH so PanTools can use them from any location.
The instructions below are based on a Linux machine.

Install KMC
"""""""""""

PanTools requires **KMC v2.3** or **v3.0** for k-mer counting during the
construction of the pangenome graph. KMC v3.0 is fastest, but v2.3 is also
compatible with PanTools. The KMC3 binaries can be downloaded from
https://github.com/refresh-bio/KMC/releases.

.. code:: bash

   $ tar -xvzf KMC* #uncompress the KMC binaries

   # Edit your ~/.bashrc to include KMC in your $PATH
   $ echo "export PATH=/YOUR_PATH/KMC/:\$PATH" >> ~/.bashrc #replace YOUR_PATH with the correct path on your computer
   $ source ~/.bashrc
   $ kmc # test if KMC is executable
   $ kmc_tools # test if kmc_tools is executable

--------------

Install MCL
"""""""""""

The MCL (Markov clustering) algorithm is required for the homology grouping of
PanTools. The software can be found at https://micans.org/mcl under License &
software.

.. code:: bash

   $ wget https://micans.org/mcl/src/mcl-14-137.tar.gz
   $ tar -xvzf mcl-*
   $ cd mcl-14-137
   $ ./configure --prefix=/YOUR_PATH/mcl-14-137/shared #replace YOUR_PATH with the correct path on your computer
   $ make install

   # Edit your ~/.bashrc to include MCL in your $PATH
   $ echo "export PATH=/YOUR_PATH/mcl-14-137/bin/:\$PATH" >> ~/.bashrc #replace YOUR_PATH with the correct path on your computer
   $ source ~/.bashrc
   $ mcl -h # test if MCL is executable

--------------

Install BUSCO
"""""""""""""

**BUSCO v3 to v5** can be run against the pangenome to estimate annotation
completeness. Each version requires a different Python release and needs to be
installed in a different way. We suggest installing BUSCO v5; follow the
instructions at https://gitlab.com/ezlab/busco/.

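If you manage dependencies with Conda/Mamba anyway, BUSCO v5 can also be
installed from the bioconda channel instead of following the manual
instructions. This is a minimal sketch: the channel setup and the 5.2.2 version
pin mirror the Conda instructions above and may need adjusting for newer BUSCO
releases.

.. code:: bash

   $ mamba create -n busco -c conda-forge -c bioconda busco=5.2.2 # separate environment for BUSCO (assumed channels and version pin)
   $ conda activate busco
   $ busco --version # test if BUSCO is executable
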
--------------

Install FastTree
""""""""""""""""

**FastTree** is used to infer approximately-maximum-likelihood phylogenetic
trees from alignments of nucleotide or protein sequences extracted from the
pangenome. An executable can be found on the FastTree website:
http://www.microbesonline.org/fasttree/.

.. code:: bash

   $ wget http://www.microbesonline.org/fasttree/FastTree
   $ chmod +x FastTree
   $ ./FastTree # test if FastTree is executable

   # Edit your ~/.bashrc to include FastTree in your $PATH
   $ echo "export PATH=/YOUR_PATH:\$PATH" >> ~/.bashrc #replace YOUR_PATH with the correct path on your computer
   $ source ~/.bashrc

--------------

Install R
"""""""""

**R** and some additional R packages are required to execute the R scripts
(files with the .R extension) that create plots and construct Neighbor-Joining
phylogenies. In most cases, R is already installed on a server. If this is not
the case, install it following the instructions on https://cran.r-project.org/,
or compile it using the following steps.

.. code:: bash

   $ mkdir R
   $ mkdir R/R_LIBS
   $ cd R
   $ wget https://cran.r-project.org/src/base/R-4/R-4.0.2.tar.gz #version number might have changed already
   $ tar -xvf R-4.0.2.tar.gz
   $ cd R-4.0.2/
   $ ./configure --prefix=/YOUR_PATH/R/ #replace YOUR_PATH with the correct path on your computer
   $ make
   $ make install # installs R into the prefix directory set above

   # Edit your ~/.bashrc to include R in your $PATH
   $ echo "export PATH=/YOUR_PATH/R/bin/:\$PATH" >> ~/.bashrc #replace YOUR_PATH with the correct path on your computer
   $ source ~/.bashrc
   $ R --help # test if R is executable

When the **R_LIBS** environment variable is set, R scripts know the location of
the libraries and are able to install additional R packages into the selected
directory.

.. code:: bash

   $ echo "R_LIBS=/YOUR_PATH/R/R_LIBS/" >> ~/.bashrc
   $ echo "export R_LIBS" >> ~/.bashrc
   $ echo $R_LIBS # validate that the path to the R libraries can be found

--------------

Install MAFFT
"""""""""""""

**MAFFT** is required for all alignment functionalities, such as aligning
homology groups and inferring the core SNP phylogeny. The full manual is
available at https://mafft.cbrc.jp/alignment/software/.

.. code:: bash

   $ git clone https://github.com/GSLBiotech/mafft.git
   $ cd mafft/core

   # Edit the first line of Makefile to change the desired install location,
   # from 'PREFIX = /usr/local' to 'PREFIX = /YOUR_DESIRED_PATH/mafft/'
   # Make sure the 'ENABLE_MULTITHREAD = -Denablemultithread' line is uncommented, to enable multithreading
   $ make
   $ make install # installs MAFFT into the PREFIX directory set in the Makefile

   # Edit your ~/.bashrc to include MAFFT in your $PATH
   $ echo "export PATH=/YOUR_PATH/mafft/bin/:\$PATH" >> ~/.bashrc #replace YOUR_PATH with the correct path on your computer
   $ source ~/.bashrc
   $ mafft --help # test if MAFFT is executable

--------------

Install IQ-tree
"""""""""""""""

IQ-tree is used to infer phylogenetic trees by maximum likelihood. Information
about the tool can be found at https://github.com/Cibiv/IQ-TREE.

.. code:: bash

   $ wget https://github.com/Cibiv/IQ-TREE/releases/download/v1.6.12/iqtree-1.6.12-Linux.tar.gz
   $ tar -xvf iqtree-1.6.12-Linux.tar.gz

   # Edit your ~/.bashrc to include IQ-tree in your $PATH
   $ echo "export PATH=/YOUR_PATH/iqtree-1.6.12-Linux/bin/:\$PATH" >> ~/.bashrc #replace YOUR_PATH with the correct path on your computer
   $ source ~/.bashrc
   $ iqtree -h # test if IQ-tree is executable

--------------

Install fastANI or MASH
"""""""""""""""""""""""

To construct a Neighbor-Joining phylogeny using ANI scores, either **fastANI**
or **MASH** is required.

The manual for **fastANI** is available at https://github.com/ParBLiSS/FastANI/.
The manual for **MASH** can be found at https://mash.readthedocs.io/en/latest/.

.. code:: bash

   $ wget https://github.com/marbl/Mash/releases/download/v2.2/mash-Linux64-v2.2.tar
   $ tar -xvf mash-Linux64-v2.2.tar
   $ mv mash-Linux64-v2.2/mash .

   $ wget https://github.com/ParBLiSS/FastANI/releases/download/v1.32/fastANI-Linux64-v1.32.zip
   $ unzip fastANI-Linux64-v1.32.zip

   # Edit your ~/.bashrc to include MASH and fastANI in your $PATH
   $ echo "export PATH=/YOUR_PATH/:\$PATH" >> ~/.bashrc #replace YOUR_PATH with the correct path on your computer
   $ source ~/.bashrc
   $ mash -h # test if MASH is executable
   $ fastANI -h # test if fastANI is executable

--------------

Install BLAST
"""""""""""""

BLAST is only required by one function, in which sequences are blasted against a
database to obtain their COG category. Information about BLAST can be found at
https://www.ncbi.nlm.nih.gov/books/NBK279690/?report=classic.

.. code:: bash

   $ wget https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.10.1+-x64-linux.tar.gz
   $ tar -xvf ncbi-blast-2.10.1+-x64-linux.tar.gz

   # Edit your ~/.bashrc to include BLAST in your $PATH
   $ echo "export PATH=/YOUR_PATH/ncbi-blast-2.10.1+/bin/:\$PATH" >> ~/.bashrc #replace YOUR_PATH with the correct path on your computer
   $ source ~/.bashrc
   $ blastp -help # test if BLAST is executable

--------------

Install ASTER
"""""""""""""

ASTER is required for creating a phylogenetic tree based on both orthologs and
paralogs with astral-pro. The manual for ASTER can be found at
https://github.com/chaoszhang/ASTER.

.. code:: bash

   $ git clone https://github.com/chaoszhang/ASTER.git
   $ cd ASTER
   $ git checkout v1.3
   $ make

   # Edit your ~/.bashrc to include ASTER in your $PATH
   $ echo "export PATH=/YOUR_PATH/ASTER/bin/:\$PATH" >> ~/.bashrc #replace YOUR_PATH with the correct path on your computer
   $ source ~/.bashrc
   $ astral-pro -h # test if ASTER is executable

--------------

Install InterProScan
""""""""""""""""""""

Not required by any function, but the .GFF3 output of **InterProScan** can be
read to include functional annotations in the database. The installation itself
can be quite tricky, as InterProScan uses many different third-party binaries,
each with their own dependencies. Please check
https://github.com/ebi-pf-team/interproscan/wiki/HowToDownload and take a look
at the install requirements as well. Installation of the Panther models is not
required.

Phobius via InterProScan
""""""""""""""""""""""""

Phobius predictions can be performed during the InterProScan analysis, but they
are not part of the standard set of predictions. To enable these predictions,
download Phobius from https://phobius.sbc.su.se/, place the entire directory
inside the InterProScan bin/ directory and edit the **interproscan.properties**
configuration file. More information about including Phobius in the InterProScan
analysis can be found in the InterProScan documentation.

Install eggNOGmapper
""""""""""""""""""""

Not required by any function, but the .annotations output of **eggNOG-mapper**
can be read to include functional annotations in the database. Information about
this tool can be found at http://eggnog-mapper.embl.de/.

.. code:: bash

   $ git clone https://github.com/eggnogdb/eggnog-mapper.git

--------------

Installing pre-commit hooks
---------------------------

First install the `pre-commit `__ Python package by following the
`installation instructions `__. Then, inside the root directory of the
repository, run:

.. code:: bash

   $ pre-commit install

You will need to run this step only once after cloning the repository. The hooks
will be installed in your local repository's configuration under
``.git/hooks/pre-commit``. After installation, the hooks will be triggered at
each commit if any Java files have changed.

Should any of the pre-commit hooks fail, git will not allow you to create the
commit. The output of the pre-commit hooks should tell you what failed, allowing
you to fix any problems and re-add the affected files for another commit
attempt.

Pre-commit hooks can be run manually as well with:

.. code:: bash

   $ pre-commit run

--------------

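By default, ``pre-commit run`` only checks the files staged for a commit. To run
every hook against the whole repository, for example after changing the hook
configuration, you can use pre-commit's standard ``--all-files`` option:

.. code:: bash

   $ pre-commit run --all-files # run all hooks on every file in the repository, not just staged changes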