Python package¶
Table of Contents
Installation¶
Important
We strongly recommand installing the Capybara package only in virtual environments.
First, install Pipenv if you don’t have it already:
pip install pipenv
Then, navigate to the project folder, create and activate a virtual environment:
cd PATH-TO-THE-PROJECT-FOLDER
pipenv install
pipenv shell
Once inside the virtual environment, follow the installation guide below.
Install the package capybara-cophylogeny via pip
:
python -m pip install capybara-cophylogeny
You can verify whether the installation is successful inside the Python interactive shell:
python
Import the two Capybara modules:
>>> import capybara.counter
>>> import capybara.enumerator
The import statements should not yield any error.
Important
Be sure to update to the newest version if you have not used Capybara for a while:
python -m pip install capybara-cophylogeny --upgrade
Improve performance with PyPy
You can speed up the computation by using PyPy3 instead of the standard CPython interpreters. To do this, follow these simple steps:
- Download and install PyPy3.
- Create and activate the virtual environment using PyPy3 as interpreter:
pipenv install --python=pypy3 pipenv shell
Already done! Python commands will now use PyPy3. For example, you can write:
python -m pip install capybara-cophylogeny python MY_SCRIPT.py
Usage¶
The user is assumed to be already familiar with the GUI version of Capybara.
The Python package provides two functions capybara.counter.run
and capybara.enumerator.run
,
corresponding to the two buttons Count and Enumerate in Optimal enumeration.
Count: capybara.counter.run
¶
The function takes 4 parameters (2 optional):
Parameters | |
---|---|
input_name | Name of the input file (or the path) |
task | Integer between 1 and 4 |
cost_vector | List (or tuple) of four integers. Default: (-1,1,1,1) |
verbose | If set to True, print a message when the computation starts and ends. Default: False |
The function returns an integer, depending on the chosen task (see also Definitions of tasks):
Return value | |
---|---|
task=1 | Number of optimal reconciliations |
task=2 | Number of event vectors |
task=3 | Number of event partitions |
task=4 | Number of equivalence classes |
Enumerate: capybara.enumerater.run
¶
The function takes 7 parameters (4 optional):
Parameters | |
---|---|
input_name | Name of the input file (or the path) |
output_name | Name of the output file (or the path) |
task | Integer between 1 and 4 |
cost_vector | List (or tuple) of four integers. Default: (-1,1,1,1) |
verbose | If set to True, print a message when the computation starts and ends. Default: False |
maximum | Maximum number of solutions to be outputted. Default: Infinity |
acyclic_only | If set to True and task=1, output only acyclic solutions. Default: False |
The function does not return anything. It writes the output to the specified file, depending on the chosen task (see Definitions of tasks).
The output format is the same as in the GUI version, except for task 2: it only enumerates the event vectors, not one reconciliation per event vector.
For tasks 3 and 4, the output format corresponds to the option Labels only in the GUI version (see Options for tasks T3 and T4).
Log file¶
With the option verbose, only simple messages are printed to the console, at the start and the end of computation.
More detailed information (parameters used, time of computation, cause of error) can be found in the log file capybara.log in the working directory.
After a successful job, the user can find messages similar to the following at the end of the log file:
2020-04-19 21:50:05,131 - 448c175a865711ea8496b8ca3aced137 - INFO - ===== Job started! =====
2020-04-19 21:50:05,131 - 448c175a865711ea8496b8ca3aced137 - INFO - Running Capybara Counter Task 2
2020-04-19 21:50:05,132 - 448c175a865711ea8496b8ca3aced137 - INFO - Input file: /home/user/cophy/data/input1.nex
2020-04-19 21:50:05,132 - 448c175a865711ea8496b8ca3aced137 - INFO - Cost vector: (0, 1, 1, 1)
2020-04-19 21:50:05,133 - 448c175a865711ea8496b8ca3aced137 - INFO - Reading the input file...
2020-04-19 21:50:05,134 - 448c175a865711ea8496b8ca3aced137 - INFO - Successful! Computing...
2020-04-19 21:50:05,162 - 448c175a865711ea8496b8ca3aced137 - INFO - Done! The result of Counter Task 2 is 8
2020-04-19 21:50:05,166 - 448c175a865711ea8496b8ca3aced137 - INFO - ===== Job finished successfully! =====
A job interrupted by the user (for example, by pressing Ctrl+C) is also shown in the log file:
2020-04-19 22:04:11,600 - 448c175a865711ea8496b8ca3aced137 - INFO - ===== Job started! =====
2020-04-19 22:04:11,600 - 448c175a865711ea8496b8ca3aced137 - INFO - Running Capybara Enumerator Task 4
2020-04-19 22:04:11,601 - 448c175a865711ea8496b8ca3aced137 - INFO - Input file: /home/user/cophy/data/input1.nex
2020-04-19 22:04:11,601 - 448c175a865711ea8496b8ca3aced137 - INFO - Cost vector: (0, 1, 1, 1)
2020-04-19 22:04:11,601 - 448c175a865711ea8496b8ca3aced137 - INFO - Output file: /home/user/cophy/input1-classes.txt
2020-04-19 22:04:11,601 - 448c175a865711ea8496b8ca3aced137 - INFO - Reading the input file...
2020-04-19 22:04:11,602 - 448c175a865711ea8496b8ca3aced137 - INFO - Successful! Computing...
2020-04-19 22:04:11,604 - 448c175a865711ea8496b8ca3aced137 - WARNING - Keyboard interrupt
2020-04-19 22:04:11,604 - 448c175a865711ea8496b8ca3aced137 - INFO - ===== Job aborted! =====
The log file is automatically cleaned after 48 hours (the old file is renamed as capybara.log.CREATION-DATE).
Examples¶
A simple example¶
Here is a simple script where we print the number of event vectors, for different cost vectors:
import capybara.counter
for cost_vector in [(-1,1,1,1), (0,1,1,1), (0,1,1,0), (0,2,3,1)]:
num_vectors = capybara.counter.run('input.nex', task=2, cost_vector=cost_vector)
print("Cost vector =", cost_vector)
print("Number of event vectors =", num_vectors)
Multiprocessing on multiple input files¶
The user may want to use more than one CPU unit when working with a large number of files.
We propose below a script that uses multiprocessing. It follows three steps:
- Define a function that applies
capybara.enumerator.run
with parameters task=4 and cost_vector=(0,1,1,1) on the given input file. The output file name is obtained by adding a suffix “0111_classes” to the input file name. - Get the list of all input files, located in the directory data/.
- Create a pool of 4 worker processes, and apply the function on the list of input files.
import os
from multiprocessing import Pool
import capybara.enumerator
# define my custom function
def my_function(input_name):
output_name = '{name}_{suffix}.txt'.format(name=os.path.splitext(input_name)[0],
suffix='0111_classes')
capybara.enumerator.run(input_name, output_name, task=4, cost_vector=(0,1,1,1))
# getting all .nex files in the data folder
nexfiles = [os.path.join('data', fname) for fname in os.listdir('data')
if fname.endswith('.nex')]
# apply the function to the list of files
with Pool(processes=4) as pool:
pool.map(my_function, nexfiles)