5. Performing analyses

ivy has many tools for performing analyses on trees. Here we will cover a few of them.

5.1. Phylogenetically Independent Contrasts

You can perform PICs using ivy's PIC function. This function takes a root node and a dictionary mapping node labels to character traits as inputs, and returns a dictionary mapping internal nodes to tuples containing the ancestral state, its variance (error), the contrast, and the contrast's variance.

The following example uses a consensus tree from Sarkinen et al. 2013 and Ziegler et al. unpub. data.

Note: This function requires that the root node have a length that is not None. It also currently cannot handle polytomies.

In [1]: import ivy
In [2]: import csv
In [3]: import matplotlib.pyplot as plt
In [4]: r = ivy.tree.read("examples/solanaceae_sarkinen2013.newick")
In [5]: r.length = 0
In [6]: polvol = {}; stylen = {}
In [7]: with open("examples/pollenvolume_stylelength.csv", "r") as csvfile:
            traits = csv.DictReader(csvfile, delimiter=",", quotechar='"')
            for i in traits:
                polvol[i["Binomial"]] = float(i["PollenVolume"])
                stylen[i["Binomial"]] = float(i["StyleLength"])

In [8]: p = ivy.contrasts.PIC(r, polvol) # Contrasts for log-transformed pollen volume
In [9]: s = ivy.contrasts.PIC(r, stylen) # Contrasts for log-transformed style length
In [10]: pcons, scons = zip(*[ [p[key][2], s[key][2]] for key in p.keys() ])
In [11]: plt.scatter(scons,pcons)
In [12]: plt.show()
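
Each value of the returned dictionary is the four-element tuple described above; a minimal sketch of unpacking one entry (variable names are illustrative):

    # Unpack the PIC results for one internal node:
    # (ancestral state, its variance, contrast, contrast variance)
    node = list(p.keys())[0]
    anc_state, anc_var, contrast, contrast_var = p[node]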

5.2. Lineages Through Time

ivy has functions for computing LTTs. The ltt function takes a root node as input and returns a tuple of 1D arrays containing times and the corresponding lineage (diversity) counts.

Note: The tree is expected to be an ultrametric chronogram (extant leaves, branch lengths proportional to time).

In [13]: import ivy
In [14]: r = ivy.tree.read("examples/solanaceae_sarkinen2013.newick")
In [15]: v = ivy.ltt(r)
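
The returned tuple can be unpacked into more descriptive names (a small sketch; the plotting code below indexes v directly):

    times, lineages = v  # v[0] holds times, v[1] holds lineage counts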

You can plot your results using matplotlib.

In [16]: import matplotlib.pyplot as plt
In [17]: plt.step(v[0], v[1])
In [18]: plt.xlabel("Time"); plt.ylabel("Lineages"); plt.title("LTT")
In [19]: plt.show()
_images/ltt.png

5.3. Phylorate plot

By accessing R libraries using rpy2, we can use the functions in the BAMMtools R library to generate phylorate plots.

The following analysis is done using the whales dataset provided with BAMMtools.

In [20]: from ivy.r_funcs import phylorate
In [21]: e = "whaleEvents.csv" # Event data created with BAMM
In [22]: treefile = "whales.tre"
In [23]: rates = phylorate(e, treefile, "netdiv")

We can add the results as a layer to a plot using the add_phylorate function in ivy.vis.layers.

In [24]: from ivy.vis import layers; from ivy.interactive import readtree, treefig
In [25]: r = readtree(treefile)
In [26]: fig = treefig(r)
In [27]: fig.add_layer(layers.add_phylorate, rates[0], rates[1], ov=False,
       store="netdiv")
_images/phylorate_plot.png

5.4. Mk models

ivy has functions to fit an Mk model given a tree and a list of character states. There are functions to fit the Mk model using both maximum likelihood and Bayesian MCMC methods.

To fit an Mk model, you need a tree and a list of character states. For N states, this list should contain only the integers 0, 1, 2, ..., N-1, with each integer corresponding to a state. The list of characters should be provided in preorder sequence.

Let's read in some example data: plant habit in tobacco. We can load a CSV containing binomials and character states using the load_chars function. This gives us a dictionary mapping binomials to character states.

In [28]: tree = ivy.tree.read("examples/nicotiana.newick")
In [29]: chars = ivy.tree.load_chars("examples/nicotianaHabit.csv")

Let's get our data into the correct format: we need to convert chars into a list of 0s and 1s matching the character states in preorder sequence.

In [30]: charsPreorder = [ chars[n.label]["Habit"] for n in tree.leaves() ]
In [31]: charList = list(map(lambda x: 0 if x=="Herb" else 1, charsPreorder))

We can take a look at how the states are distributed on the tree using the tip_chars method on a tree figure object. In this case “Herb” will be represented by green and “Shrub” will be represented by brown.

In [32]: fig = ivy.vis.treevis.TreeFigure(tree)
In [33]: fig.tip_chars(charList, colors=["green", "brown"])
_images/nicotiana_1.png

Now we are ready to fit the model. We will go over the maximum likelihood approach first.

5.4.1. Maximum Likelihood

Perhaps the simplest way to fit an Mk model is with the maximum likelihood approach. We will make use of the optimize module of scipy to find the maximum likelihood values of this model.

First, we must consider what type of model to fit. ivy allows you to specify what kind of instantaneous rate matrix (Q matrix) to use. The options are:

  • “ER”: An equal-rates Q matrix has only one parameter: the forward and backward rates for all characters are equal.
  • “Sym”: A symmetrical rates Q matrix forces forward and reverse rates to be equal, but allows rates for different characters to differ. It has a number of parameters equal to (N^2 - N)/2, where N is the number of character states.
  • “ARD”: An all-rates different Q matrix allows all rates to vary freely. It has a number of parameters equal to (N^2 - N).

In this case, we will fit an ARD Q matrix.
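
To make the parameter counts concrete, here is a small sketch for reference (comments only; ivy constructs these matrices internally):

    # Number of free parameters in the Q matrix for N character states:
    #   ER:  1
    #   Sym: (N**2 - N)/2    e.g. 3 parameters for N = 3
    #   ARD: N**2 - N        e.g. 6 parameters for N = 3
    #
    # A two-state ARD Q matrix has the form
    #   [[-q01,  q01],
    #    [ q10, -q10]]
    # where q01 and q10 are fitted independently and each row sums to zero.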

We also need to specify how the prior at the root is handled. There are a few ways to handle weighting the likelihood at the root:

  • “Equal”: When the root prior is set to “Equal”, no weighting is given to the root being in any particular state: all states are weighted equally.
  • “Equilibrium”: This setting causes the likelihoods at the root to be weighted by the stationary distribution of the Q matrix, as is described in Maddison et al 2007.
  • “Fitzjohn”: This setting causes the likelihoods at the root to be weighted by how well each root state would explain the data at the tips, as is described in Fitzjohn 2009.

In this case we will use the “Fitzjohn” method.

We can use the fit_Mk function with these settings to fit the model. This function returns a dict containing the fitted Q matrix, the log-likelihood, and the weighting at the root node.

In [34]: from ivy.chars import mk
In [35]: mk_results = mk.fit_Mk(tree, charList, Q="ARD", pi="Fitzjohn")
In [36]: print(mk_results["Q"])
[[-0.01246449  0.01246449]
 [ 0.09898439 -0.09898439]]
In [37]: print(mk_results["Log-likelihood"])
-11.3009106093
In [38]: print(mk_results["pi"])
{0: 0.088591248260230959, 1: 0.9114087517397691}

Let’s take a look at the results.

In [39]: print(mk_results["Q"])
[[-0.01246449  0.01246449]
 [ 0.09898439 -0.09898439]]

First is the Q matrix. The fitted Q matrix gives the transition rate from state i to state j, where i is the row and j is the column. Recall that in this dataset, character state 0 corresponds to herbaceous and 1 to woody. We can see that the transition rate from woody to herbaceous is higher than the rate from herbaceous to woody.

In [40]: print(mk_results["Log-likelihood"])
-11.3009106093

Next is the log-likelihood: the log-likelihood of the data under the fitted model.

In [41]: print(mk_results["pi"])
{0: 0.088591248260230959, 1: 0.9114087517397691}

Finally we have pi, the weighting at the root. We can see that there is higher weighting for the root being in state 1 (woody).

5.4.2. Bayesian

ivy has a framework in place for using pymc to sample from a Bayesian Mk model. The process of fitting a Bayesian Mk model is very similar to fitting a maximum likelihood model.

The module bayesian_models has a function create_mk_model that takes the same input as fit_Mk and creates a pymc model that can be sampled with an MCMC chain.

First we create the model.

In [42]: from ivy.chars import bayesian_models
In [43]: from ivy.chars.bayesian_models import create_mk_model
In [44]: mk_mod = create_mk_model(tree, charList, Qtype="ARD", pi="Fitzjohn")

Now that we have the model, we can use pymc syntax to set up an MCMC chain.

In [45]: import pymc
In [46]: mk_mcmc = pymc.MCMC(mk_mod)
In [47]: mk_mcmc.sample(4000, burn=200, thin = 2)
[-----------------100%-----------------] 2000 of 2000 complete in 4.7 sec

We can access the results using the trace method of the MCMC object, giving it the name of the parameter we want; in this case, “Qparams”.

In [48]: mk_mcmc.trace("Qparams")[:]
array([[ 0.01756608,  0.07222648],
       [ 0.03266443,  0.05712813],
       [ 0.03266443,  0.05712813],
       ...,
       [ 0.01170189,  0.03909211],
       [ 0.01170189,  0.03909211],
       [ 0.00989616,  0.03305975]])

Each element of the trace is an array containing the two fitted Q parameters. Let's get the 5th, 50th, and 95th percentiles for both parameters.

In [49]: import numpy as np
In [50]: Q01 = [ i[0] for i in mk_mcmc.trace("Qparams")[:] ]
In [51]: Q10 = [ i[1] for i in mk_mcmc.trace("Qparams")[:] ]
In [52]: np.percentile(Q01, [5,50,95])
Out[52]: array([ 0.00308076,  0.01844342,  0.06290078])
In [53]: np.percentile(Q10, [5,50,95])
Out[53]: array([ 0.03294584,  0.09525662,  0.21803742])

Unsurprisingly, the results are similar to those we obtained from the maximum likelihood analysis.
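
You can also look at the full posterior distributions rather than just percentiles; a minimal matplotlib sketch:

    import matplotlib.pyplot as plt
    plt.hist(Q01, bins=50, alpha=0.5, label="q01")  # posterior of rate 0->1
    plt.hist(Q10, bins=50, alpha=0.5, label="q10")  # posterior of rate 1->0
    plt.xlabel("Rate"); plt.legend(); plt.show()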

5.5. Hidden-Rates Models

ivy has functions for fitting hidden-rates Markov models (HRM) (see Beaulieu et al. 2013), as well as for performing ancestor-state reconstructions and visualizations of hidden-rates models.

We will demonstrate these functions using an example dataset, fitting a two-state character with two different hidden rates or “regimes”. Let’s set up our data.

In [54]: import ivy
In [55]: from ivy.chars import hrm
In [56]: tree = ivy.tree.read("hrm_600tips.newick")
In [57]: chars = [0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1,
                0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0,
                0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1,
                1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,
                0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1,
                1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
                0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0,
                0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
                1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
                1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0,
                0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1,
                1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0,
                0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1,
                1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0]

5.5.1. Maximum Likelihood

ivy can fit a maximum likelihood HRM model using the fit_hrm function. Here we will explain the options and uses of this function.

Like the Mk functions, ivy's HRM functions take as input a tree and a list of the characters in preorder sequence. The character states in this example are already in preorder.

The simplest way to fit an HRM model is to use the fit_hrm function. Like fit_Mk, it supports a few possible models:

  • “Simple”: Under the simple model, each regime is equivalent to an equal-rates Mk model. Transitions between regimes are all constrained to be equal. This model fits M+1 parameters where M is the number of regimes.
  • “ARD”: Under this model, all rates are allowed to vary freely. This model fits M(N^2 - N)/2 + (M^2 - M)*N parameters, where N is the number of character states and M is the number of regimes.

Here we will fit a Simple model.
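
For the two-state, two-regime model fit below, the Simple model therefore has M + 1 = 3 free parameters: one within-regime transition rate for each of the two regimes, plus a single rate of switching between regimes.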

In [58]: fit = hrm.fit_hrm(tree, chars, nregime=2, Qtype="Simple", pi="Equal")
In [59]: fit
Out[59]:
{'Log-likelihood': -204.52351825389133,
 'Q': array([[-0.04596664,  0.03291299,  0.01305365,  0.        ],
             [ 0.03291299, -0.04596664,  0.        ,  0.01305365],
             [ 0.01305365,  0.        , -0.44655655,  0.4335029 ],
             [ 0.        ,  0.01305365,  0.4335029 , -0.44655655]]),
 'rootLiks': {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}}

The output of the fit_hrm function is a dictionary containing the fitted Q matrix, the log-likelihood, and the prior weighting at the root (all values are equal in this case because we set pi to “Equal”).

Now that we have our fitted Q matrix, we can perform ancestral state reconstruction using the anc_recon_discrete function, which can handle both single-regime Mk models and multi-regime HRM models. This function takes a tree, the character states at the tips, and the Q matrix to be used in the reconstruction; here we pass the fitted Q matrix. We also provide the prior at the root (in this case, “Equal”) and, because this is a hidden-rates model, the number of regimes (in this case, 2).

In [60]: from ivy.chars import anc_recon
In [61]: recon = anc_recon.anc_recon_discrete(tree, chars, fit["Q"],
                                             pi="Equal", nregime=2)

The output of this function is an array containing the likelihoods of each node being in each state. We can use the output of this function to visualize the reconstructed states on the tree.
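
Before plotting, you can summarize the reconstruction numerically, for example by taking the highest-likelihood state for each node (a sketch, assuming rows correspond to nodes and columns to state/regime combinations):

    import numpy as np
    best = np.argmax(recon, axis=1)  # index of the most likely state/regime per node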

In [62]: from ivy.interactive import *
In [63]: from ivy.vis.layers import add_ancrecon_hrm
In [64]: fig = treefig(tree)
In [65]: fig.add_layer(add_ancrecon_hrm, recon)
_images/hrm_1.png

Green corresponds to state 0 and blue to state 1. The more saturated colors correspond to the faster regime and the duller colors to the slower regime.

5.6. BAMM-like Mk model

ivy has code for fitting a BAMM-like Mk model (also referred to here as a multi-mk model) to a tree, where different sections of the tree have distinct Mk models associated with them.

We will demonstrate fitting a two-regime BAMM-like Mk model in a Bayesian context using pymc.

In [66]: import ivy
In [67]: from ivy.chars import mk_mr
In [68]: from ivy import examples
In [69]: tree = examples.simtree_600
In [70]: chars = examples.simchar_600

In [71]: data = dict(zip([n.label for n in tree.leaves()],chars))

First, we will use the ivy function mk_multi_bayes to create the pymc object. We will specify the model using a special array, the qidx array, which describes how parameters are placed into the Q matrix.

The qidx for a multi-mk model is a two-dimensional numpy array with four columns. The first column refers to the regime, the second to the row index of the Q matrix, and the third to the column index of the Q matrix. The fourth column gives the identity of the parameter that fills that location.

In this case, we will fit a model in which two equal-rates Mk models, each with a different rate, apply to different parts of the tree. The qidx is as follows:

In [72]: import numpy as np
In [73]: qidx = np.array([[0,0,1,0],   # regime 0, Q[0,1]: parameter 0
                         [0,1,0,0],    # regime 0, Q[1,0]: parameter 0
                         [1,0,1,1],    # regime 1, Q[0,1]: parameter 1
                         [1,1,0,1]])   # regime 1, Q[1,0]: parameter 1
In [74]: my_model = mk_mr.mk_multi_bayes(tree, data, nregime=2, qidx=qidx)

Now we will sample from our model. We will do 100,000 samples with a burn-in of 10,000 and a thinning factor of 3.

In [75]: my_model.sample(100000, burn=10000, thin=3)

Now we can look at the output. Seeing the fitted parameters is easy.

In [76]: print(np.mean(my_model.trace("Qparam_0")[:])) # Q param 0
In [77]: print(np.mean(my_model.trace("Qparam_1")[:])) # Q param 1

Seeing the location of the fitted switchpoints is a little trickier. We can use the plotting layer add_tree_heatmap to see where the switchpoint was reconstructed.

In [78]: from ivy.interactive import *
In [79]: fig = treefig(tree)
In [80]: switchpoint = my_model.trace("switch_0")[:]
In [81]: fig.add_layer(ivy.vis.layers.add_tree_heatmap,switchpoint)
_images/mkmr_1.png

5.7. ClaSSE

The ClaSSE model is implemented in ivy. Using ivy's ClaSSE implementation, you can fit ClaSSE, along with derivatives of it such as GeoSSE, to your data.

Fitting a ClaSSE model produces an output dictionary containing the maximum log-likelihood and a dictionary of the fitted parameters.

The parameter dictionary contains three arrays: “lambda”, “mu”, and “q”.

The lambda array is a three-dimensional array corresponding to the speciation rates. The first dimension corresponds to the parent's state, the second to the first child's state, and the third to the second child's state. Thus, lambda[0,0,1] corresponds to the speciation rate where the parent is in state 0, the first child is in state 0, and the second child is in state 1. As in diversitree, any lambda values where the first child has a higher state than the second child (e.g. lambda[0,1,0]) are set to 0.

The q array is a two-dimensional array that corresponds to the rates of character change. The first dimension corresponds to the rootward state and the second dimension corresponds to the tipward state. q[0,1] corresponds to the transition rate from 0 to 1. The diagonals are set to zero.

The mu array is a one-dimensional array. It corresponds to the extinction rate of each state. mu[0] corresponds to the extinction rate of state 0.
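
Once a model has been fitted (as with mle_classe below), individual rates can be read off the parameter dictionary by indexing; a minimal sketch (variable names are illustrative):

    pars = mle_classe["fit"]
    sp = pars["lambda"][0, 0, 1]  # speciation: parent in state 0, children in states 0 and 1
    tr = pars["q"][0, 1]          # character change from state 0 to state 1
    ex = pars["mu"][0]            # extinction rate of state 0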

We will demonstrate fitting a fully-parameterized 2-state ClaSSE model to some data, then fitting a 3-state ClaSSE model that has been parameterized to be identical to GeoSSE.

In [82]: import ivy
In [83]: from ivy.chars.sse import sse
In [84]: from ivy import examples
In [85]: root = examples.simtree_600 # This is an ivy Node object
In [86]: data = examples.simdata_600 # This is a dictionary mapping tip labels to character states

5.7.1. Maximum likelihood ClaSSE

Fitting a model is relatively straightforward with fit_classe. This implementation is reasonably fast but still may take a little while.

In [87]: mle_classe = sse.fit_classe(root,data)
In [88]: print(mle_classe["lnl"])
-1287.64633207
In [89]: print(mle_classe["fit"])
{'mu': array([  1.23669910e-06,   3.76398510e-05]),
 'q': array([[ 0.        ,  0.05008601],
            [ 0.08421281,  0.        ]]),
 'lambda': array([[[  4.18641344e-01,   1.48737218e-08],
                  [  0.00000000e+00,   4.33500041e-03]],

                 [[  2.98016896e-08,   0.00000000e+00],
                  [  0.00000000e+00,   5.08865803e-01]]])}

5.7.2. Maximum likelihood GeoSSE

ivy's implementation of ClaSSE allows for flexible model specification. You can extend ClaSSE through the use of the pidx parameter. Here we will demonstrate how you can constrain a ClaSSE model in ivy to be a GeoSSE model.

The pidx parameter is a dictionary of arrays. It is similar to the parameter output of fit_classe. Each cell of each array corresponds to a parameter of the model. With the exception of 0, each integer in the array corresponds to a unique parameter.

The function will fit an array of parameters to this model. The parameters are stored in a one-dimensional array that is indexed by the pidx arrays: the integers within the pidx arrays index the parameter array. In the GeoSSE parameterization below, both lambda[0,0,1] and lambda[1,1,1] share the integer 1, meaning they both refer to the first item of the parameter array and will therefore be identical.

The order of the integers has no meaning; the parameter denoted by index 2 is not enforced to be greater than the parameter denoted by 1. Any cell that has a 0 in it is constrained to be 0. Any parameters not used by the model (e.g. lambda[0,1,0] or q[1,1]) are ignored.

In [90]: import numpy as np
In [91]: lambdaidx = np.array([[[0,1,2],
                               [0,0,3],
                               [0,0,0]],
                              [[0,0,0],
                               [0,1,0],
                               [0,0,0]],
                              [[0,0,0],
                               [0,0,0],
                               [0,0,2]]])
In [92]: muidx = np.array([0,4,5])
In [93]: qidx = np.array([[0,5,4],
                         [6,0,0],
                         [7,0,0]])
In [94]: pidx = {"lambda":lambdaidx,"mu":muidx,"q":qidx}
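
Because each nonzero integer denotes one free parameter, the largest index gives the size of the parameter array; a quick sanity check:

    n_params = max(a.max() for a in pidx.values())  # 7 free parameters in this GeoSSE model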

Now that we have our pidx dictionary, we can use it in fit_classe. This is an optional parameter: if it is not given, the function defaults to a fully parameterized “all-rates-different” model.

In [95]: mle_geosse = sse.fit_classe(root,data,nstate=3,pidx=pidx)

5.7.3. Creating the pidx array

When the number of states is large, the number of lambda parameters gets to be very high. It can be difficult to manually create the lambdaidx array. Fortunately, ivy has a convenience function for creating the lambda array.

Using the make_blank_lambda function, you can create a lambda array filled with a single “background” parameter. You can then manually fill in the parameters that are different from the background parameter. If we wanted to create the GeoSSE lambda array, we could do it like this:

In [96]: lambdaidx = sse.make_blank_lambda(k=3,fill=0) # Fill can be any integer
In [97]: lambdaidx[0,0,1] = lambdaidx[1,1,1] = 1
In [98]: lambdaidx[0,0,2] = lambdaidx[2,2,2] = 2
In [99]: lambdaidx[0,1,2] = 3
In [100]: lambdaidx
Out[100]:
array([[[0, 1, 2],
        [0, 0, 3],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 2]]])

5.7.4. Creating a likelihood function

If, instead of finding the best-fit values for a ClaSSE model, you want to evaluate the likelihood of parameter values directly, or to use the likelihood in a Bayesian context, you can use the make_classe function.

In [101]: f = sse.make_classe(root,data)

This creates a ClaSSE likelihood function that can be optimized (with nlopt, for example) or used in a Bayesian analysis.
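
The returned function takes a one-dimensional parameter array, indexed as described for pidx, and evaluates the model's likelihood. A minimal sketch, assuming the default all-rates-different indexing shown in the documentation below (ten parameters for a two-state model):

    import numpy as np
    params = np.full(10, 0.01)  # illustrative values; length must match the number of parameter indices
    lnl = f(params)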

Here is a full explanation of the arguments, which make_classe shares with fit_classe:

Args:
    root (Node): Root of tree to perform the analysis on. MUST BE BIFURCATING
    data (dict): Dictionary mapping node tip labels to character states.
      Character states must be represented by ints starting at index 0.
    nstate (int): Number of states. Can be more than the number of
      states indicated by data. Optional, defaults to number of states in data.
    pidx (dict): Dictionary of parameter indices. Each integer in the
      array, with the exception of 0, corresponds to a unique parameter
      in the parameter array that is given to the likelihood function.
      0 indicates that the value for that parameter is 0.0.
      Any indices in the array that are not "legal" (for example, lambda[0,1,0]
      or the diagonal q[0,0]) are ignored.

      Optional. If not given, defaults to an "all-rates-different"
      configuration.
      Dictionary items are:
      "lambda": k x k x k array of indices indicating lambda values.
                The first dimension corresponds to the parent state,
                the second to the first child's state, and the third
                to the second child's state.
                Example:
                np.array([[[1,2],
                           [0,3]],
                          [[4,5],
                           [0,6]]])
      "mu": k-length 1-dimensional array containing mu indices.
            Example:
            np.array([7,8])
      "q": k x k array of indices indicating q values. The first
           dimension corresponds to the rootward state and the second
           dimension corresponds to the tipward state.
           Example:
           np.array([[0,9],
                     [10,0]])
    condition_on_surv (bool): Whether to condition the likelihood on the
      survival of the clades and the speciation of the subtending clade. See Nee et al. 1994.
    pi (str): Behavior at the root. Defaults to "Equal". Possible values:
      Equal: Flat prior weighting all states equally
      Fitzjohn: Weight states by relative probability of observing data (Fitzjohn et al 2009)
      Given: Weight by given pi values.
    pi_given (np.array): If pi = "Given", use this as the weighting at the root.
      Must sum to 1.
    startingvals (np.array): Starting values for the optimization. Optional, defaults to 0.01 for all values.
Returns:
    function: Function that takes an array of parameters as input.