In my last post, I simulated Brownian motion (BM) evolution on a simple 5-taxon tree in which I retained the states at internal nodes. Since some people might be interested in retaining ancestral character values during BM simulation, I will write a blog post on how to do this (actually, there are various ways).
Before doing so, I thought it would be fun to compare the different procedures (in terms of computation time) that we can use to conduct BM simulation on a tree in R (focusing only on the tip values for now).
First, we need to have a way to quantify computation time. This is easy in R using the {base} function
system.time(). system.time(
function), for valid R function
"function," calls
proc.time() once at the beginning of the execution of
function and then a second time at the end, and then calculates the difference between these two values.
proc.time() returns the current elapsed real and CPU (in seconds) of our current R session, so the difference between two calls to this function gives us the elapsed time of the system.
To get a sense of this, we can call
system.time() on a simple R function, and then a more complicated one. On my super-slow netbook, this would look as follows:
> system.time(four<-2+2)
user system elapsed
0 0 0
> system.time(tree<-birthdeath.tree(b=1,d=0,taxa.stop=200))
user system elapsed
0.40 0.00 0.64The number under "elapsed" is the relevant value here.
Even though we called the {geiger} function
birthdeath.tree() (which grows Yule trees under a user specified speciation/extinction model) from within
system.time(), it still assigns our simulated phylogeny to "tree" in this case. Here, we see that adding 2+2 takes essentially no time; whereas simulating a 100 species tree takes a little more than half a second. Fun.
Ok, now I'm going to propose three different ways to simulate BM on our 200 species tree. We focus on getting the tip values for species here - later, I will show how to get ancestral character values as well. None of these methods are particularly fast; we can do much better and this will be the subject of a future post.
I'm going to simulate using three different approaches: 1) I will use the {geiger} function
sim.char(); 2) I will use the {MASS} function
mvrnorm(); and, finally, 3) I will use the {base} function
chol() (for Cholesky decomposition) and the {stats} function
rnorm() (for random number generation).
Ok, to simulate with
sim.char() is straightforward.
sim.char() is a really neat function that can actually simulate under a variety of models - including a discrete character model. BM is the default, so to call (and time)
sim.char(), we should just do this:
> system.time(x<-sim.char(phy=tree,model.matrix=matrix(1,1,1)))
user system elapsed
11.06 0.00 12.22This shows that (on my computer) simulating the evolution of a single character on a 200 taxon stochastic phylogeny takes about 12 seconds.
We can also use the function
mvrnorm().
mvrnorm() simulates a data vector from a multivariate normal distribution. For BM on a phylogeny, we should just do the following:
> system.time(x<-mvrnorm(Sigma=1*vcv(tree),
mu=rep(0,length(tree$tip))))
user system elapsed
3.59 0.00 4.29Even though
sim.char() uses
mvrnorm(), the latter is
much faster at simulating BM evolution. (This is because some of the internal calculations of
sim.char() turn out to be quite slow.)
Finally, we can use the Cholesky decomposite matrix and multiply by a random normal vector to achieve the same effect.
> system.time(x<-t(chol(vcv(tree)))%*%
rnorm(length(tree$tip),sd=1))
user system elapsed
3.77 0.00 3.92This is slightly faster than
mvrnorm(), but almost indistinguishable. The difference between
chol() and
mvrnorm() is that the latter uses to singular value decomposition to factorize the
vcv(phy) matrix. Cholesky decomposition is faster but less stable if
vcv(phy) is ill-conditioned. Ill-conditioned
vcv(phy) under BM will be the case if we have very short terminal branches in the tree (which results in the lowest rank eigenvalue of
vcv(phy) going to zero). To avoid this in simulation we should do:
> tree<-drop.tip(birthdeath.tree(b=1,d=0,taxa.stop=201),"201")This is because
birthdeath.tree() stops at the most recent speciation event, leaving a divergence depth of 0.0 between the newest tips on the tree if the "taxa.stop" criterion is used.
More to come on this later. . . .