Brownie manual

Introduction

Brownie performs many kinds of analyses, from estimating rates of character evolution to species delimitation and species tree inference to estimating ancestral states under various models of evolution. It is open source (code available at http://code.google.com/p/brownie/) and compiles on Mac OS X, Linux, and Windows (under Cygwin).

The program itself has basic help. Type help for a list of commands, or type a command name followed by a space and a question mark for information on that command (the syntax is similar to that of PAUP and MrBayes).

Testing continuous rates (original censored approach)

The original purpose of Brownie was to calculate and compare rates of continuous character evolution in different parts of a tree. These parts may be clades or a mixture of clades and paraphyletic groups. In a typical use, a researcher might have a discrete character that changes somewhere on the tree (such as origin of a new trait or invasion of a new habitat) and a hypothesis that this will change the rate of evolution of some other continuous character (leg length, genome size, etc.). The tree (or trees) is loaded into Brownie, as is the continuous character, in a Nexus format file (the same format as is used by PAUP*, MrBayes, MacClade, Mesquite, etc.). The groups to be tested (i.e., the clade with the new trait and the paraphyletic group without it) are identified as taxsets, and the program computes various statistics to let the user know whether fitting all the groups with the same rate is substantially worse than allowing each group to have its own rate.

“Substantially worse” is a deliberately ambiguous term. Brownie can be used in both a hypothesis-testing approach and in a model-selection approach. In the former case, the user would use the p-values output by Brownie to reject/not reject the hypothesis that the rate of evolution is the same in all the groups. Brownie outputs two p-values. The chi-square p-value uses a likelihood ratio test to compare the likelihood of the model with multiple rate parameters with the model with all rates constrained to be the same, using a chi-square distribution to evaluate significance. Use of the chi-square is asymptotically correct, but may be non-conservative (improperly suggest significance) in the case of few taxa. For this reason, Brownie also assesses significance using parametric bootstrapping (simulation under the null hypothesis). This is the second p-value. Computing this p-value is time-consuming. By default, Brownie does not do parametric bootstrapping, but this option is available and should be used for the hypothesis-testing approach if at least one of the taxsets examined has few taxa.
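The two p-values described above can be sketched in a few lines of Python. This is a minimal illustration of a likelihood ratio test and of an empirical p-value from simulated null statistics, not Brownie's own code. The function names are hypothetical, and `extra_params` (the number of additional free parameters in the multiple-rate model) must be supplied by the user, since the exact count Brownie uses is not stated here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 2 and df = 1, then a recurrence in steps of 2.
    if df % 2 == 0:
        k, q = 2, math.exp(-half)
    else:
        k, q = 1, math.erfc(math.sqrt(half))
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def lrt_p_value(lnl_single, lnl_multi, extra_params):
    """Likelihood ratio statistic and its chi-square p-value.

    lnl_single: log-likelihood of the single-rate (null) model
    lnl_multi:  log-likelihood of the multiple-rate model
    extra_params: assumed difference in number of free parameters
    """
    stat = 2.0 * (lnl_multi - lnl_single)
    return stat, chi2_sf(stat, extra_params)


def bootstrap_p_value(observed_stat, simulated_stats):
    """Fraction of null-simulated statistics at least as large as observed.

    The exact counting convention used by Brownie is an assumption here.
    """
    hits = sum(1 for s in simulated_stats if s >= observed_stat)
    return hits / len(simulated_stats)
```

With few taxa, the simulated statistics need not follow the chi-square distribution closely, which is why the two p-values can disagree.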

Under the model-selection approach, the goal is to find a model that adequately describes the data, not to reject a hypothesis. This is accomplished through the use of the Akaike Information Criterion (AIC), which is -2 times the log-likelihood plus a penalty of twice the number of free parameters. The AICc is a modification of AIC that is appropriate for small sample sizes (when the number of data points is less than forty times the number of free parameters (Burnham & Anderson, 2002)). Smaller AIC/AICc values indicate a better-fitting model. There is no p-value in this approach; rather, the difference in AIC/AICc value indicates how much better one model is than the other. According to Burnham and Anderson (2002), a difference of less than two indicates that the worse model still has substantial support, a difference of four to seven indicates “considerably less” support for the worse model, and a difference of over ten indicates essentially no support for the worse model.
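As a small worked sketch of these quantities, the standard AIC and AICc formulas can be written as follows. The function names are illustrative, and the choice of n (for example, the number of taxa with data) is an assumption left to the user.

```python
def aic(lnl, k):
    """Akaike Information Criterion: -2 lnL + 2k (k = free parameters)."""
    return -2.0 * lnl + 2.0 * k


def aicc(lnl, k, n):
    """Small-sample AIC: AIC + 2k(k+1)/(n-k-1) (n = number of data points)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(lnl, k) + (2.0 * k * (k + 1)) / (n - k - 1)


def delta(score_a, score_b):
    """Difference between the worse (larger) and better (smaller) score."""
    return abs(score_a - score_b)
```

For instance, `delta(aicc(lnl_single, k_single, n), aicc(lnl_multi, k_multi, n))` gives the difference that is compared against the Burnham and Anderson guidelines above.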

Brownie outputs all four values by default. The only one the user can turn off is the parametric bootstrap p-value: calculation of this value can take quite some time, so the user can set the number of replicates to zero to avoid this cost. However, if the user is using a hypothesis-testing approach, this just leaves her or him with the potentially non-conservative (i.e., bad) chi-square p-value, instead. For all the values, Brownie also gives an indication of strength of support: a means weak support (or at least, non-rejection) of the single-rate model, b means weak support for the multiple-rate model, and B indicates strong support for the multiple-rate model. Other output is the likelihood of each model, the rate parameters (s2) under each model, and the likeliest ancestral state value for each group.

Testing continuous rates (non-censored approach)

This was described in the original O'Meara et al. (2006) paper and implemented in r8s. I have now added it in this version of Brownie. With the non-censored approach, where on a branch the rate change happens matters to the outcome. There isn't an easy way to specify this in NEXUS files. The only program doing something similar was SIMMAP's output for traced character histories, so I have adapted Brownie to use this format. This allows rate parameters to change within a branch as well as at nodes. It has the side benefit of allowing stochastic character mapping of discrete traits to be used in the assigning of continuous traits, as described in my dissertation O'Meara (2008) (available and citable from Nature Precedings), a forthcoming paper by Collar et al., and another paper shortly to be submitted.

To do the non-censored approach, the model has to be specified using the "model" command. A single rate model is just called "BM1" -- Brownian motion, 1 rate. A multiple rate model, where there is a different rate parameter for different painted branches or subbranches, is "BMS" -- Brownian motion with rate varying by state of the stochastically- or other mapped character. "BMC" allows a different rate parameter on branches where the trait changes state. "OUSM" and "OUCM" are the Ornstein-Uhlenbeck equivalents of the BMS and BMC models, but with the mean allowed to change but not the Brownian rate parameter. In an Ornstein-Uhlenbeck process, traits are pulled towards a central value while still being allowed to wiggle. The Ornstein-Uhlenbeck model used in OUCH (Butler and King, 2004) is essentially Brownie's OUSM model with mean changes happening only at nodes. Brownie and OUCH results should agree. One can imagine more complex OU processes that allow the Brownian rate parameter and attraction parameter as well as the OU mean parameter to change over the tree -- such models are not yet publicly available in Brownie, but work is ongoing for some of them. Note that all these non-censored models are very new in the program, and there may yet be issues. For example, currently estimates come with a "+/-" reported. This just relates to the effectiveness of the numerical optimization, NOT A CONFIDENCE INTERVAL ON THE ESTIMATES.
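To make the difference between Brownian motion and an Ornstein-Uhlenbeck process concrete, here is a minimal simulation sketch along a single lineage. It is illustrative Python, not Brownie code. The parameter names (`sigma2` for the Brownian rate, `alpha` for the strength of attraction, `theta` for the central value) are conventional labels chosen for this example.

```python
import math
import random


def simulate_path(x0, sigma2, total_time, steps, alpha=0.0, theta=0.0, rng=None):
    """Simulate a trait along one lineage.

    alpha == 0 gives Brownian motion with rate sigma2; alpha > 0 gives an
    Ornstein-Uhlenbeck process pulled towards theta. Uses the exact
    transition distribution for each time step.
    """
    rng = rng or random.Random()
    dt = total_time / steps
    x = x0
    path = [x]
    for _ in range(steps):
        if alpha == 0.0:
            mean = x
            var = sigma2 * dt
        else:
            decay = math.exp(-alpha * dt)
            mean = theta + (x - theta) * decay
            var = sigma2 * (1.0 - decay * decay) / (2.0 * alpha)
        x = rng.gauss(mean, math.sqrt(var))
        path.append(x)
    return path


# Same starting value and rate; only the OU path is drawn back towards theta.
bm_path = simulate_path(10.0, 0.5, 5.0, 100, rng=random.Random(1))
ou_path = simulate_path(10.0, 0.5, 5.0, 100, alpha=1.0, theta=2.0,
                        rng=random.Random(1))
```

Under Brownian motion the variance keeps growing with time, while under the OU process it levels off at sigma2 / (2 alpha) around theta.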

Once the model is set, an analysis is run by using the "continuous" command. There are various settings for looping over trees and characters, limiting the search to only certain taxa (specified with a taxset command), and saving output to a file. Type "continuous ?" for help. There are advanced options available from the "numopt" commands regarding the numerical optimization approach used (a Nelder-Mead simplex algorithm implemented as part of the GNU Scientific Library). The current settings seem to generally work, but you may play with the settings if the program is running too slowly or is getting different answers between runs (it generally shouldn't).

Discrete character evolution

Brownie can now do discrete character optimization. NOTE THAT THIS IS RELATIVELY UNTESTED, AND YOU SHOULD NOT PUBLISH WITH THIS WITHOUT CHECKING WITH BRIAN O'MEARA FIRST. Basically, the design is for very flexible analyses. Rates can be any combination: general time nonreversible, general time reversible, or constrained in various ways (i.e., rate 0->2 = rate 2->3, or rate 1->4=0.05). State frequencies at the root can be the empirical distribution at the tips, equal probabilities for all states, optimized for the model, the equilibrium state frequencies, or user-set state frequencies. There's a correction for only including variable traits that comes from Felsenstein (1992) and Lewis (2001), the ability to estimate parameters over many characters at once, the ability to assume characters missing some state still have that state a possibility, and more. It's also possible to reconstruct states within edges as well as at nodes. See my dissertation O'Meara (2008) (available and citable from Nature Precedings) and another paper shortly to be submitted for more info. Doing the reconstruction of a discrete trait within the tree allows the non-censored continuous approaches to be used on these branches (as does loading SIMMAP-derived trees). Note that to load both continuous and discrete traits, one of the blocks must be called characters2 rather than characters in the nexus file (a workaround for a limit in the version of the NCL used in Brownie).

Gene loss

The gene loss model is a particular instance of a discrete character model and was the first one coded. The citation for it is McBride, C.S., J.R. Arguello, and B.C. O'Meara. 2007. Five Drosophila genomes reveal nonneutral evolution and the signature of host specialization in the chemoreceptor superfamily. Genetics 177(3):1395-416. McBride & Arguello wrote the main article; I authored an appendix, where the model is described. It fits a discrete trait model in which the root is forced to be in state 1 (presence) and only losses are allowed. Two models are fit: one with one loss rate over the whole tree, and one with different parameters on parts of the tree assigned to rates 0 and 1.

Species delimitation and species tree inference

THE MANUSCRIPT DESCRIBING THIS HAS BEEN ACCEPTED PENDING MAJOR REVISION IN SYSTEMATIC BIOLOGY, SO THIS IS NOT AVAILABLE FOR WIDESPREAD USE UNTIL THE PAPER IS IN PRESS (reviewers suggest useful changes to the approach). Basically, Brownie can take a set of gene trees labeled just by specimen (sample1, yellowstonesampleA, etc.) and infer both what the species are (samples A, B, and D form one species, samples C, E, F, and G form another) and the tree relating these species. An earlier version of this approach is described in my dissertation O'Meara (2008) (available and citable from Nature Precedings). It doesn't need reference species (it's not assigning samples to known species) or gene trees with branch lengths. It works better than I expected, and with just a handful of loci and a few (>2) samples per species.

Input trees

Like most comparative methods, Brownie requires a tree with branch lengths. In general, these branch lengths should be proportional to time. Programs like r8s (Sanderson, 1997, 2002) or multidivtime (Thorne et al., 1998), or even PAUP* (Swofford, 2003) or Phylip (Felsenstein, 1993) if the clock model is used, can generate trees with such branch lengths. In some cases, other scalings, such as branch lengths in units of number of generations or number of molecular substitutions, may be appropriate. It is generally not appropriate to invent branch lengths, such as assigning all branches to have length 1 or using Grafen branch lengths.
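One way to see why branch lengths matter: under Brownian motion, the expected covariance between two tips is the rate multiplied by the length of the path they share from the root, so every entry of the variance-covariance matrix depends on the branch lengths. The sketch below computes the shared-path matrix for a small tree. The nested-tuple tree representation `(label, branch_length, children)` is an assumption made for this example and is not a Brownie or Nexus format.

```python
def bm_vcv(tree):
    """Return (taxa, matrix) of shared root-to-ancestor path lengths.

    tree is a nested tuple (label, branch_length, children); tips have an
    empty children tuple. Multiply the matrix by the Brownian rate to get
    the expected trait covariances.
    """
    cov = {}

    def walk(node, depth_above):
        label, length, children = node
        depth = depth_above + length
        if not children:
            cov[(label, label)] = depth  # tip variance: root-to-tip distance
            return [label]
        groups = [walk(child, depth) for child in children]
        # Tips in different child subtrees share the path down to this node.
        for i, left in enumerate(groups):
            for right in groups[i + 1:]:
                for a in left:
                    for b in right:
                        cov[(a, b)] = cov[(b, a)] = depth
        return [tip for group in groups for tip in group]

    taxa = walk(tree, 0.0)
    return taxa, [[cov[(a, b)] for b in taxa] for a in taxa]


# The tree ((A:1,B:1):2,C:3) with a zero-length root branch.
example = (None, 0.0, (
    (None, 2.0, (("A", 1.0, ()), ("B", 1.0, ()))),
    ("C", 3.0, ()),
))
taxa, matrix = bm_vcv(example)
# taxa   -> ['A', 'B', 'C']
# matrix -> [[3.0, 2.0, 0.0], [2.0, 3.0, 0.0], [0.0, 0.0, 3.0]]
```

Setting every branch to length 1 would change both the diagonal and off-diagonal entries, and therefore the rate estimates.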

Branch length estimates are notoriously imprecise. Moreover, we may not even be that confident about the topology. In order to account for uncertainty in the tree, Brownie allows users to input a set of trees with weights, and can use the weights in summarizing results at the end of an analysis. For example, the input could be a set of bootstrap trees with their weights or a set of trees from a MrBayes run with each tree’s posterior probability (though note the requirement for trees with branch lengths — save bootstrap trees and Bayes trees with branch lengths during the search). Weights need not add to one. Brownie will output results for each tree, a summary table at the end, and a weighted average for all program outputs (so that a tree with twice the weight of another has twice the effect on calculation of average AIC value, for example). It can also save results to a tab-delimited file, suitable for analyzing in programs such as Microsoft Excel, with one line for each tree-character combination.
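The weighted averaging described above amounts to the following calculation, sketched here in Python with a hypothetical function name. Because the sum is divided by the total weight, the weights need not add to one.

```python
def weighted_mean(values, weights):
    """Weighted average of per-tree results (e.g., AIC values)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


# A tree with twice the weight has twice the effect on the average.
average_aic = weighted_mean([10.0, 20.0], [2.0, 1.0])
```

Rescaling all weights by the same factor (for example, bootstrap counts versus proportions) leaves the result unchanged.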

Input data

Brownie can now handle both continuous and discrete (“standard”) data. In some cases, both data types need to be loaded. Most Nexus-compliant programs cannot do this within one Characters block (with the notable exception of MrBayes, with its “mixed” data format option). Mesquite can load and save multiple Characters blocks. Brownie cannot load multiple Characters blocks, but it can load one Characters block and then a Characters2 block (starts with “begin Characters2;”). These two blocks can be in any order (Characters2 can be discrete or continuous, as can the other Characters block). The program will deal with them properly if you have just one block of one type, or two blocks of different types; it won’t work for two discrete character blocks. Sorry for the slight kludge. To import a Mesquite file with two Characters blocks, just add a “2” at the begin Characters statement of one of the blocks (“begin Characters2”).
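The Mesquite workaround can also be scripted. The following is a minimal sketch, assuming the file contains exactly two blocks that each start with “begin Characters;” at the beginning of a line. It renames the second one, which is an arbitrary choice since either block may be Characters2. The function name is hypothetical.

```python
import re

# Matches "begin characters;" at the start of a line, in any letter case.
BEGIN_CHARACTERS = re.compile(r"(?im)^(\s*begin\s+characters)(\s*;)")


def rename_second_characters_block(nexus_text):
    """Turn the second 'begin Characters;' into 'begin Characters2;'.

    Text with anything other than exactly two Characters blocks is
    returned unchanged.
    """
    matches = list(BEGIN_CHARACTERS.finditer(nexus_text))
    if len(matches) != 2:
        return nexus_text
    second = matches[1]
    return (nexus_text[:second.start()]
            + second.group(1) + "2" + second.group(2)
            + nexus_text[second.end():])
```

A typical use would be to read the Mesquite file as text, pass it through this function, and write the result to a new file before loading it in Brownie.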

Warnings

There are a few things to be cautious about when using Brownie. First, it assumes that each group (assigned using taxsets) is a paraphyletic or monophyletic group, not a polyphyletic group. That is, every group on every tree should be a subtree that can be pruned out of the tree without removing any edges that are within any other group. This is always the case in the situation of two groups if at least one of them is a clade. Care should be taken when doing analyses across a large set of trees, some of which might violate this property. Brownie does not yet have a way to test this. An appropriate method might be to exclude those trees where at least one of the groups of interest is polyphyletic (perhaps by loading the set of bootstrap or Bayes trees into PAUP and using one or more backbone constraint trees as a filter) and then load the remaining trees into Brownie to use for analysis. Make sure to note in your publication that this was done, and include the relative weight of the excluded trees (examples: “20 percent of the bootstrap trees had a polyphyletic Vertebrata; all rate estimates and tests were performed on the remaining trees”; “Sixty-seven trees, with a combined posterior probability of 0.11, were excluded from analysis in Brownie since they had polyphyletic Vertebrata, rendering a rate test between vertebrates and invertebrates invalid”). Even moderate support for polyphyly of a group should affect the strength of your conclusions: if 20% of the trees are thrown out because they have polyphyletic vertebrates, making sweeping statements about vertebrates having a faster rate of evolution for a particular character is probably unwarranted. In 20% of the trees, the group “vertebrates” shares some branches with some other group, in which case the rate estimates are not independent, even under a multiple rate model.

Brownie can deal with input trees with polytomies, but note that polytomies are treated as hard. Brownie also does not yet deal appropriately with missing data: it will treat all missing data as having state zero. To deal with characters with missing data, just set up the taxsets in the comparison to not include any taxa with missing data for that character (those taxa may still occur on the input trees). Brownie also assumes that Brownian motion (with one to several rate parameters) adequately fits the data within each subtree. Data may have to be transformed to meet this assumption (for example, by taking the log). One can test this assumption by testing each of the subtrees in a program like Continuous (Pagel, 1994, 1998, 1999) or OUCH (Butler & King, 2004) to see whether the Brownian model is rejected. Future versions of Brownie will have the ability to perform such checks, as well as incorporate measurement error.
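Two of the data-preparation steps mentioned above, dropping taxa with missing data from a taxset and log-transforming a character, can be sketched as below. The representation (a dictionary from taxon name to value, with None marking missing data) and the function names are assumptions made for illustration. How the resulting taxset is entered in Brownie is left to the taxset command's own help.

```python
import math


def complete_taxa(taxset, values):
    """Keep only the taxa in taxset that have an observed value."""
    return [taxon for taxon in taxset if values.get(taxon) is not None]


def log_transform(values):
    """Natural-log transform; missing values (None) stay missing."""
    transformed = {}
    for taxon, value in values.items():
        if value is None:
            transformed[taxon] = None
        elif value <= 0:
            raise ValueError(f"cannot take the log of {value} for {taxon}")
        else:
            transformed[taxon] = math.log(value)
    return transformed


leg_length = {"sp1": 12.0, "sp2": None, "sp3": 30.5}
usable = complete_taxa(["sp1", "sp2", "sp3"], leg_length)
logged = log_transform(leg_length)
```

Taxa dropped from the taxset in this way may still remain on the input trees, as noted above.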

Citing

You should cite the O’Meara et al. (2006) paper (below). When you quit, the program now also highlights some of the papers you should read and cite, based on the analyses you have done. Note that this is a subset of the relevant papers. You should also check the Brownie website (http://www.brianomeara.info/brownie), where you can also join email lists and see a discussion forum.

Installing

Brownie is distributed as a standalone executable for Mac OS X and Windows (using Cygwin) and as source code for other systems. Like r8s, PAUP* for non-Macs, and MrBayes, Brownie is a command-line program. It currently requires a minimal but non-zero amount of Unix knowledge on non-Macs (none is required on a Mac), all of which should be given below. Brownie has extensive help within the program. There’s also a user support forum at http://www.brianomeara.info/brownie.

Installation on a Mac (OS X):

The easiest thing to do is use the installer. After downloading the disk image (.dmg file), double click it to open, and double click the installer. By default, this will put Brownie in /usr/bin so that Brownie will be in the executable path.

Brownie is a command line application. If installed properly, it can be run in any Terminal window by typing brownie (case sensitive). You can also type brownie followed by your file name to automatically load the file. The Mac install also includes a Brownie application in the Brownie folder, along with example files and this manual. Double-clicking on the program icon will prompt the user to select the input file and automatically start up Brownie (the program is actually an Applescript that calls the command line brownie in Terminal). The program does not need to be in the same directory as the file to run; all program outputs (the echo and log files) are stored in the same folder as the input file by default. Quitting Brownie leaves the Terminal window open, in case the user wants to manipulate the files using Unix (the user will already be in the appropriate directory). Otherwise, the user can just quit Terminal after quitting Brownie.

If Brownie does not start, check the Brownie web page for help (http://www.brianomeara.info/brownie) or send me an email.

Installation on Windows:

On Windows, Brownie runs under Cygwin (http://www.cygwin.com), which provides a Linux-like operating environment. First, install Cygwin using the setup.exe installer available on the above website. Using only the defaults should work nicely. Now download brownie.exe and put it in cygwin/usr/local/bin (i.e., on your hard drive is a folder called cygwin, inside that is a folder called usr, etc.). It’s easiest if you put your data files in cygwin/home/UserName, but they can be anywhere – you just have to use the Unix command cd to navigate to them. Now, just open the Cygwin application. Type ls to make sure your data file is where you think it should be. If not, either navigate to it using cd or type pwd to find where you currently are and then put the data file in the same folder. Type brownie (case sensitive) to start Brownie.

Compiling from source:

Download the source code. Brownie is written in C++. Note that Brownie is released under the GNU General Public License. It includes a makefile, so you can just compile it using “make”. However, you’ll need to first install the GNU Scientific Library (v. 1.11 or later). Compiling may require some configuration -- there are actually several Makefiles included in the distribution.

Getting started

Now you have Brownie running on your computer. Files can be loaded either when starting the program:

systemprompt% brownie mydatafile.nex

or after the program has started by typing exe mydatafile.nex . Note that you can enclose the file name in single quotes if you want to include a path:

exe '/home/documents/project1/mydatafile.nex'

Files should be in the Nexus (Maddison et al., 1997) format. They should include one or more trees with branch lengths and a continuous dataset. Note that Brownie only wants one entry per taxon per character: it does not yet use some of the other information (number of samples, range, etc.) that is possible with Nexus-formatted continuous data (though not actually implemented in any programs, as far as I know). A sample file is included in the Brownie distribution. Note that Brownie can read commands in its own Brownie block, so you can create batch files.

While it is executing a file, Brownie should provide information about what it is loading and report any errors. The basic approach to use with Brownie is to assign taxa to groups (using one or more taxsets), choose the tree and character to test, and perform a rate test. See below for a description of all relevant commands. In most cases, commands that look like PAUP* or MrBayes commands work in the same way. Also, commands take abbreviations: rather than typing “help” for help, just typing “h” will suffice. The same is true for many options within commands.

Commands

General syntax is the command name first, followed by options. Commands can be given interactively on the command line or in a Brownie block in a Nexus file.

General commands:

  • help: shows this message
  • exe: executes nexus file
  • log: log output
  • echo: copies your commands into a batch file
  • blocks: reports on blocks currently stored
  • showtree: displays currently loaded tree(s)
  • choose: chooses tree or char for analysis
  • taxset: stores a taxset
  • citation: outputs list of relevant papers for your analyses
  • tipvalues: return list of tip values
  • quit: terminates application

Character evolution:

  • ratetest: does censored rate test (original Brownie function)
  • vcv: outputs a variance-covariance matrix
  • (discrete): implements discrete character models and reconstructions
  • [tipvariance]: allows program to deal with variance in taxon means
  • model: sets model of continuous character evolution (OU, BM, etc)
  • (continuous): gets score for chosen taxset for chosen model
  • [export]: exports a tree and data in deprecated Pagel format
  • (simulate): simulate discrete or continuous character matrices
  • loss: estimate rates of binary character loss on branches

Species delimitation and tree search:

  • (hs): perform a heuristic search
  • [jackknife]: perform a jackknife search
  • [exhaustive]: perform an exhaustive search
  • compare: compare triplet overlap for coalescent trees
  • assign: assign samples to species
  • (accuracy): compute accuracy of reconstruction

Numerical optimization settings:

  • set: sets options
  • numopt: sets parameters for numerical optimization functions

Miscellaneous:

  • orderbytree: reorders a datamatrix by order of taxa in a tree
  • printedgelength: prints branch lengths
  • (partitionededge): outputs all trees one NNI move away for NNIBS analysis

Type "commandname ?" [without the quotes] for help on any command.

*** IMPORTANT ***
Commands in brackets ("[]") should not be used for published results yet
Commands in parentheses ("()") should be used after checking with Brian O'Meara (omeara.brian@gmail.com) -- they may reflect things in review which might change pending reviewers' comments, for example

Problems/suggestions

First, read the manual. Then see the user support forum at http://www.brianomeara.info/brownie. You may also email me at omeara.brian@gmail.com. If the example files work but not your data file, sending me the data file will be helpful in troubleshooting. I won't steal it (yes, software authors "stealing" data is a concern some users have expressed, though no Brownie users so far).

Acknowledgments

Brownie 2 uses the Nexus Class Library from Paul Lewis and TreeLib from Rod Page, as well as code from Ligia Matieu and John Burkardt. Many people helped with the ideas behind Brownie: see the O'Meara et al. 2006 paper for more details. Funding for its development has come from a University of California, Davis Population Biology fellowship, NSF Graduate Research Fellowship and Dissertation Improvement Grant (DEB-0608487) awards to Brian O'Meara, and from the National Evolutionary Synthesis Center (NESCent) (NSF #EF-0423641).

Citations (NEEDS UPDATING)

Akaike, H. 1973. Information theory as an extension of the maximum likelihood principle. in B.N. Petrov and F. Csaki (eds.) Second International Symposium on Information Theory. Akademiai Kiado, Budapest.

Burnham, K.P. and D.R. Anderson. 2002. Model Selection and Multimodel Inference: A practical information theoretic approach, 2nd edition. Springer-Verlag, New York.

Butler, M.A. and A.A. King. 2004. Phylogenetic comparative analysis: a modeling approach for adaptive evolution. Am. Nat. 164: 683-695.

Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.572c. Distributed by the author. Department of Genetics, University of Washington, Seattle.

Maddison, D.R., D.L. Swofford, and W.P. Maddison. 1997. NEXUS: An extensible file format for systematic information. Syst. Biol. 46: 590-621.

O’Meara, B.C., C.M. Ané, M.J. Sanderson, P.C. Wainwright. 2006. Testing for different rates of evolution using likelihood. Evolution in press.

Pagel, M. 1994. Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proc. R. Soc., Ser. B: Biol. Sci. 255: 37-45.

——— 1998. Inferring evolutionary processes from phylogenies. Zool. Scr. 26:331-348.

——— 1999. Inferring the historical patterns of biological evolution. Nature 401: 877-884.

Sanderson, M.J. 1997. A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol. Biol. Evol. 14: 1218-1231.

Sanderson, M.J. 2002. Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Mol. Biol. Evol. 19: 101-109.

Sugiura, N. 1978. Further analysis of the data by Akaike’s information criterion and the finite corrections. Communications in Statistics, Theory, and Methods. A7: 13-26.

Swofford, D.L. 2003. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4.0b10. Sinauer Associates, Sunderland, Massachusetts.

Thorne, J.L., H. Kishino, and I.S. Painter. 1998. Estimating the rate of evolution of the rate of evolution. Mol. Biol. Evol. 15: 1647-1657.