Species delimitation course U. Gothenburg, Dec. 2010

This page contains information for a species delimitation course offered at the University of Gothenburg, Sweden, in December 2010. I am presenting the methods implemented in Brownie.

I will deliver the lecture (PDF, Apple Keynote) and run the exercise using EVO. Students may log in to ask questions, show problems they are having, and so on. I have also set up a mibbit IRC channel (below) [now closed] for discussions during the lecture or exercise. Its advantage over EVO is that it doesn't require registration, but it also means I can't see your screen.


Exercise:

Objectives:

  1. Learn how to get help in Brownie.
  2. Learn how to run Brownie for species delimitation using defaults.
  3. Learn how to interpret and present results.
  4. Learn how to change settings and see their effect.
  5. Learn about NEXUS file structure.
  6. Learn how to make batch files in NEXUS.
  7. Experiment.

What to do:

Note that commands and program output are in fixed-width font.
Let me know if you have questions, fall behind, etc. THIS IS NORMAL. Remember, this course is for YOU -- if something is not going well, it is our duty to fix it.

  1. Log in to the cluster
  2. Run cp -r /usr/local/files/browniefiles ~ to copy (cp) the folder (-r copies a folder and everything in it; /usr/local/files/browniefiles is the folder itself) to your home directory (~). This way, everyone is not trying to save things to the same folder. Note that on an American keyboard, ~ is to the left of the "1" key, but I don't know about other keyboard layouts.
  3. Start brownie: /usr/local/bin/brownie.sh (there will be a warning based on lack of an input argument: this is fine, it is due to the particular shell script we're using)
  4. Brownie is being actively developed. You can see the version number (V2.1.3) as well as the revision number of this copy in Brownie's source code repository (r302). The copy we are using was compiled on Nov. 12. Note that, like much open source software, Brownie uses existing libraries, such as the GNU Scientific Library, and code from Rod Page and Paul Lewis. It's also open: you can look at the raw source code and modify it. Your changes can be added to Brownie itself, or you can put the code into a new program. Much new scientific software, including some of the other programs in this course, is written in this way.
  5. What to do next? When in doubt, get help. Type help, then return.
  6. Brownie has commands for lots of different functions. Which ones will we want to use?
  7. Ready? We'll want to use log, echo, hs, and quit. Log stores the output Brownie returns to the screen. Echo stores the input you give Brownie: this can be handy if you're playing around with an analysis and want to remember what you did. Hs does a heuristic search for the best delimited species tree. (I played with the possibility of doing an exhaustive search but didn't implement it; it is something you could add to the code, though I hope you have a fast computer to run it on.) Quit leaves the program. Brownie reports citations for approaches it thinks you have used during an analysis (but these might not be exhaustive).
  8. Let's find out how to use log. Most Brownie commands give help if you type the command name followed by a space and a question mark. Type log ?.
  9. Start logging to a file called log1.txt
  10. Now get help with hs. How to do this?
  11. Okay, let's quit brownie
  12. Now, navigate to the directory (folder) that has the brownie input files. cd ("change directory") is how to navigate to a folder. In our case, cd ~/browniefiles
  13. Make sure you're in the right directory: type ls to see the files here. You should see test1.nex. That's the file we're going to run.
  14. We're going to start brownie with an input file. As we learned yesterday, jobs are being run on a cluster -- a network of many computers. Type cat /usr/local/bin/brownie.sh to see the file used to start the brownie job. You can see it's very simple, just calling another script. cat /usr/local/sge/browniei.sh will show you this other script: also pretty simple, just with some options for dealing with the batching software. On your own computer, if brownie were installed correctly, just brownie filename.nex would work fine.
  15. Let's go! brownie.sh test1.nex
  16. That was underwhelming. All it did was tell us we have 10 trees. What should we do?
  17. Yes, let's start a log file. log file=test1.log. One thing to note is that we could have named it anything we wanted: test2.pdf, structurama.input, etc. It's still going to be a text file.
  18. Yes, let's start an echo file, too. echo file=test1.echo
  19. Make sure we know which options to use: hs ?
  20. Make it so: hs
  21. What is going on?
  22. Once it is done running, quit the program
  23. Where is the output? ls
  24. Normally, it would be here, but it is actually one level up: cd ..
  25. Let's look at the echo file first: less test1.echo
  26. q to quit less. Now less test1.log
  27. The tree returned is the species tree. You can open it in a tree viewer such as FigTree.
  28. Leave this file and look at other files in the directory. What is besttrees.tre.mesquite.nex? It's a file that, if you opened it in Mesquite, you could look at all the gene trees within the species tree. Rather than have you try to do scp and get Mesquite running, I will demonstrate it now. See this image for where to show constrained trees and here for output.
  29. Okay, let's look at other options. A key parameter is the weight between the structure cost and the GTP (gene tree parsimony) cost. Set it to a high value to put more emphasis on structure (will this tend to split or lump taxa?). Set it low to turn Brownie into a program for doing tree inference under GTP. Play with this.
  30. So far, we've just done the nonparametric method. What if we wanted to do the parametric method? This is based on ideas from Knowles and Carstens, and just tries to find the assignment of samples to species and the species tree with the best AIC score (where the likelihood is based on the probability of the gene trees given the species tree). Calculating the likelihood is possible (STEM does this, for one; you'll be learning about it later this week). Brownie can use one of two other programs for this: COAL (Degnan and Salter, 2005) or ms (Hudson, 2002). COAL works as you'd expect: give it a species tree and a gene tree, and it returns a likelihood. When I was developing Brownie, there was one particular quirk of COAL (a very rare issue, but one you would sometimes hit when looking over many tree combinations) that made it somewhat problematic for this use, so I opted to use ms as the primary way of calculating gene tree likelihoods. The way this was done was fairly kludgy: the probability of a gene tree is the expected proportion of times you'd see it under a given species tree, so Brownie uses ms to simulate a lot of gene trees under a potential species tree and estimates the probability from how often each gene tree appears, rather than calculating the likelihood explicitly using equations from Degnan and Salter (2005). Later this week, you'll be hearing from people with much more elegant approaches to this, but you can try Brownie's approach now. Let's do another heuristic search, but with ms=yes, a different file name for the best trees, and nreps dropped down to 1 for speed. Remember, hs ? gives help.
  31. Now let's look at the input files for brownie. Make a copy of the test1.nex file (cp test1.nex test1modified.nex) and open it in your new favorite editor, nano (nano test1modified.nex).
  32. Walk through NEXUS file structure
  33. Let's add a Brownie block...
  34. begin brownie;
    log file=testbatch.log;
    hs;
    quit;
    end;

  35. Run the modified file
  36. Now play with some of the other files in ~/browniefiles and some of the other settings in Brownie. If you're logged into EVO, you can share your screen so I can look over your shoulder to see what you are doing.
  37. Questions?
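
Steps 31 to 35 can also be scripted from the shell instead of typed into nano. The following is a minimal sketch and not part of the course files. It works in a scratch directory with a one-line stand-in for test1.nex (an assumption, since the real file's contents are not reproduced here), copies it, and appends the Brownie block from step 34. The function name append_brownie_block is made up for illustration.

```shell
#!/bin/sh
# Sketch: turn a NEXUS file into a Brownie batch file by appending a Brownie block.
set -eu

# Hypothetical helper: append the Brownie block from step 34 to the file given as $1.
append_brownie_block() {
    cat >> "$1" <<'EOF'

begin brownie;
    log file=testbatch.log;
    hs;
    quit;
end;
EOF
}

# Scratch directory, so nothing in ~/browniefiles is touched by this demo.
workdir=$(mktemp -d)

# Stand-in input file (assumption): only the #NEXUS header that every NEXUS file starts with.
printf '#NEXUS\n' > "$workdir/test1.nex"

# Same copy as in step 31, then add the batch commands.
cp "$workdir/test1.nex" "$workdir/test1modified.nex"
append_brownie_block "$workdir/test1modified.nex"

# Show the resulting batch file.
cat "$workdir/test1modified.nex"
```

In the exercise itself you would skip the stand-in file, run the cp and append steps inside ~/browniefiles, and then start the run with brownie.sh test1modified.nex. Because the block ends with quit, Brownie should exit on its own once the search finishes.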