R is a useful environment/language for statistical work. Here is a basic tutorial on using it for phylogenetics. You may also look at the basic info at my general R tutorial page; also see sections 1.5, 1.7, 1.10, 2.1, 2.3, 2.8, 3.2, 12.1, 12.1.4, and 13 of the R manual. R commands will be written this way. It is often best to store them in a text file, so if you want to repeat the analysis, you can just rerun the file. Anything preceded by a # in R code is ignored, so you can use this to store comments to describe what you're doing [for those in my class, send me your script, and include your name as a comment].
install.packages("ctv"). It may ask you to choose a repository -- I generally use Berkeley's, as it's updated frequently.library("ctv")install.views("Phylogenetics")library("ape")?ape. For more information, type library(help = ape)?rcoal to learn the syntax for creating a tree with 10 taxa. Then create one and store it in an object named "phy".phy<-rcoal(n=10). Remember that in R, <- is an assignment operator: foo<-5 means foo will equal 5.print(phy) to get basic info about phy (this is general for any R object). summary(phy) may give us a bit more info. plot(phy) will draw the tree.class(phy) is "phylo" meaning phy is a kind of "phylo" object. Such objects have their own plotting function, plot.phylo(). So R knows to use this function for plotting. Get help on this function and then plot a tree in the upwards orientation, without taxon labels, and with thick branches. [If you're doing this for my class, this plot is part of your assignment: use the pulldown Export menu on the plot panel to save a copy of this plot to email to me].AllTrees() and length().
library(phangorn) #to load the phangorn package
x<-c() #creates an empty x vector
y<-c() #creates an empty y vector
for (number_of_taxa in 3:8) { #this starts a loop: the first time through, variable number_of_taxa=3, the next time, it equals 4, then 5, etc. up to and including 8.
print("now doing this number of taxa:")
print(number_of_taxa)
x[number_of_taxa-2]<-number_of_taxa #here we are sticking the number of taxa into vector x. x[1] is the first slot in that vector, x[2] is the second slot in that vector… (it expands if it is not that size already -- not all languages do that). Think about why we do number_of_taxa - 2 rather than number_of_taxa.
#WAY TO STORE THE LENGTH OF THE OUTPUT FROM allTrees, WHICH IS THE NUMBER OF POSSIBLE TREES, FOR PLOTTING ON THE Y-AXIS
}
#PLOT THE NUMBER OF POSSIBLE TREES VERSUS THE NUMBER OF TAXA (DO A LOG SCALE FOR NUMBER OF TREES)
#WAY TO STORE.... and #PLOT THE... with ways to do this. [For those in my class, send me your script and final plot]for() loops are easy to understand, they can often be much slower in R than using commands in the lapply() family. They are trickier to learn, but not that difficult, and can offer orders of magnitude speed up. Finally, you are right now probably only using one core of your multicore machine (if it's a laptop or desktop from the past several years). Look at the foreach() function in the documentation for the doMC package for ways to easily turn a for() loop [bad] to a job that automatically executes in parallel [awesome].