In the course of doing scientific programming, we have learned certain things, often the hard way. May the tips here spare you the same.
- Use version control: it's just much easier in the long run, especially if you are working with others. We use Subversion, though if starting de novo, we would probably use Mercurial or Git.
- Numerical issues such as the smallest magnitude double you can have in C++ or R are not arcana of programming you can ignore. In many phylogenetics problems, you will have very small numbers where this can matter, so know what to do with them and ideally how to avoid them.
- Non-invertible matrices are singularly annoying and can sometimes come up (e.g., with a variance-covariance matrix you are using to calculate likelihood under Brownian motion). The pseudoinverse (pinv in some languages, Moore-Penrose pseudoinverse in math-speak) is your friend.
- Be obsessive. If you are getting an error from a subroutine, or something works "most of the time", find out what the problem is. These things will reappear when you most need things to work. Worse, they can affect others who use your code.
- Reuse. Yes, you could write a new library to load phylogenetic trees and do linear algebra on matrices derived from them. Use an established library/package for this instead. Its authors have already made most of the mistakes you are about to make, their code has been thoroughly tested by many other developers, and they are probably smarter than you are about many of the routines being implemented.
- Choose wisely. When you are choosing a language to use, unless you are an expert, look at the community around it. Are there lots of examples to learn from? Are people still actively developing in it? Is there a way for technophobic users to make use of whatever you create? Telling them to "apt-get" it won't work for most.
- Comment. Will someone else be able to read your code? Will you be able to read your own code in a year?
- Open. You have benefited from others opening their code. Plus, science works based on learning from others' work. Hiding your source code is a bad idea.
- Functionality over prettiness. MrBayes is not a pretty piece of software. It doesn't have nice pull-down menus. There is no dancing Clippy who will ask if you want him to diagnose convergence. But it has a thorough manual and built-in help. Biologists can use it to do work, and when it first came out, it was pretty unique in what it could do. It is better for programmers to put effort into supporting things like morphological and amino acid characters than into things like a pretty graphical interface -- that way more biology can be done. Note that there was enough help for regular people to use it -- they didn't have to parse the source code to learn how.
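
To make the tip about very small numbers concrete, here is a minimal sketch in Python (standard library only) of one common way to handle them: work with log-likelihoods instead of raw likelihoods. The per-site likelihood values and the helper name `log_sum_exp` are made up for illustration, and this is only one possible approach, not necessarily what any particular phylogenetics package does.

```python
import math
import sys

def log_sum_exp(log_values):
    """Return log(sum(exp(v))) for a list of log-scale values.

    Subtracting the maximum first keeps every exp() argument <= 0, so the
    largest term is exp(0) == 1 and the sum cannot underflow to zero.
    """
    m = max(log_values)
    if m == -math.inf:  # every term has zero probability
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in log_values))

# Hypothetical per-site likelihoods: 100 sites, each with likelihood 1e-5.
site_likelihoods = [1e-5] * 100

# Naive approach: multiply the raw likelihoods together.
naive = 1.0
for p in site_likelihoods:
    naive *= p
# The true product is 1e-500, far below the smallest positive double,
# so the running product silently underflows to 0.0.

# Log-space approach: add the logs instead of multiplying the values.
log_lik = sum(math.log(p) for p in site_likelihoods)  # roughly -1151.29

# Averaging two equally weighted models requires a sum of likelihoods;
# log_sum_exp does that sum without ever leaving log space.
log_mean = log_sum_exp([log_lik, log_lik]) - math.log(2)

print("smallest normal double:", sys.float_info.min)
print("naive product:", naive)
print("log-likelihood:", log_lik)
print("log of mean likelihood:", log_mean)
```

The naive product prints as 0.0, and taking its log afterwards would fail, while the summed log-likelihood stays a perfectly ordinary number. Sums of likelihoods (for example when averaging over models or rate categories) are the one place where adding logs is not enough, which is what the helper is for.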