More often than not, today’s biologist is studying data that is too complex or too voluminous to be analyzed without a computer, and existing tools can perform only boilerplate analyses. Questions specific to the data set require novel analysis pipelines to be designed and written in computer code. Designed for lab biologists with little or no programming experience, this course will give students the bioinformatics and scripting skills necessary to exploit this abundance of biological data. The only prerequisite for the course is a strong commitment to learning basic UNIX and a scripting language. Lectures and problem sets from previous years are available online, and students are welcome to study this background material before starting the course.
This year, we are offering the course in Python, an easy-to-learn scripting language with a growing code base and community of users. The course begins with one week of introductory coding, continues with practical topics in bioinformatics illustrated by plenty of coding examples, and ends with a group coding project. Formal instruction is provided on every topic by the instructors, teaching assistants, and invited experts. Students will solve problem sets covering common scenarios in the acquisition, validation, analysis, and visualization of biological data. They will learn how to design, construct, and run powerful and extensible analysis pipelines in a straightforward manner. Final group projects will be chosen from ideas proposed by students and will be guided by faculty. Students will be provided with a library of Python reference books, in print and e-book form, that they can take home with them.
Note that the primary focus of this course is to provide students with practical programming experience, rather than to present a detailed description of the algorithms used in computational biology. For the latter, we recommend the Foundation of Computational Genomics course.