Statistical Analysis of Genome Scale Data
July 1 - July 14, 2022
Application & Materials Deadline: March 15, 2022

Course Instructors:

Vincent Carey, Massachusetts General Hospital and Channing Lab, Harvard
Sean Davis, University of Colorado Anschutz School of Medicine

Course Organizers:

Martin Aryee, Massachusetts General Hospital and Dana Farber Cancer Institute
Elana Fertig, Johns Hopkins Bloomberg School of Public Health
Stephanie Hicks, Johns Hopkins Bloomberg School of Public Health
Anshul Kundaje, Stanford University
Michael Love, University of North Carolina*
Charlotte Soneson, Friedrich Miescher Institute for Biomedical Research*
*Virtual participation


COVID-19: All participants planning to attend in person will be required to provide documentary proof of full vaccination AND a first booster (when eligible) with an FDA- or EMA-approved vaccine. Additional safety measures will be in line with the NY and federal guidelines applicable in summer 2022.



Over the past decade, high-throughput assays have become pervasive in biological research due to both rapid technological advances and decreases in overall cost. To properly analyze the large data sets generated by such assays and thus make meaningful biological inferences, both experimental and computational biologists must understand the fundamental statistical principles underlying analysis methods. This course is designed to build competence in statistical methods for analyzing high-throughput data in genomics and molecular biology.

Topics Include:
  • The R environment for statistical computing and graphics
  • Introduction to Bioconductor
  • Review of basic statistical theory and hypothesis testing
  • Experimental design, quality control, and normalization
  • High-throughput sequencing technologies
  • Expression profiling using RNA-Seq and microarrays
  • In vivo protein binding using ChIP-Seq
  • High-resolution chromatin footprinting using DNase-Seq
  • DNA methylation profiling analysis
  • Integrative analysis of data from parallel assays
  • Representations of DNA binding specificity and motif discovery algorithms
  • Predictive modeling of gene regulatory networks using machine learning
  • Analysis of posttranscriptional regulation, RNA binding proteins, and microRNAs

Format: Detailed lectures and presentations by instructors and guest speakers will be combined with hands-on computer tutorials. The methods covered in the lectures will be applied to example high-throughput data sets.

2019 Speakers:

Brittany Adamson, Princeton University, Princeton, NJ
Elana Fertig, Johns Hopkins University, Baltimore, MD
Tuuli Lappalainen, New York Genome Center & Columbia University, New York, NY
Karen Mohlke, University of North Carolina, Chapel Hill, NC
Robert Patro, Stony Brook University, Stony Brook, NY

This course is supported with funds provided by the National Human Genome Research Institute of the National Institutes of Health.

Support & Stipends:

On average, 50% of trainees receive need-based financial support.

Stipends are available to offset tuition costs as follows:


Please indicate your eligibility for funding in your stipend request submitted when you apply to the course. Stipend requests do not affect selection decisions made by the instructors. 

Cost (including board and lodging): $4,090

No fees are due until you have completed the full application process and are accepted into the course. Students accepted into the course should plan to arrive by early evening on June 30 and plan to depart after lunch on July 14.

Before applying, ensure you have:
  1. Personal statement/essay;
  2. Letter(s) of recommendation;
  3. Curriculum vitae/resume (optional);
  4. Financial aid request (optional).

If you are not ready to fully apply but wish to express interest in applying, receive a reminder two weeks prior to the deadline, and tell us about your financial aid requirements, click below: