Statistical Analysis of Genome Scale Data
June 7 - 21, 2024

Key Dates
Application Deadline: March 1, 2024
Arrival: June 7th by 6pm EDT
Departure: June 21st around 12pm EDT

CSHL courses are intensive, running all day and often including evenings and weekends; students are expected to attend all sessions and reside on campus for the duration of the course.

Instructors:
Harmen Bussemaker, Columbia University
Sean Davis, University of Colorado Anschutz School of Medicine
Hans Tomas Rube, University of California, Merced
Min Zhang, University of California, Irvine

See the roll of honor - who's taken the course in the past

Over the past decade, high-throughput assays have become pervasive in biological research due to both rapid technological advances and decreases in overall cost. To properly analyze the large data sets generated by such assays and thus make meaningful biological inferences, both experimental and computational biologists must understand the fundamental statistical principles underlying analysis methods. This course is designed to build competence in statistical methods for analyzing high-throughput data in genomics and molecular biology.

Topics Include:
  • The R environment for statistical computing and graphics
  • Introduction to Bioconductor
  • Review of basic statistical theory and hypothesis testing
  • Experimental design, quality control, and normalization
  • High-throughput sequencing technologies
  • Expression profiling using RNA-Seq and microarrays
  • In vivo protein binding using ChIP-Seq
  • High-resolution chromatin footprinting using DNase-Seq
  • DNA methylation profiling analysis
  • Integrative analysis of data from parallel assays
  • Representations of DNA binding specificity and motif discovery algorithms
  • Predictive modeling of gene regulatory networks using machine learning
  • Analysis of posttranscriptional regulation, RNA binding proteins, and microRNAs

Format: Detailed lectures and presentations by instructors and guest speakers will be combined with hands-on computer tutorials. The methods covered in the lectures will be applied to example high-throughput data sets.

2023 Speakers:
Harmen Bussemaker, Columbia University
Leonardo Collado Torres, Lieber Institute for Brain Development
Ludwig Geistlinger, Harvard Medical School
Hans Rube, University of California
Min Zhang, University of California, Irvine

This course is supported with funds provided by the National Human Genome Research Institute of the National Institutes of Health.

Support & Stipends:

On average, 50% of trainees receive need-based financial support.

Stipends are available to offset tuition costs as follows:


Please indicate your eligibility for funding in your stipend request submitted when you apply to the course. Stipend requests do not affect selection decisions made by the instructors. 

Cost (including board and lodging): $4,385 USD

No fees are due until you have completed the full application process and are accepted into the course.

Before applying, ensure you have:
  1. Personal statement/essay;
  2. Letter(s) of recommendation;
  3. Curriculum vitae/resume (optional);
  4. Financial aid request (optional).

If you are not ready to fully apply but wish to express interest in applying, receive a reminder two weeks prior to the deadline, and tell us about your financial aid requirements, click below: