Statistical Analysis of Genome Scale Data
June 30 - July 13, 2023

Key Dates
Application Deadline: March 15, 2023
Arrival: June 29th by 6pm ET
Departure: July 13th around 12pm ET

CSHL courses are intensive, running all day and often including evenings and weekends; students are expected to attend all sessions and reside on campus for the duration of the course.

Instructors:
Vincent Carey, Harvard Medical School
Sean Davis, University of Colorado Anschutz School of Medicine

********

COVID-19: All participants planning to attend in person will be required to attest to recent COVID vaccination (within one year of the course’s start date) with an FDA- or WHO-approved vaccine. Additional safety measures will be in line with current NY and Federal Guidelines applicable in Summer 2023.

********


See the roll of honor - who's taken the course in the past

Over the past decade, high-throughput assays have become pervasive in biological research due to both rapid technological advances and decreases in overall cost. To properly analyze the large data sets generated by such assays and thus make meaningful biological inferences, both experimental and computational biologists must understand the fundamental statistical principles underlying analysis methods. This course is designed to build competence in statistical methods for analyzing high-throughput data in genomics and molecular biology.

Topics Include:
  • The R environment for statistical computing and graphics
  • Introduction to Bioconductor
  • Review of basic statistical theory and hypothesis testing
  • Experimental design, quality control, and normalization
  • High-throughput sequencing technologies
  • Expression profiling using RNA-Seq and microarrays
  • In vivo protein binding using ChIP-Seq
  • High-resolution chromatin footprinting using DNase-Seq
  • DNA methylation profiling analysis
  • Integrative analysis of data from parallel assays
  • Representations of DNA binding specificity and motif discovery algorithms
  • Predictive modeling of gene regulatory networks using machine learning
  • Analysis of posttranscriptional regulation, RNA binding proteins, and microRNAs

Format: Detailed lectures and presentations by instructors and guest speakers will be combined with hands-on computer tutorials. The methods covered in the lectures will be applied to example high-throughput data sets.

2022 Speakers:
Martin Aryee, Dana Farber Cancer Institute
Elana Fertig, Johns Hopkins University
Ludwig Geistlinger, Harvard Medical School
Stephanie Hicks, Johns Hopkins University
Anshul Kundaje, Stanford University
Michael Love, UNC-Chapel Hill
Charlotte Soneson, Friedrich Miescher Institute for Biomedical Research

This course is supported with funds provided by the National Human Genome Research Institute of the National Institutes of Health.

Support & Stipends:

On average, 50% of trainees receive need-based financial support.

Stipends are available to offset tuition costs as follows:

       

Please indicate your eligibility for funding in the stipend request you submit when you apply to the course. Stipend requests do not affect selection decisions made by the instructors.

Cost (including board and lodging): $4,210

No fees are due until you have completed the full application process and are accepted into the course.

Before applying, ensure you have:
  1. Personal statement/essay;
  2. Letter(s) of recommendation;
  3. Curriculum vitae/resume (optional);
  4. Financial aid request (optional).
    More details.

If you are not ready to fully apply but wish to express interest in applying, receive a reminder two weeks prior to the deadline, and tell us about your financial aid requirements, click below: