INTEGRATED DATA ANALYSIS FOR HIGH THROUGHPUT BIOLOGY
June 13 - 26, 2007
Application Deadline: March 15, 2007
Instructors:
Harmen Bussemaker, Columbia University
Vincent Carey, Harvard University
Partha Mitra, Cold Spring Harbor Laboratory
Mark Reimers, National Cancer Institute
High-throughput biology, epitomized by the ubiquitous DNA microarray, is
rapidly generating enormous observation sets. Biologists
seeking to make sense of this growing body of data need
to have a firm grasp of statistical methodology. This course
is designed to build competence in quantitative methods
for the analysis of high-throughput molecular biology data,
from which meaningful inferences about biological processes
can be drawn.
- Review of multivariate statistics
- R mini-tutorial
- Expression and other microarrays - experimental design,
scanning and image analysis, quality control, normalization
and probe-level analysis for spotted arrays or prefabricated
chips, exploratory analysis, tests of significance and multiple
testing, using R and Bioconductor
- Discrimination and classification of samples
- Identifying general regulation themes (e.g. Gene Ontology
categories) in gene lists by statistical means
- Promoter analysis in yeast using ChIP and expression data
- Identifying regulatory polymorphisms using SNP and expression
data
- Characterizing the effect of DNA amplifications and deletions
on gene expression in cancer using CGH and expression data
on the same samples
Speakers in last year's course included:
Keith Baggerly, M.D. Anderson Cancer Center
Vivian Cheung, University of Pennsylvania
Aedin Culhane, Dana-Farber Cancer Institute/Harvard
Bruce Futcher, SUNY Stony Brook
Audrey Gasch, University of Wisconsin-Madison
Rafael Irizarry, Johns Hopkins Bloomberg School of Public Health
Vishy Iyer, University of Texas at Austin
Ari Melnick, Albert Einstein College of Medicine
Stefano Monti, Whitehead Institute/MIT
Terry Speed, University of California, Berkeley
Richard Spielman, University of Pennsylvania
John Weinstein, National Cancer Institute
Richard Young, Whitehead Institute/MIT
Julia Zeitlinger, MIT
The first week of the course will concentrate on analysis of
specific types of microarray data (expression, Affymetrix,
CGH, ChIP-chip, and SNP arrays) and proteomics. The second
week will explore biological problems involving the integration
of several types of high-throughput data. Data sets will
be drawn from yeast, human polymorphisms, and cancer biology.
Students are expected to take some time before the course to become
familiar with the R statistical programming environment.
See:
http://www.r-project.org
http://www.biostat.harvard.edu/~carey/cdataSetup.html
This course is supported with funds provided by the National
Cancer Institute.