The Genome Access Course (TGAC)
Delphine Fagegaltier, New York Genome Center
Benjamin King, University of Maine
Emily Hodges, Vanderbilt University School of Medicine
Steven Munger, The Jackson Laboratory
The Genome Access Course (TGAC) is an intensive two-day introduction to bioinformatics. The course will begin at 8 a.m. on the first day (Tuesday, November 17) and end at 5 p.m. on the second day (Wednesday, November 18).
TGAC is divided into modules, each designed to give a broad overview of a given topic, with ample time for examples chosen by the instructors. Each module features a brief lecture describing the theory, methods, and tools, followed by a set of worked examples that students complete. Students are encouraged to bring specific tasks or problems from their own research to the instructors during the course.
The core of the course is the analysis of sequence information framed in the context of completed genome sequences. Featured resources and examples come primarily from mammalian species, but the concepts can be applied to any species. The course also features methods to assist the analysis and prioritization of gene lists from large-scale microarray gene expression and proteomics experiments. The topics covered in each two-day iteration of TGAC are taken from the following list.
Sequence, Gene, and Protein Resources
- NCBI sequence, gene, and protein resources
- Model organism databases: Mouse Genome Informatics, Rat Genome Database, ZFIN, FlyBase
- Protein sequence and domain resources: UniProt, PDB, InterPro
- Proteomics resources: IPI, ExPASy, PRIDE
- microRNA resources: miRBase, MicroCosm Targets, TargetScan, PicTar
- Repositories of high-throughput sequence data
- Repositories of gene expression data: GEO, ArrayExpress
- Gene expression profiling resources
- Gene Ontology
- Genome sequencing and assembly
- Gene annotation
- Overview and comparison of major genome browsers: Ensembl, UCSC, NCBI
- Adding custom tracks
- Bulk genome retrieval tools: BioMart, UCSC Table Browser
de novo Analysis of Sequences
- Local, global, pairwise, and multiple sequence alignments
- BLAST and BLAT algorithms
- Scoring matrices: PAM, BLOSUM
- Iterative profile and pattern searches
- Multiple sequence alignment programs
- Visualizing & editing multiple alignments
- Types of sequence and structural variation
- SNP resources: dbSNP
- Structural variation resources: dbVar, DGVa, HGVbase
Comparative Genome Analysis and Functional Genomic Elements
- Finding putative regulatory elements by comparing genomes
- Ortholog and paralog resources
- MultiContigView in Ensembl
- Comparative tracks in the UCSC Genome Browser
- DCODE and ENCODE resources
Analysis of High-Throughput Sequence Data
- Common file formats: FASTQ, SAM, BAM
- Quality control and diagnostic analyses
- Mapping reads to a reference sequence
- Finding putative mutations and polymorphisms
- RNA-Seq data analysis
- ChIP-Seq data analysis
- de novo assembly
- Galaxy resources
Gene Set Enrichment and Pathway Analysis
- Prioritizing genes from microarray and proteomics experiments
- Gene set enrichment analysis tools: GSEA, DAVID
- Pathway resources: Reactome, HPRD NetPath, KEGG
- Protein interaction resources: MIPS, MINT, BIND, DIP
Each student will be provided with a laptop (if needed) and internet access for the duration of the course. You can also bring your own laptop, provided it meets the following requirements: 1) a standard browser (Chrome, Internet Explorer, Firefox, etc.) that is up to date with security patches and bug fixes, 2) wireless internet capability, and 3) the ability to view and modify plain text files and spreadsheets (e.g., Microsoft Word and Excel). Both PCs and Macs are acceptable as long as the operating system is also updated with all security patches and bug fixes.
The Genome Access Course is open to all on a first-come, first-served registration system. It is most beneficial for bench scientists transitioning into projects that require intensive analysis or integration of large data sets. The course will introduce you to publicly available resources, and it will also help you develop a vocabulary that can be used to collaborate with computational scientists.
If you already have significant programming or data analysis experience, TGAC is not appropriate for you. For a more detailed curriculum on methods used in computational biology, please see the Computational Genomics course. Students interested in the practical aspects of software development are encouraged to apply to the course on Programming for Biology. Students who would like in-depth training in the analysis of next-generation sequencing data (e.g., genome assembly and annotation, SNP calling, and the detection of structural variants) may be interested in the course on Advanced Sequencing Technologies. Finally, please see the course on Statistical Methods for Functional Genomics if you would like training in the statistical analysis of high-throughput genomics data.
The curriculum of The Genome Access Course has been developed in conjunction with staff at the Wellcome Trust Sanger Institute and the European Bioinformatics Institute (Hinxton, UK), who teach a parallel series of courses in the UK.
Major support is provided by the Helmsley Charitable Trust. Limited financial aid is available; please apply in writing to Maureen Morrow describing your need for financial support.
Academic Package: $805
Corporate Package: $1,420
All packages cover registration, class materials, food, coffee breaks, and a reception. Neither housing nor transportation to and from the New York Genome Center is included in the registration fees; participants are expected to make their own housing and transportation arrangements. Full payment is due three weeks prior to the course.
The November 2020 iteration of The Genome Access Course will be held at the New York Genome Center (NYGC, 101 Avenue of the Americas, New York, NY 10013). The NYGC is located in the Soho neighborhood of Lower Manhattan. It is easily accessible by public transportation or taxi from all three New York-area airports: John F. Kennedy International Airport, LaGuardia Airport, and Newark Liberty International Airport. The NYGC is on the 1, A, C, and E subway lines, all of which connect directly with regional train service at New York Penn Station. Parking is available in garages throughout Soho, including an Icon garage a few blocks from the NYGC.