Machine Learning Scientist I (Neuroscience)

Broad Institute - Cambridge, MA

The Broad Institute's Klarman Cell Observatory (KCO), directed by Aviv Regev, is an effort to systematically define cellular circuits within and between cells in tissues to better understand health and disease. The KCO builds on breakthrough technologies such as single-cell genomics (e.g., single-cell RNA-Seq) and on interdisciplinary collaborations across the Broad and beyond.

The extent of cell-type complexity in the mammalian brain remains mostly unknown, and important but rare cell populations are likely still undiscovered. Numerous methods for single-cell genomics have recently been developed; for example, single-cell RNA-Seq now enables transcriptional analysis of tens of thousands of individual cells. Spatial methods for characterizing cells within tissues have also been developed.

This position involves computational analysis for a large-scale, high-impact project that aims to generate a comprehensive, integrated atlas of the mouse brain. The project is part of a larger initiative to perform a census of the brain. This person will help develop new computational methods, apply existing ones, interpret results in a biological context, and integrate analysis practices from groups at Broad and around the world. The ideal candidate has both a theoretical and a practical understanding of machine learning techniques, and a proven track record in areas such as computational biology, probability, statistics, complex network analysis, statistical physics, or high-performance computing.

This position suits a person who is excited by the prospect of learning, adapting, and applying modern machine learning techniques to solve key challenges posed by emerging biological data modalities, with revolutionary implications for advancing state-of-the-art clinical practice. The position entails close collaboration with multiple Broad groups as well as groups at Harvard and Cold Spring Harbor, in particular Professor Aviv Regev’s group at Broad and Evan Macosko’s group at Harvard/Broad/Stanley Center.


  • Analyze large single-cell genomics datasets (single-cell RNA-Seq and spatial methods) from the brain, and provide analyses in formats accessible to the research community
  • Develop, apply, document, and maintain computational tools, both for own use and for use by the broader community, including to support analysis by biologist colleagues without formal computational training; critically evaluate computational solutions
  • Follow relevant scientific literature to ensure use of optimal methods and understand emerging practices across the field
  • Contribute to reports and papers for presentation and publication, and present at scientific conferences as appropriate
  • Regularly attend related team meetings and present results there to share findings, plan projects and experiments, and ensure continuous communication about the methods and tools developed
  • Work with other Broad computational biologists experienced in RNA-Seq and spatial analysis to learn, discuss, and integrate the most appropriate solution for an experiment or project

Ph.D. and 0-2 years of experience in computer science, engineering, physics, mathematics, statistics, biology, or a related quantitative field

Demonstrated experience designing computational methods and tools, including prior experience with algorithms relevant to computational biology. Skill and experience in statistical analysis are strongly preferred

Familiarity with next-generation sequence data analysis tools

Must have demonstrated proficiency in several of the following languages: Perl, Python, MATLAB, and R

Strong Bash/shell scripting skills and proficiency with UNIX operating systems

Familiarity with at least one object-oriented programming language (e.g., Java, C++, Go)

Experience with, and a solid understanding of, statistical analysis is required

Working knowledge of existing machine learning and probabilistic programming frameworks (e.g., Theano, TensorFlow, PyTorch)

Fluency with version control, including distributed version control and Git in particular

Basic understanding of molecular biology and next-generation sequencing is highly preferred

Familiarity with next-generation sequence data analysis tools, ideally for Illumina

Familiarity with a range of sequence alignment and quantification tools, ideally those used for RNA-Seq (e.g., TopHat, STAR, RSEM), is a plus

Ideally will have demonstrated experience designing computational methods and tools, including prior experience with algorithms relevant to single-cell RNA-Seq and/or spatial analysis

Ability to work independently while making necessary connections with experts in various computational analysis groups

Highly motivated, collaborative self-starter who works well with others

Excellent communication, organization, and time management skills

EOE / Minorities / Females / Protected Veterans / Disabilities