The DREAM Challenges

CONFERENCE

June 25, 2020

1. Introduction Pablo Meyer, IBM Research (5min)

2. Gene selection for optimal prediction of cellular position in tissues from single-cell transcriptomics data
Jovan Tanevski, Heidelberg University (10min)

Single-cell RNA-seq (scRNAseq) technologies are rapidly evolving, but although they are very informative, standard scRNAseq experiments lose the spatial organization of the cells in the tissue of origin. Conversely, spatial RNA-seq technologies designed to preserve the localization of the cells have limited throughput and gene coverage. Mapping scRNAseq data to genes with spatial information increases coverage while providing spatial location. However, methods to perform such mapping have not yet been benchmarked. To bridge this gap, we organized the DREAM Single-Cell Transcriptomics challenge, focused on the spatial reconstruction of cells from the Drosophila embryo from scRNAseq data, using as gold standard genes with in situ hybridization data from the Berkeley Drosophila Transcription Network Project reference atlas. The 34 participating teams used diverse algorithms for gene selection and location prediction and were able to correctly localize rare subpopulations of cells. Selection of predictor genes was essential for this task; the selected genes showed relatively high expression entropy and high spatial clustering, and included prominent developmental genes such as gap and pair-rule genes and tissue-defining markers. Application of the top 10 methods to a zebrafish embryo dataset yielded similar results regarding the performance of the methods and the statistical properties of the selected genes. This indicates that our predictions define general properties of the genes selected as a reference when accurately reconstructing the spatial position of cells in a tissue.
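As a rough illustration of what "expression entropy" can mean for a candidate predictor gene, the sketch below bins one gene's expression values across cells and computes the Shannon entropy of the resulting distribution. The equal-width binning, the number of bins, the function name `expression_entropy`, and the toy values are all assumptions made for illustration; the abstract does not state how entropy was computed in the challenge.

```python
import math
from collections import Counter


def expression_entropy(values, n_bins=4):
    """Shannon entropy (in bits) of one gene's expression across cells.

    Values are discretized into n_bins equal-width bins (an assumed,
    illustrative choice) before the entropy is computed.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A gene with constant expression carries no information.
        return 0.0
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    counts = Counter(bins)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical toy data: expression of three genes in eight cells.
expression = {
    "gene_a": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0],  # almost always off
    "gene_b": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],  # spread over the range
    "gene_c": [2.0] * 8,                                 # constant
}

# Rank genes from highest to lowest entropy.
ranked = sorted(
    expression,
    key=lambda gene: expression_entropy(expression[gene]),
    reverse=True,
)
```

In this toy setting a gene whose expression is spread evenly over its range ranks above one that is almost always off, and a constant gene ranks last.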

3. Community effort for the prediction of single cell signaling in breast cancer
Attila Gabor, Heidelberg University (15min)

The recent development of data acquisition techniques enables the investigation of cellular signalling at an unprecedented level of detail. Tognetti et al. (2020) report mass cytometry measurements, covering 36 markers in over 4000 conditions totalling 80 million single cells across 67 breast cancer cell lines. Based on this dataset, we organised the Single Cell Signaling in Breast Cancer DREAM challenge to assess the performance and limits of predictive models for single cell datasets. Through four increasingly challenging sub-challenges, the participants predicted the time course response of single cells to stimuli in the presence and absence of kinase inhibitors.
First, missing marker values were imputed. The solutions from 16 teams show a high correlation with the validation data and revealed difficulties in predicting cell lines that have rare signalling patterns. In the two subsequent sub-challenges, the participants predicted the time response of all markers in response to kinase inhibitors, first to inhibitors with known effects and then to an inhibitor which was absent from the training data. The four best teams were better than the sampling-based reference model and the top team achieved an accuracy similar to a biological replicate. Although machine learning-based methods performed better overall, sampling methods were able to better capture the single cell heterogeneity.
In the last sub-challenge, the goal was to predict the population response to perturbations from unperturbed, basal data. Only 11 of the 25 participating teams achieved better performance than random, showcasing the difficulty of the task. The systematic combination of the predictions for each sub-challenge only marginally improved the prediction accuracy but resulted in more robust predictions for new cell lines. Overall, the challenge shows that despite the stochastic nature of the processes in single cells, the signalling events are tightly controlled and machine learning methods can accurately predict new experimental data.

4. Benchmarked machine learning approaches for cell lineage reconstructions
Irepan Salvador, UCL, & Wuming Gong, University of Minnesota (20min)

The recent advent of new CRISPR-based molecular tools allows the reconstruction of cell lineages based on the phylogenetic analysis of DNA mutations induced by CRISPR during development, and promises to resolve the lineage of complex model organisms at single-cell resolution. To date, however, no lineage reconstruction algorithms have been rigorously examined for their performance and robustness across diverse molecular tools, datasets, and numbers of cells or sizes of lineage trees. It also remains unclear whether new machine learning algorithms that go beyond the classical ones developed for reconstructing phylogenetic trees could consistently reconstruct cell lineages to a high degree of accuracy. The first challenge consisted of the reconstruction of in vitro cell lineages of 30 trees with fewer than 100 cells. The recording system used here relies on the activity of viral integrases to perform irreversible edits on DNA recording arrays that can be read out using imaging. For details on the method, see Frieda et al. (2016); the updated version of the MEMOIR (Memory by Engineered Mutagenesis with Optical In situ Readout) system enables 3 different states for each recording unit (Chow et al. 2019). In the C. elegans challenge, participants had to reconstruct an in silico generated tree of 1,000 cells from molecular data, using reconstructed lineages from surrogate trees of 100 cells. Finally, for the mouse challenge, participants will have to reconstruct an in silico generated tree of 10,000 cells from molecular data and reconstructed cell lineages of around 1,000 cells in size.
By providing experimental and in silico generated data sets for training, this challenge aims to mobilize a larger community to evaluate new optimal tree-building methods. Multicellular organisms are composed of billions or trillions of different interconnected cells that derive from a single cell through repeated rounds of cell division. Knowing the cell lineage that produces a fully developed organism from a single cell provides the framework for understanding when, where and how cell fate decisions are made.
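To make the reconstruction task concrete, the sketch below builds a lineage tree from per-cell recording arrays using a classical baseline: average-linkage agglomerative clustering on Hamming distance. This is not the method of any challenge participant. The three-state encoding ("0" unedited, "1" and "2" edited), the cell names, and the function names are hypothetical, and treating the unedited state as just another symbol is a deliberate simplification.

```python
from itertools import combinations


def hamming(a, b):
    """Number of recording units whose states differ between two cells."""
    return sum(x != y for x, y in zip(a, b))


def leaves(tree):
    """Flatten a nested-tuple tree into its list of cell names."""
    if isinstance(tree, str):
        return [tree]
    return [name for subtree in tree for name in leaves(subtree)]


def reconstruct(barcodes):
    """Average-linkage clustering; returns the tree as nested tuples."""
    clusters = list(barcodes)  # every cell starts as its own cluster

    def dist(c1, c2):
        # Mean Hamming distance over all cell pairs across the two clusters.
        pairs = [(a, b) for a in leaves(c1) for b in leaves(c2)]
        total = sum(hamming(barcodes[a], barcodes[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        # Merge the closest pair of clusters into a new internal node.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: dist(*p))
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append((c1, c2))
    return clusters[0]


# Hypothetical recording arrays with four units each.
barcodes = {
    "c1": "1100",
    "c2": "1102",
    "c3": "0021",
    "c4": "0022",
}

tree = reconstruct(barcodes)
```

Here cells that share more edits are joined first, so `c1` and `c2` form one clade and `c3` and `c4` another. Because edits are irreversible in such recording systems, methods that model shared edits as inherited from a common ancestor may do better than this symmetric distance.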

5. Discussion Pablo Meyer, IBM, and Julio Saez-Rodriguez, Heidelberg University (10min)