REFS™ Technology

REFS™ (Reverse Engineering and Forward Simulation) is GNS Healthcare’s scalable, supercomputer-enabled framework for discovering new knowledge from real-world data. REFS™ automates the extraction of causal network models directly from observational data and then runs high-throughput simulations on those models to generate new knowledge.

The mathematics behind causal network models is transparent and well documented [1, 2, 3, 4]. What sets REFS™ apart are our highly optimized, proprietary machine-learning algorithms, run on massively parallel cloud-based supercomputers. This approach allows us to discover knowledge at scale that is relevant to today’s most pressing healthcare problems.

REFS™ extracts new knowledge in two steps: Reverse Engineering, followed by Forward Simulation.

Reverse Engineering

REFS™ uses machine learning to extract the underlying structure from data and encode it in the form of Causal Network Models. These networks represent causal relationships in the data, not just correlations. Because the true structure underlying the data is uncertain, REFS™ uses a Markov Chain Monte Carlo algorithm to learn an ensemble of models rather than a single model. The REFS™ algorithm incorporates multiple, disparate data types and multi-layered data.
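To make the idea of learning an ensemble of network structures concrete, here is a minimal sketch in Python. It is not the proprietary REFS™ algorithm. It assumes a toy setting of three binary variables, a BIC-style score with Laplace smoothing, and a plain Metropolis-Hastings sampler that toggles one directed edge per step. All function names, the toy data generator, and the parameter values are hypothetical choices made for illustration.

```python
import math
import random


def simulate_data(n, rng):
    """Toy binary data from the chain A -> B -> C (an illustrative assumption)."""
    rows = []
    for _ in range(n):
        a = rng.random() < 0.5
        b = a if rng.random() < 0.9 else not a
        c = b if rng.random() < 0.9 else not b
        rows.append((int(a), int(b), int(c)))
    return rows


def node_score(data, child, parents):
    """BIC-style score of one binary node given its parent set."""
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        counts.setdefault(key, [0, 0])[row[child]] += 1
    loglik = 0.0
    for n0, n1 in counts.values():
        total = n0 + n1
        for k in (n0, n1):
            # Laplace-smoothed conditional probability avoids log(0).
            loglik += k * math.log((k + 1) / (total + 2))
    # One free parameter per parent configuration of a binary node.
    penalty = 0.5 * math.log(len(data)) * (2 ** len(parents))
    return loglik - penalty


def graph_score(data, edges, n_nodes):
    """Sum of per-node scores; edges is a set of (parent, child) pairs."""
    return sum(
        node_score(data, v, sorted(p for p, c in edges if c == v))
        for v in range(n_nodes)
    )


def is_acyclic(edges, n_nodes):
    """Repeatedly strip nodes with no incoming edges (Kahn's algorithm)."""
    remaining = set(edges)
    nodes = set(range(n_nodes))
    while nodes:
        roots = [v for v in nodes if not any(c == v for _, c in remaining)]
        if not roots:
            return False
        nodes.difference_update(roots)
        remaining = {(p, c) for p, c in remaining if p in nodes}
    return True


def sample_ensemble(data, n_nodes, n_steps, burn_in, rng):
    """Metropolis-Hastings over DAGs; returns the graphs kept after burn-in."""
    pairs = [(i, j) for i in range(n_nodes) for j in range(n_nodes) if i != j]
    edges = frozenset()
    score = graph_score(data, edges, n_nodes)
    ensemble = []
    for step in range(n_steps):
        edge = rng.choice(pairs)
        # Symmetric proposal: toggle the chosen directed edge.
        proposal = edges - {edge} if edge in edges else edges | {edge}
        if is_acyclic(proposal, n_nodes):
            new_score = graph_score(data, proposal, n_nodes)
            if new_score >= score or rng.random() < math.exp(new_score - score):
                edges, score = proposal, new_score
        if step >= burn_in:
            ensemble.append(edges)
    return ensemble


def edge_frequencies(ensemble):
    """Fraction of sampled graphs that contain each directed edge."""
    freq = {}
    for graph in ensemble:
        for edge in graph:
            freq[edge] = freq.get(edge, 0) + 1
    return {edge: count / len(ensemble) for edge, count in freq.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate_data(400, rng)
    ensemble = sample_ensemble(data, n_nodes=3, n_steps=600, burn_in=200, rng=rng)
    print(edge_frequencies(ensemble))
```

The edge frequencies summarize how strongly the sampled ensemble supports each relationship. In this simplified sketch the score is computed from observational data alone, so it cannot distinguish between structures that imply the same conditional independencies. Edge directions should therefore be read with caution here.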

Forward Simulation

By capturing causal relationships, REFS™ models allow us to use simulations to ask questions about interventions. We can ask, for example, how effective a therapy is for a patient with a particular genotype and medical history. Because we perform Monte Carlo simulations across an ensemble of models, our method gives both a prediction and a confidence interval that measures the uncertainty about that prediction.
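The following Python sketch illustrates how simulating an intervention across an ensemble can yield both a prediction and an interval. It is an illustration only, not the REFS™ implementation. It assumes each ensemble member is a small logistic response model with hypothetical coefficients, that the intervention is a binary treatment flag, and that the interval is taken from the 2.5th and 97.5th percentiles of the per-model effects.

```python
import math
import random
import statistics


def make_toy_ensemble(n_models, rng):
    """Hypothetical ensemble: each model has slightly different coefficients."""
    return [
        {
            "intercept": rng.gauss(-1.0, 0.2),
            "treatment": rng.gauss(1.5, 0.3),
            "genotype": rng.gauss(0.5, 0.2),
        }
        for _ in range(n_models)
    ]


def simulate_response_rate(model, treatment, genotype, n_draws, rng):
    """Monte Carlo estimate of the response rate with treatment set by intervention."""
    logit = (
        model["intercept"]
        + model["treatment"] * treatment
        + model["genotype"] * genotype
    )
    p = 1.0 / (1.0 + math.exp(-logit))
    return sum(rng.random() < p for _ in range(n_draws)) / n_draws


def forward_simulate(ensemble, genotype, n_draws, rng):
    """Return the mean treatment effect and a 95% percentile interval."""
    effects = []
    for model in ensemble:
        treated = simulate_response_rate(model, 1, genotype, n_draws, rng)
        untreated = simulate_response_rate(model, 0, genotype, n_draws, rng)
        effects.append(treated - untreated)
    effects.sort()
    last = len(effects) - 1
    lower = effects[int(0.025 * last)]
    upper = effects[int(math.ceil(0.975 * last))]
    return statistics.mean(effects), (lower, upper)


if __name__ == "__main__":
    rng = random.Random(1)
    ensemble = make_toy_ensemble(200, rng)
    mean_effect, (lower, upper) = forward_simulate(
        ensemble, genotype=1, n_draws=500, rng=rng
    )
    print(f"effect = {mean_effect:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Each model answers the same "what if" question, and the spread of answers across models reflects how uncertain the ensemble is about the prediction. A wider interval means the models disagree more.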

1 Pearl J. Causality: Models, Reasoning, and Inference. 2nd ed. New York: Cambridge University Press; 2009.
2 Friedman N, Linial M, Nachman I, Pe’er D. Using Bayesian networks to analyze expression data. J Comput Biol. 2000;7:601-620.
3 Sachs K, Perez O, Pe’er D, Lauffenburger DA, Nolan GP. Causal protein-signaling networks derived from multiparameter single-cell data. Science. 2005;308:523-529.
4 Schadt EE. Exploiting naturally occurring DNA variation and molecular profiling data to dissect disease and drug response traits. Curr Opin Biotechnol. 2005;16:647-654.