### Technical Review


# IN SILICO MODELING

In silico modeling – the computational modeling of biochemical, metabolic, pharmacologic or physiologic processes – is a logical extension of in vitro experimentation [In silico Modelling of Physiologic Systems (Dec 2011)]. A natural result of the explosive increase in computing power available to research scientists at continually decreasing cost, in silico modeling combines the advantages of in vivo and in vitro experimentation without the ethical constraints or limited experimental control associated with many in vivo studies (e.g. human or animal experimentation). In silico models also allow researchers to include a virtually unlimited array of parameters, potentially rendering the results more applicable to the organism as a whole.

Examples of recent work in the in silico domain include In vivo and In silico Dynamics of the Development of Metabolic Syndrome (Jun 2018) [code].

The University of Connecticut’s “Virtual Cell” (VCell), described in Compartmental and Spatial Rule-Based Modeling with Virtual Cell (Oct 2017) [project pages here and here | code | community | media], provides a comprehensive platform for modeling and simulation of cell biology, from biological pathways down to cell biophysics. VCell supports biochemical network modeling, rule-based modeling, and electrophysiology, both in compartmental models and within cellular geometries.
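To illustrate the kind of compartmental modeling such platforms support, here is a minimal plain-Python sketch (this is not the VCell API; the species, rate constants, and step sizes are invented for illustration) that integrates first-order transport of a species between two compartments with forward Euler:

```python
# Minimal two-compartment model sketch (illustrative only; all names and
# rate constants are hypothetical, not a VCell model). A species S moves
# between cytosol and nucleus by first-order transport, integrated with
# forward Euler.

def simulate(s_cyt=100.0, s_nuc=0.0, k_in=0.1, k_out=0.05,
             dt=0.01, steps=1000):
    """Return (s_cyt, s_nuc) after `steps` Euler steps."""
    for _ in range(steps):
        flux = k_in * s_cyt - k_out * s_nuc   # net cytosol -> nucleus flux
        s_cyt -= flux * dt
        s_nuc += flux * dt
    return s_cyt, s_nuc

cyt, nuc = simulate()
# Mass is conserved exactly (the same flux is subtracted and added), and
# the system relaxes toward the balance k_in * s_cyt == k_out * s_nuc.
```

Real tools solve such systems with stiff ODE solvers and spatial (PDE) geometry; the sketch only conveys the bookkeeping.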

One might inquire whether knowledge graphs, which naturally embed biochemical pathways and networks, are amenable to in silico perturbation. The Cellular Potts Model (CPM) is a computational biological model of the collective behavior of cellular structures. CPM allows modeling of many phenomena – such as cell migration, clustering, and growth – taking adhesive forces, environment sensing, and volume and surface area constraints into account. In silico Modeling for Tumor Growth Visualization (Aug 2016) implemented a graphical model via a CPM that can be visualized in Cytoscape via cpm-cytoscape [project]. Those authors also described their work in Machine Learning for In Silico Modeling of Tumor Growth.
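A CPM update loop can be sketched as follows (a toy illustration with invented parameters, not the cpm-cytoscape implementation): two cells on a small lattice evolve by Metropolis-accepted copy attempts under an adhesion plus volume-constraint energy.

```python
# Toy Cellular Potts Model sketch: two cells on a small lattice evolve by
# Metropolis copy attempts. All parameters are invented for illustration.
import math
import random

random.seed(0)
N = 10           # lattice side
J = 2.0          # adhesion penalty per unlike-neighbor bond
LAM = 1.0        # volume-constraint strength
V_TARGET = 50    # target volume (sites) per cell
T = 1.0          # "temperature" controlling membrane fluctuation

# Initialize: left half is cell 1, right half is cell 2.
grid = [[1 if x < N // 2 else 2 for x in range(N)] for y in range(N)]

def neighbors(x, y):
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < N and 0 <= y + dy < N:
            yield x + dx, y + dy

def energy():
    e, vols = 0.0, {}
    for y in range(N):
        for x in range(N):
            vols[grid[y][x]] = vols.get(grid[y][x], 0) + 1
            for nx, ny in neighbors(x, y):
                if grid[ny][nx] != grid[y][x]:
                    e += J / 2                    # each bond seen twice
    return e + sum(LAM * (v - V_TARGET) ** 2 for v in vols.values())

for _ in range(2000):                             # Metropolis copy attempts
    x, y = random.randrange(N), random.randrange(N)
    nx, ny = random.choice(list(neighbors(x, y)))
    if grid[ny][nx] == grid[y][x]:
        continue
    old, e0 = grid[y][x], energy()
    grid[y][x] = grid[ny][nx]                     # trial: copy neighbor's ID
    d_e = energy() - e0
    if d_e > 0 and random.random() >= math.exp(-d_e / T):
        grid[y][x] = old                          # reject unfavorable move
```

The energy is recomputed globally here for clarity; practical CPM implementations compute the energy change locally around the flipped site.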

[Figure legend: ECM, extracellular matrix.]

In the synthetic biology domain, Out-of-Equilibrium Microcompartments for the Bottom-Up Integration of Metabolic Functions (Jun 2018) [media] recently described the analysis of self-sustained metabolic pathways in a microfluidic platform of water-in-oil droplet microcompartments. The authors developed an assay based on nicotinamide adenine dinucleotide (NADH) fluorescence to quantify the metabolic state of the microcompartments. The minimal metabolism was constructed from a reaction converting glucose-6-phosphate into 6-phosphogluconolactone, catalysed by glucose-6-phosphate dehydrogenase, an enzyme involved in the pentose phosphate pathway. A key feature was the ability to function under conditions where the reaction was sustained independently of the cofactor stoichiometry: full conversion of the metabolic substrate required regeneration of the cofactor NAD+, accomplished by a regeneration module made of inverted membrane vesicles extracted from E. coli.
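The role of cofactor regeneration can be caricatured as mass-action kinetics (a sketch with hypothetical rate constants, not the authors' fitted model): with sub-stoichiometric NAD+, conversion only proceeds to completion because NADH is recycled.

```python
# Mass-action sketch of the minimal metabolism above (hypothetical rate
# constants; forward Euler). G6PDH oxidizes glucose-6-phosphate (g6p) to
# 6-phosphogluconolactone (pgl), reducing NAD+ to NADH; a regeneration
# module recycles NADH back to NAD+.

k_cat = 0.5    # assumed rate constant for G6P + NAD+ -> PGL + NADH
k_reg = 1.0    # assumed rate constant for NADH -> NAD+ (regeneration)
dt, steps = 0.01, 5000

g6p, pgl, nad, nadh = 10.0, 0.0, 1.0, 0.0   # cofactor << substrate
for _ in range(steps):
    v1 = k_cat * g6p * nad        # oxidation flux
    v2 = k_reg * nadh             # regeneration flux
    g6p  -= v1 * dt
    pgl  += v1 * dt
    nad  += (v2 - v1) * dt
    nadh += (v1 - v2) * dt

# With regeneration active, conversion of substrate proceeds well past
# the single unit of cofactor initially present.
```

Both conservation laws (substrate + product, NAD+ + NADH) hold exactly under this update, which is a quick sanity check on any such integrator.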

The metabolic pathways exemplified in that microfluidic approach (above) are easily modeled in a knowledge graph, suggesting the possibility of in silico modeling of pathway reactions in parallel with “wet lab” systems biology approaches. Other relatively easily constructed models (e.g. bioengineered microbes and viruses on microbiological media; transgenic rodents) could likewise be modeled in a combined systems biology/in silico modeling approach. In this regard, clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) gene editing technology is especially relevant; for example, note Section 3 in CRISPR/Cas9-Based Genome Editing for Disease Modeling and Therapy: Challenges and Opportunities for Nonviral Delivery (Jun 2017), and The Present and Future of Genome Editing in Cancer Research (Jul 2017). CRISPR/Cas9 for Cancer Research and Therapy (Apr 2018) provides an excellent review and perspective on this technology.
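To make the knowledge-graph suggestion concrete, the sketch below (illustrative node names, not a real pathway database) represents a pathway fragment as an adjacency dict and performs an in silico “knockout” of one node, loosely analogous to a CRISPR/Cas9 perturbation:

```python
# Sketch: a metabolic pathway fragment as a tiny directed graph, with an
# in silico "knockout" of a node. Node names are illustrative only.

pathway = {
    "glucose": ["G6P"],
    "G6P": ["F6P", "6PGL"],      # glycolysis vs. pentose phosphate branch
    "F6P": ["pyruvate"],
    "6PGL": ["ribose-5P"],
    "pyruvate": [],
    "ribose-5P": [],
}

def reachable(graph, start):
    """All metabolites reachable from `start` by depth-first search."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen or node not in graph:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return seen

def knockout(graph, node):
    """Return a copy of the graph with `node` and its edges removed."""
    return {k: [v for v in vs if v != node]
            for k, vs in graph.items() if k != node}

full = reachable(pathway, "glucose")
cut = reachable(knockout(pathway, "6PGL"), "glucose")
# Knocking out 6PGL severs the pentose phosphate branch: ribose-5P is in
# `full` but not in `cut`, while glycolysis (pyruvate) is unaffected.
```

The same reachability query scales directly to real pathway graphs (e.g. those embedded in biomedical knowledge graphs), where a knockout corresponds to deleting an enzyme or gene node.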

Reinforcement learning (RL) can be applied to optimizing chemical/biochemical reactions. Optimizing Chemical Reactions with Deep Reinforcement Learning (Dec 2017) [code] showed that their RL model outperformed a state-of-the-art algorithm and generalized to dissimilar underlying mechanisms. Combined with an LSTM to model the policy function, the RL agent optimized the chemical reaction as a Markov decision process (MDP) characterized by $\small \{S, A, P, R\}$, where $\small S$ was the set of experimental conditions (e.g. temperature, pH), $\small A$ was the set of all possible actions that can change the experimental conditions, $\small P$ was the transition probability from the current experimental condition to the next, and $\small R$ was the reward, a function of the state.

• Their Deep Reaction Optimizer model iteratively recorded the results of a chemical reaction and chose new experimental conditions to improve the reaction outcome, outperforming a state-of-the-art black-box optimization algorithm by using 71% fewer steps on both simulations and real reactions. Furthermore, they introduced an efficient exploration strategy by drawing the reaction conditions from certain probability distributions, which improved “regret” – the measure they used to evaluate the performance of the model – from 0.062 to 0.039 compared with a deterministic policy. For the optimization of real-world reactions, the authors carried out four experiments in microdroplets (in their paper: synthesis of ribose phosphate, etc.) and recorded the production yield. Combining the efficient exploration policy with accelerated microdroplet reactions, their Deep Reaction Optimizer not only served as an efficient and effective reaction optimizer (optimal reaction conditions were determined in 30 min for the four reactions considered), it also provided a better understanding of the mechanism of chemical reactions than that obtained using traditional approaches.
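The $\small \{S, A, P, R\}$ formulation above can be sketched with simple tabular Q-learning on a simulated reaction (the yield function and all parameters are hypothetical; the paper's Deep Reaction Optimizer uses a recurrent policy network, not a Q-table):

```python
# Tabular Q-learning sketch of reaction optimization as an MDP.
# S: discretized temperature; A: lower/keep/raise; P: deterministic
# transitions; R: a hypothetical yield function peaking at 60 degrees C.
import random

random.seed(1)
temps = list(range(20, 101, 10))         # S: temperatures 20..100 C
actions = [-1, 0, +1]                    # A: lower / keep / raise

def yield_at(t):                         # R: simulated yield (invented)
    return max(0.0, 1.0 - ((t - 60) / 40.0) ** 2)

Q = {(s, a): 0.0 for s in range(len(temps)) for a in actions}
alpha, gamma, eps = 0.2, 0.9, 0.2        # learning rate, discount, epsilon

s = 0                                    # start at 20 C
for _ in range(5000):
    a = (random.choice(actions) if random.random() < eps
         else max(actions, key=lambda x: Q[(s, x)]))
    s2 = min(max(s + a, 0), len(temps) - 1)   # P: deterministic transition
    r = yield_at(temps[s2])
    Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in actions)
                          - Q[(s, a)])
    s = s2

best = max(range(len(temps)),
           key=lambda i: max(Q[(i, a)] for a in actions))
# The greedy policy should settle near the 60 C optimum of the simulated
# yield function.
```

Real reaction optimization adds stochastic outcomes, continuous multi-dimensional conditions, and expensive evaluations, which is why the paper uses a learned recurrent policy rather than a table.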

In a revolutionary computational/in silico approach, Inferring Regulatory Networks from Experimental Morphological Phenotypes: A Computational Method Reverse-Engineers Planarian Regeneration (Jun 2015) [media: Planarian Regeneration Model Discovered by Artificial Intelligence] applied machine learning to uncover pathways associated with tissue regeneration. Planarian regeneration had been studied for over a century, but despite increasing insight into the pathways that control its stem cells, no constructive, mechanistic model had been found by scientists that explained more than one or two key features of its remarkable ability to regenerate its correct anatomical pattern after drastic perturbations. Those authors presented a method that inferred the molecular products, topology, and spatial and temporal non-linear dynamics of regulatory networks – recapitulating in silico the rich dataset of morphological phenotypes resulting from genetic, surgical, and pharmacological experiments. They demonstrated their approach by inferring complete regulatory networks explaining the outcomes of the main functional regeneration experiments in the planarian literature. By analyzing all the datasets together, their system inferred the first systems-biology comprehensive dynamical model explaining patterning in planarian regeneration. This method provided an automated, highly generalizable framework for identifying the underlying control mechanisms responsible for the dynamic regulation of growth and form.

• “An artificial intelligence system has for the first time reverse-engineered the regeneration mechanism of planaria - the small worms whose extraordinary power to regrow body parts has made them a research model in human regenerative medicine. The discovery by Tufts University biologists presents the first model of regeneration discovered by a non-human intelligence and the first comprehensive model of planarian regeneration, which had eluded human scientists for over 100 years. …”

• See also subsequent similar work (different authors), Cell Type Atlas and Lineage Tree of a Whole Complex Animal by Single-Cell Transcriptomics (May 2018), which mapped the transcriptome of essentially all cell types of a planarian (flatworm): dozens of cell types, including stem cells, progenitors, and terminally differentiated cells. They then applied a new computational algorithm, partition-based graph abstraction (PAGA), which could predict a lineage tree for the whole animal in an unbiased way. Notably, their approach is applicable to other model and non-model organisms, provided that their differentiation processes are sampled with sufficient time resolution.

• In turn, those data were used as an example for an approach that determines the intrinsic dimensionality of data. Although large-scale datasets are frequently high-dimensional, they often possess structure that significantly decreases their intrinsic dimensionality (ID) – the presence of clusters, points located close to low-dimensional varieties, or fine-grained lumping. Estimating the Effective Dimension of Large Biological Datasets Using Fisher Separability Analysis (Jan 2019) [code] tested a dimensionality estimator based on analysing the separability properties of data points on several benchmarks and real biological datasets. They showed that the introduced measure of ID performed competitively with state-of-the-art measures, being efficient across a wide range of dimensions and performing better in the case of noisy samples. Moreover, it allowed estimating the intrinsic dimension in situations where the intrinsic manifold assumption was not valid. [Note their Fig. 5.]
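For a flavor of what intrinsic-dimension estimation looks like, here is a sketch using the much simpler two-nearest-neighbor maximum-likelihood estimator (a different method than the Fisher separability analysis discussed above; the data and sizes are invented for illustration):

```python
# Two-nearest-neighbor (TwoNN-style) intrinsic dimension sketch: the MLE
# d = N / sum(log(r2/r1)) over the ratios of each point's 2nd- and
# 1st-nearest-neighbor distances. Illustrative only, not the Fisher
# separability estimator from the paper above.
import math
import random

random.seed(0)

# 200 points on a 2-D linear manifold embedded in 5 ambient dimensions:
# the intrinsic dimension is 2, not 5.
def point():
    u, v = random.random(), random.random()
    return (u, v, u + v, u - v, 2 * u)

pts = [point() for _ in range(200)]

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def twonn_id(pts):
    """MLE of intrinsic dimension from 2nd/1st neighbor distance ratios."""
    logs = []
    for i, p in enumerate(pts):
        d = sorted(dist(p, q) for j, q in enumerate(pts) if j != i)
        logs.append(math.log(d[1] / d[0]))
    return len(pts) / sum(logs)

est = twonn_id(pts)   # should land near 2, well below the ambient 5
```

The brute-force neighbor search is O(n²); practical implementations use k-d trees or approximate nearest neighbors for large biological datasets.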

The examples above hint at the potential advances afforded by machine learning in the biochemical/medical domain, which also include the prediction of biomolecular secondary structure [e.g. rawMSA: proper Deep Learning makes protein sequence profiles and feature extraction obsolete (Aug 2018)], biodesign (e.g. of new anticancer drugs), and inverse molecular design (very well reviewed in the July 2018 Science paper Inverse Molecular Design using Machine Learning: Generative Models for Matter Engineering), among numerous other applications. These approaches offer excellent opportunities for collaboration in advancing our understanding of metabolism, cellular signaling, regulatory mechanisms, and disease (including cancer).