### Technical Review

In silico Modeling

# IN SILICO MODELING

## In silico Modeling

In silico modeling – the computational modeling of biochemical, metabolic, pharmacologic or physiologic processes – is a logical extension of in vitro experimentation [In silico Modelling of Physiologic Systems (Dec 2011)]. A natural result of the explosive increase in computing power available to research scientists at continually decreasing cost, in silico modeling combines the advantages of in vivo and in vitro experimentation, not subject to ethical considerations or the lack of control associated with many in vivo experiments (e.g. human or animal experimentation). In silico models also allow researchers to include a virtually unlimited array of parameters, potentially rendering the results more applicable to the organism as a whole.

Examples of recent work in the in silico domain include In vivo and In silico Dynamics of the Development of Metabolic Syndrome (Jun 2018) [code]

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

The University of Connecticut’s “Virtual Cell,” Compartmental and Spatial Rule-Based Modeling with Virtual Cell (Oct 2017) [project pages here and herecodecommunitymedia]) provides a comprehensive platform for modeling and simulation of cell biology from biological pathways down to cell biophysics. VCell supports biochemical network and rule-based modeling and electrophysiology in compartmental modeling and within cellular geometry.

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

One might inquiry whether knowledge graphs, which naturally embed biochemical pathways and networks, are amenable to silico perturbation. The Cellular Potts Model (CPM ) is a computational biological model of the collective behavior of cellular structures. CPM allows modeling of many phenomena such as cell migration, clustering, and growth taking adhesive forces – taking environment sensing as well as volume and surface area constraints into account. In silico Modeling for Tumor Growth Visualization (Aug 2016) implemented a crude graphical model via a CPM, that can be visualized in Cytoscape via cpm-cytoscape  [project]. Those authors also described their work in Machine Learning for In Silico Modeling of Tumor Growth.

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

[Image sourceECM: extracellular matrix.  Click image to open in new window.]

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

In the synthetic biology domain, Out-of-Equilibrium Microcompartments for the Bottom-Up Integration of Metabolic Functions (Jun 2018) [media] recently described the analysis of self-sustained metabolic pathways in a microfluidic platform, water-in-oil droplet microcompartments. The authors developed an assay based on nicotinamide adenine dinucleotide (NADH) fluorescence to quantify the metabolic state of the microcompartments. The minimal metabolism was constructed from a reaction converting glucose-6-phosphate into 6-phosphogluconolactone. The reaction was catalysed by glucose-6-phosphate dehydrogenase, an enzyme involved in the pentose phosphate pathway. A key feature integrated the ability to function under conditions where the reaction was sustained independently of the cofactor stoichiometry. The full conversion of the metabolic substrate required the regeneration of the cofactor NAD+: a regeneration module made of inverted membrane vesicles extracted from E. coli.

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

The metabolic pathways exemplified in that microfluidic approach (above) are easily modeled in a knowledge graph, suggesting the possibility of the in silico modeling of pathway reactions in parallel with “wet lab,” systems biology approaches. Other relatively easily-constructed models (e.g.: bioengineered microbes and viruses on microbiological media; transgenic rodents; etc.) could likewise be modeled in a combined systems biology, in silico modeling approach. In this regard, clustered regularly interspaced short palindromic repeats/CRISPR associated protein 9 (CRISPR/Cas9) gene editing technology is especially relevant; for example, note Section 3 in CRISPR/Cas9-Based Genome Editing for Disease Modeling and Therapy: Challenges and Opportunities for Nonviral Delivery (Jun 2017), and The Present and Future of Genome Editing in Cancer Research (Jul 2017).  CRISPR/Cas9 for Cancer Research and Therapy (Apr 2018) provides an excellent review and perspective on this technology.

Significance. Understanding basic mechanisms and engineering new systems in biology are hampered by the challenge of quantifying and manipulating numerous molecular components and their interactions. In this work, we present an approach to tackle these difficulties, by combining a cell-free system with a microfluidic platform for high-throughput measurements. We apply the system to comprehensively characterize a library of synthetic zinc-finger transcription factors, which are common building blocks of transcriptional regulatory networks. We subsequently use the knowledge gained to engineer highly specific, tunable, and strong cooperative repressors, which can be applied to carry out logical computation on promoters.

Abstract. Gene-regulatory networks are ubiquitous in nature and critical for bottom-up engineering of synthetic networks. Transcriptional repression is a fundamental function that can be tuned at the level of DNA, protein, and cooperative protein-protein interactions, necessitating high-throughput experimental approaches for in-depth characterization. Here, we used a cell-free system in combination with a high-throughput microfluidic device to comprehensively study the different tuning mechanisms of a synthetic zinc-finger repressor library, whose affinity and cooperativity can be rationally engineered. The device is integrated into a comprehensive workflow that includes determination of transcription-factor binding-energy landscapes and mechanistic modeling, enabling us to generate a library of well-characterized synthetic transcription factors and corresponding promoters, which we then used to build gene-regulatory networks de novo. The well-characterized synthetic parts and insights gained should be useful for rationally engineering gene-regulatory networks and for studying the biophysics of transcriptional regulation.”

Reinforcement learning (RL) can be applied in optimizing chemical/biochemical reactions. Optimizing Chemical Reactions with Deep Reinforcement Learning (Dec 2017) [code] showed that their RL model outperformed a state of the art algorithm, and generalized to dissimilar underlying mechanisms. Combined with LSTM to model the policy function, the RL agent optimized the chemical reaction with the Markov decision process (MDP) characterized by $\small \{S, A, P, R\}$, where $\small S$ was the set of experimental conditions (like temperature, pH, etc), $\small A$ was the set all possible actions that can change the experimental conditions, $\small P$ was the transition probability from current experiment condition to the next condition, and $\small R$ was the reward which is a function of the state.

• Their Deep Reaction Optimizer model iteratively recorded the results of a chemical reaction and chose new experimental conditions to improve the reaction outcome, outperforming a state of the art black box optimization algorithm by using 71% fewer steps on both simulations and real reactions. Furthermore, they introduced an efficient exploration strategy by drawing the reaction conditions from certain probability distributions, which resulted in an improvement on “regret” from 0.062 to 0.039 compared with a deterministic policy (they used “regret” to evaluate the performance of the model ). For the optimization of real-world reactions, the authors carried out four experiments in microdroplets ( in their paper: synthesis of ribose phosphate, etc.) and recorded the production yield. Combining the efficient exploration policy with accelerated microdroplet reactions, their Deep Reaction Optimizer not only served as an efficient and effective reaction optimizer (optimal reaction conditions were determined in 30 min for the four reactions considered), it also provided a better understanding of the mechanism of chemical reactions than that obtained using traditional approaches.

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

In a revolutionary computational/in silico approach, Inferring Regulatory Networks from Experimental Morphological Phenotypes: A Computational Method Reverse-Engineers Planarian Regeneration (Jun 2015) [media: Planarian Regeneration Model Discovered by Artificial Intelligence] applied machine learning to uncover pathways associated with tissue regeneration. Planarian regeneration had been studied for over a century, but despite increasing insight into the pathways that control its stem cells, no constructive, mechanistic model had been found by scientists that explained more than one or two key features of its remarkable ability to regenerate its correct anatomical pattern after drastic perturbations. Those authors presented a method that inferred the molecular products, topology, and spatial and temporal non-linear dynamics of regulatory networks – recapitulating in silico the rich dataset of morphological phenotypes resulting from genetic, surgical, and pharmacological experiments. They demonstrated their approach by inferring complete regulatory networks explaining the outcomes of the main functional regeneration experiments in the planarian literature. By analyzing all the datasets together, their system inferred the first systems-biology comprehensive dynamical model explaining patterning in planarian regeneration. This method provided an automated, highly generalizable framework for identifying the underlying control mechanisms responsible for the dynamic regulation of growth and form.

• “An artificial intelligence system has for the first time reverse-engineered the regeneration mechanism of planaria - the small worms whose extraordinary power to regrow body parts has made them a research model in human regenerative medicine. The discovery by Tufts University biologists presents the first model of regeneration discovered by a non-human intelligence and the first comprehensive model of planarian regeneration, which had eluded human scientists for over 100 years. …”

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

• See also subsequent similar work (different authors), Cell Type Atlas and Lineage Tree of a Whole Complex Animal by Single-Cell Transcriptomics (May 2018), which mapped the transcriptome for essentially all cell types a planarian (flatworm): dozens of cell types including stem cells, progenitors, and terminally differentiated cells. They then applied a new computational algorithm, partition-based graph abstraction (PAGA), which could predict a lineage tree for the whole animal in an unbiased way. Notably, their approach was applicable to other model and non-model organisms, assuming that their differentiation processes are sampled with sufficient time resolution.

[Image source. Click image to open in new window.]

• In turn, that data was used as an example of an approach that determined the intrinsic dimensionality of data. Although large-scale datasets are frequently high-dimensional, their data frequently possess structures that significantly decrease their intrinsic dimensionality (ID) due to the presence of clusters, points being located close to low-dimensional varieties or fine-grained lumping. Estimating the Effective Dimension of Large Biological Datasets Using Fisher Separability Analysis (Jan 2019) [code] tested a dimensionality estimator that was based on analysing the separability properties of data points, on several benchmarks and real biological datasets. They showed that the introduced measure of ID had performance competitive with state of the art measures, being efficient across a wide range of dimensions and performing better in the case of noisy samples. Moreover, it allowed estimating the intrinsic dimension in situations where the intrinsic manifold assumption was not valid. [Note their Fig. 5.]

[Image source. Click image to open in new window.]

The examples above hint at the potential advances afforded by machine learning in the biochemical/medical domain, which also include the prediction of biomolecular secondary structure [e.g. rawMSA: proper Deep Learning makes protein sequence profiles and feature extraction obsolete (Aug 2018)], biodesign (e.g of new anticancer drugs) and inverse molecular design (very well-reviewed in the July 2018 Science paper Inverse Molecular Design using Machine Learning: Generative Models for Matter Engineering), and numerous other applications. These approaches offer excellent opportunities for collaboration in the advancement of our understanding of metabolism, cellular signaling, regulatory mechanisms, and disease (including cancer).

## ODE: Ordinary Differential Equations

An ordinary differential equation (ODE) is a differential equation containing one or more functions of one independent variable and the derivatives of those functions. The term ordinary is used in contrast with the term partial differential equation [see my Partial Derivatives entry for a few examples] which may be with respect to more than one independent variable.

### Examples

Ordinary Differential Equations and Boolean Networks in Application to Modelling of 6-Mercaptopurine Metabolism (Apr 2017).  “We consider two approaches to modelling the cell metabolism of 6-mercaptopurine, one of the important chemotherapy drugs used for treating acute lymphocytic leukaemia: kinetic ordinary differential equations, and Boolean networks supplied with one controlling node, which takes continual values. We analyse their interplay with respect to taking into account ATP concentration as a key parameter of switching between different pathways. It is shown that the Boolean networks, which allow avoiding the complexity of general kinetic modelling, preserve the possibility of reproducing the principal switching mechanism.”

• “There is no supporting material or special data since all equations, parameters used and algorithm are presented in the main text of the work and could be reproduced by anyone. Thus, paper contains complete self-sufficient information and does not need any additional data files.”

• “In this work, we have analysed the dynamic behaviour of the metabolic pathways of 6-MP with a focus on revealing the key parameter that switches between the two principal ‘branches’, slow and fast. The results of simulations based on the system of ordinary equations indicate that ATP is the desired ‘key player’ in 6-MP metabolism. This conclusion is supported by a number of phenomenological observations presented in the modern biomedical literature and allows for quantitative clarification of the underlying processes.

“Based on the results of ODE modelling, we have reformulated the problems in terms of the hybrid Boolean network, which can be considered as a deterministic analogue of the probabilistic Boolean networks. This approach is much simpler in realization since it does not require the knowledge of multiple kinetic parameters but, at the same time, adequately reproduces the key details of the switching principal dynamic regimes as a choice between different possible pathways. Therefore, it can be scaled to a more detailed picture of metabolite interactions in future research of the studied process.”

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

Computational Modeling of the Metabolic States Regulated by the Kinase Akt  (Nov 2012).  “… We compared two metabolic states generated by the specific variation of the fluxes regulated by the activity of the PI3K/Akt/mTOR  pathway. One state represented the metabolism of a growing cancer cell characterized by aerobic glycolysis and cellular biosynthesis (condition H), while the other (condition L) represented the same metabolic network with a reduced glycolytic rate, a reduced lactic acid production, but a higher MPM, as reported in literature in relation to a lower activity of PI3K/Akt/mTOR  (DeBerardinis et al., 2008). Some steps of the metabolic network that link glycolysis and PPP, namely those catalyzed by the G6PDH and TKL enzymes, revealed their importance for the cancer metabolic state. Results from our model may provide insight and assist in the selection of drug targets in anticancer treatments. …”  |  “The model is available in BioModels database (Li et al., 2010a) with identifier MODEL1210150000.”

The PI3K/Akt/mTOR  pathway regulates central carbon metabolism.  [Summary](A) PI3K/Akt/mTOR pathway. Signaling through the PI3K/Akt/mTOR pathway begins with the activation of RTKs in response to growth factors, leading to auto-phosphorylation on tyrosine residues and trans-phosphorylation of adaptor proteins. The PI3K is responsible for the production of 3-phosphoinositide lipid second messengers, including PIP3, which contributes to the activation of many downstream targets, such as PDK1 and mTORC2. Both PDK1 and mTORC2 activate, through phosphorylation in different sites, the serine-threonine protein kinase Akt. Akt regulates multiple functions including cellular metabolism, by promoting cell growth and proliferation through the activation of mTORC1, which also enhances the transcriptional activity of HIF-1α. Dashed lines represent the negative regulation of the PI3K/Akt/mTOR pathway by the action of mTORC1 feedback mechanism.
(B) The metabolic network with the main reactions of glucose metabolism. The main pathways involved in the glucose metabolism are considered: glycolysis, PPP, the glycogen synthesis and degradation, lactate, and MPM branches. The metabolic targets regulated by PI3K/Akt/mTOR pathway are represented on the network: the PI3K/Akt/mTOR direct regulation is presented in yellow; the PI3K/Akt/mTOR indirect regulation (via HIF-1α) is presented in pink; the PI3K/Akt/mTOR direct and indirect regulation is presented in orange. All the PI3K/Akt/mTOR direct and indirect targets considered here are positively regulated, with the only exception of the MPM. Allosteric regulators (modifiers), activators (+), or inhibitors (-), are depicted in red.  [ ... snip ... ]
[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

... First part of extended Appendix shown ...  uide[Image source. Click image to open in new window.]

Metabolic Flux Analysis and Metabolic Engineering of Microorganisms (Feb 2008).  “Recent advances in metabolic flux analysis including genome-scale constraints-based flux analysis and its applications in metabolic engineering are reviewed. Various computational aspects of constraints-based flux analysis including genome-scale stoichiometric models, additional constraints used for the improved accuracy, and several algorithms for identifying the target genes to be manipulated are described. Also, some of the successful applications of metabolic flux analysis in metabolic engineering are reviewed. Finally, we discuss the limitations that need to be overcome to make the results of genome-scale flux analysis more realistically represent the real cell metabolism.”

[Image source. Click image to open in new window.]

Metabolic stoichiometric matrix (amenable to linear algebra calculations).  [Image source. Click image to open in new window.]

Systems Approaches to Modelling Pathways and Networks (Sep 2011).  “It has become commonly accepted that systems approaches to biology are of outstanding importance to gain understanding from the vast amount of data which is presently being generated by advancing high-throughput technologies. The diversity of methods to model pathways and networks has significantly expanded over the past two decades. Modern and traditional approaches are equally important and recent activities aim at integrating the advantages of both. While traditional methods, based on differential equations, are useful to study the dynamics of small systems, modern constraint-based models can be applied to genome-scale systems, but are not able to capture dynamic features. Integrating different approaches is important to develop consistent theoretical descriptions encompassing various scales of biological information. The rapid progress of the field of theoretical systems biology, however, demonstrates how our fundamental theoretical understanding of biology is gaining momentum. The scientific community has apparently accepted the challenge to truly understand the principles of life.”

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

Neural Ordinary Differential Equations (University of Toronto; Vector Institute: Jun 2018, updated Jan 2019;  Best Paper award at NeurIPS 2018]  codeslides;  discussion (reddit) here and herediscussion (The Morning Paper); discussion (Hacker News)]

“We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a black-box differential equation solver. These continuous-depth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuous-depth residual networks and continuous-time latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows end-to-end training of ODEs within larger models.”

[Image source. Click image to open in new window.]

Example result obtained with Continuous Normalizing Flows (two moons problem from paper).  [Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]

ODE: Ordinary Differential Equations:

Slides:

## FBA: Flux Balance Analysis

Flux balance analysis (FBA) is a mathematical method for simulating metabolism in genome-scale reconstructions of metabolic networks. In comparison to traditional methods of modeling, FBA is less intensive in terms of the input data required for constructing the model. Simulations performed using FBA are computationally inexpensive and can calculate steady-state metabolic fluxes for large models (over 2000 reactions) in a few seconds on modern personal computers.

FBA finds applications in bioprocess engineering to systematically identify modifications to the metabolic networks of microbes used in fermentation processes that improve product yields of industrially important chemicals such as ethanol and succinic acid. It has also been used for the identification of putative drug targets in cancer and pathogens, rational design of culture media, and more recently host-pathogen interactions. The results of FBA can be visualized using flux maps similar to the image on the right, which illustrates the steady-state fluxes carried by reactions in glycolysis. The thickness of the arrows is proportional to the flux through the reaction.

FBA formalizes the system of equations describing the concentration changes in a metabolic network as the dot product of a matrix of the stoichiometric coefficients (the stoichiometric matrix S) and the vector v of the unsolved fluxes. The right-hand side of the dot product is a vector of zeros representing the system at steady state. Linear programming is then used to calculate a solution of fluxes corresponding to the steady state.

[Image source. Click image to open in new window.]

From Slide 13 in Flux Balance Analysis:

• One of the techniques used to analyse the complete metabolic genotype of a microbial strain is flux balance analysis (FBA):

• relies on balancing metabolic fluxes
• is based on the fundamental law of mass conservation
• is performed under steady-state conditions (an example of constraint …)
1. the stoichiometric of metabolic pathways,
2. metabolic demands,
3. and a few strain specific parameters
• it does NOT require enzymatic kinetic data

FBA: Flux Balance Analysis:

## Synthetic Biology

• Significance.   By enabling rational programming of mammalian cell behavior, synthetic biology is driving innovation across biomedical applications. Using Cas9-variants as core as protein-based central processing units (CPUs) that control gene expression in response to single-guide RNAs as genetic software, we have programmed scalable Boolean logic computations such as the half adder in single human cells. Combining orthogonal Cas9-variants enabled the design of multicore genetic CPUs that provide parallel arithmetic computations. The Cas9-based multicore CPU design may provide opportunities in single-cell mammalian biocomputing to provide biomedical applications.”

• Abstract.  Controlling gene expression with sophisticated logic gates has been and remains one of the central aims of synthetic biology. However, conventional implementations of biocomputers use central processing units (CPUs) assembled from multiple protein-based gene switches, limiting the programming flexibility and complexity that can be achieved within single cells. Here, we introduce a CRISPR/Cas9-based core processor that enables different sets of user-defined guide RNA inputs to program a single transcriptional regulator (dCas9-KRAB) to perform a wide range of bitwise computations, from simple Boolean logic gates to arithmetic operations such as the half adder. Furthermore, we built a dual-core CPU combining two orthogonal core processors in a single cell. In principle, human cells integrating multiple orthogonal CRISPR/Cas9-based core processors could offer enormous computational capacity.”

• “Synthetic gene circuits emerge from iterative design-build-test cycles. Most commonly, the time-limiting step is the circuit construction process. Here, we present a hierarchical cloning scheme based on the widespread Gibson assembly method [see also] and make the set of constructed plasmids freely available. Our two-step modular cloning scheme allows for simple, fast, efficient and accurate assembly of gene circuits and combinatorial circuit libraries in Escherichia coli. The first step involves Gibson assembly of transcriptional units from constituent parts into individual intermediate plasmids. In the second step, these plasmids are digested with specific sets of restriction enzymes. The resulting flanking regions have overlaps that drive a second Gibson assembly into a single plasmid to yield the final circuit. This approach substantially reduces time and sequencing costs associated with gene circuit construction and allows for modular and combinatorial assembly of circuits. We demonstrate the usefulness of our framework by assembling a double-inverter circuit and a combinatorial library of 3-node networks.”

[Image source. Click image to open in new window.]

[Image source. Click image to open in new window.]