Technical Review
Last modified: 20190419
Copyright notice, citation: Copyright
Source © 2018present, Victoria A. Stuart
These Contents
IN SILICO MODELING
In silico Modeling
In silico modeling – the computational modeling of biochemical, metabolic, pharmacologic or physiologic processes – is a logical extension of in vitro experimentation [In silico Modelling of Physiologic Systems (Dec 2011)]. A natural result of the explosive increase in computing power available to research scientists at continually decreasing cost, in silico modeling combines the advantages of in vivo and in vitro experimentation, not subject to ethical considerations or the lack of control associated with many in vivo experiments (e.g. human or animal experimentation). In silico models also allow researchers to include a virtually unlimited array of parameters, potentially rendering the results more applicable to the organism as a whole.
Examples of recent work in the in silico domain include In vivo and In silico Dynamics of the Development of Metabolic Syndrome (Jun 2018) [code]
[Image source. Click image to open in new window.]… and Systems Modelling of the EGFRPYK2cMet Interaction Network Predicts and Prioritizes Synergistic Drug Combinations for TripleNegative Breast Cancer (Jun 2018) [media].
[Image source. Click image to open in new window.]The University of Connecticut’s “Virtual Cell,” Compartmental and Spatial RuleBased Modeling with Virtual Cell (Oct 2017) [project pages here and here; code; community; media]) provides a comprehensive platform for modeling and simulation of cell biology from biological pathways down to cell biophysics. VCell supports biochemical network and rulebased modeling and electrophysiology in compartmental modeling and within cellular geometry.
[Image source. Click image to open in new window.]One might inquiry whether knowledge graphs, which naturally embed biochemical pathways and networks, are amenable to silico perturbation. The Cellular Potts Model (CPM ) is a computational biological model of the collective behavior of cellular structures. CPM allows modeling of many phenomena such as cell migration, clustering, and growth taking adhesive forces – taking environment sensing as well as volume and surface area constraints into account. In silico Modeling for Tumor Growth Visualization (Aug 2016) implemented a crude graphical model via a CPM, that can be visualized in Cytoscape via cpmcytoscape [project]. Those authors also described their work in Machine Learning for In Silico Modeling of Tumor Growth.
[Image source. Click image to open in new window.]See also:
[Image source. Click image to open in new window.]
 Cellular Potts Modeling of Tumor Growth, Tumor Invasion, and Tumor Evolution (Apr 2013)
 Cellular Potts Modeling of Complex Multicellular Behaviors in Tissue Morphogenesis (Jun 2017)
[Image source. Click image to open in new window.]
In the synthetic biology domain, OutofEquilibrium Microcompartments for the BottomUp Integration of Metabolic Functions (Jun 2018) [media] recently described the analysis of selfsustained metabolic pathways in a microfluidic platform, waterinoil droplet microcompartments. The authors developed an assay based on nicotinamide adenine dinucleotide (NADH) fluorescence to quantify the metabolic state of the microcompartments. The minimal metabolism was constructed from a reaction converting glucose6phosphate into 6phosphogluconolactone. The reaction was catalysed by glucose6phosphate dehydrogenase, an enzyme involved in the pentose phosphate pathway. A key feature integrated the ability to function under conditions where the reaction was sustained independently of the cofactor stoichiometry. The full conversion of the metabolic substrate required the regeneration of the cofactor NAD+: a regeneration module made of inverted membrane vesicles extracted from E. coli.
[Image source. Click image to open in new window.]The metabolic pathways exemplified in that microfluidic approach (above) are easily modeled in a knowledge graph, suggesting the possibility of the in silico modeling of pathway reactions in parallel with “wet lab,” systems biology approaches. Other relatively easilyconstructed models (e.g.: bioengineered microbes and viruses on microbiological media; transgenic rodents; etc.) could likewise be modeled in a combined systems biology, in silico modeling approach. In this regard, clustered regularly interspaced short palindromic repeats/CRISPR associated protein 9 (CRISPR/Cas9) gene editing technology is especially relevant; for example, note Section 3 in CRISPR/Cas9Based Genome Editing for Disease Modeling and Therapy: Challenges and Opportunities for Nonviral Delivery (Jun 2017), and The Present and Future of Genome Editing in Cancer Research (Jul 2017). CRISPR/Cas9 for Cancer Research and Therapy (Apr 2018) provides an excellent review and perspective on this technology.
CellFree GeneRegulatory Network Engineering with Synthetic Transcription Factors (Mar 2019) [data & code here and here; media (ScienceDaily.com)]
“Significance. Understanding basic mechanisms and engineering new systems in biology are hampered by the challenge of quantifying and manipulating numerous molecular components and their interactions. In this work, we present an approach to tackle these difficulties, by combining a cellfree system with a microfluidic platform for highthroughput measurements. We apply the system to comprehensively characterize a library of synthetic zincfinger transcription factors, which are common building blocks of transcriptional regulatory networks. We subsequently use the knowledge gained to engineer highly specific, tunable, and strong cooperative repressors, which can be applied to carry out logical computation on promoters.
“Abstract. Generegulatory networks are ubiquitous in nature and critical for bottomup engineering of synthetic networks. Transcriptional repression is a fundamental function that can be tuned at the level of DNA, protein, and cooperative proteinprotein interactions, necessitating highthroughput experimental approaches for indepth characterization. Here, we used a cellfree system in combination with a highthroughput microfluidic device to comprehensively study the different tuning mechanisms of a synthetic zincfinger repressor library, whose affinity and cooperativity can be rationally engineered. The device is integrated into a comprehensive workflow that includes determination of transcriptionfactor bindingenergy landscapes and mechanistic modeling, enabling us to generate a library of wellcharacterized synthetic transcription factors and corresponding promoters, which we then used to build generegulatory networks de novo. The wellcharacterized synthetic parts and insights gained should be useful for rationally engineering generegulatory networks and for studying the biophysics of transcriptional regulation.”
Reinforcement learning (RL) can be applied in optimizing chemical/biochemical reactions. Optimizing Chemical Reactions with Deep Reinforcement Learning (Dec 2017) [code] showed that their RL model outperformed a state of the art algorithm, and generalized to dissimilar underlying mechanisms. Combined with LSTM to model the policy function, the RL agent optimized the chemical reaction with the Markov decision process (MDP) characterized by $\small \{S, A, P, R\}$, where $\small S$ was the set of experimental conditions (like temperature, pH, etc), $\small A$ was the set all possible actions that can change the experimental conditions, $\small P$ was the transition probability from current experiment condition to the next condition, and $\small R$ was the reward which is a function of the state.

Their Deep Reaction Optimizer model iteratively recorded the results of a chemical reaction and chose new experimental conditions to improve the reaction outcome, outperforming a state of the art black box optimization algorithm by using 71% fewer steps on both simulations and real reactions. Furthermore, they introduced an efficient exploration strategy by drawing the reaction conditions from certain probability distributions, which resulted in an improvement on “regret” from 0.062 to 0.039 compared with a deterministic policy (they used “regret” to evaluate the performance of the model ). For the optimization of realworld reactions, the authors carried out four experiments in microdroplets (Scheme 2
[Image source. Click image to open in new window.]
Source in their paper: synthesis of ribose phosphate, etc.) and recorded the production yield. Combining the efficient exploration policy with accelerated microdroplet reactions, their Deep Reaction Optimizer not only served as an efficient and effective reaction optimizer (optimal reaction conditions were determined in 30 min for the four reactions considered), it also provided a better understanding of the mechanism of chemical reactions than that obtained using traditional approaches.
[Image source. Click image to open in new window.]
In a revolutionary computational/in silico approach, Inferring Regulatory Networks from Experimental Morphological Phenotypes: A Computational Method ReverseEngineers Planarian Regeneration (Jun 2015) [media: Planarian Regeneration Model Discovered by Artificial Intelligence] applied machine learning to uncover pathways associated with tissue regeneration. Planarian regeneration had been studied for over a century, but despite increasing insight into the pathways that control its stem cells, no constructive, mechanistic model had been found by scientists that explained more than one or two key features of its remarkable ability to regenerate its correct anatomical pattern after drastic perturbations. Those authors presented a method that inferred the molecular products, topology, and spatial and temporal nonlinear dynamics of regulatory networks – recapitulating in silico the rich dataset of morphological phenotypes resulting from genetic, surgical, and pharmacological experiments. They demonstrated their approach by inferring complete regulatory networks explaining the outcomes of the main functional regeneration experiments in the planarian literature. By analyzing all the datasets together, their system inferred the first systemsbiology comprehensive dynamical model explaining patterning in planarian regeneration. This method provided an automated, highly generalizable framework for identifying the underlying control mechanisms responsible for the dynamic regulation of growth and form.

“An artificial intelligence system has for the first time reverseengineered the regeneration mechanism of planaria  the small worms whose extraordinary power to regrow body parts has made them a research model in human regenerative medicine. The discovery by Tufts University biologists presents the first model of regeneration discovered by a nonhuman intelligence and the first comprehensive model of planarian regeneration, which had eluded human scientists for over 100 years. …”
[Image source. Click image to open in new window.]
[Image source. Click image to open in new window.]
See also subsequent similar work (different authors), Cell Type Atlas and Lineage Tree of a Whole Complex Animal by SingleCell Transcriptomics (May 2018), which mapped the transcriptome for essentially all cell types a planarian (flatworm): dozens of cell types including stem cells, progenitors, and terminally differentiated cells. They then applied a new computational algorithm, partitionbased graph abstraction (PAGA), which could predict a lineage tree for the whole animal in an unbiased way. Notably, their approach was applicable to other model and nonmodel organisms, assuming that their differentiation processes are sampled with sufficient time resolution.
[Image source. Click image to open in new window.] 
In turn, that data was used as an example of an approach that determined the intrinsic dimensionality of data. Although largescale datasets are frequently highdimensional, their data frequently possess structures that significantly decrease their intrinsic dimensionality (ID) due to the presence of clusters, points being located close to lowdimensional varieties or finegrained lumping. Estimating the Effective Dimension of Large Biological Datasets Using Fisher Separability Analysis (Jan 2019) [code] tested a dimensionality estimator that was based on analysing the separability properties of data points, on several benchmarks and real biological datasets. They showed that the introduced measure of ID had performance competitive with state of the art measures, being efficient across a wide range of dimensions and performing better in the case of noisy samples. Moreover, it allowed estimating the intrinsic dimension in situations where the intrinsic manifold assumption was not valid. [Note their Fig. 5.]
[Image source. Click image to open in new window.]

The examples above hint at the potential advances afforded by machine learning in the biochemical/medical domain, which also include the prediction of biomolecular secondary structure [e.g. rawMSA: proper Deep Learning makes protein sequence profiles and feature extraction obsolete (Aug 2018)], biodesign (e.g of new anticancer drugs) and inverse molecular design (very wellreviewed in the July 2018 Science paper Inverse Molecular Design using Machine Learning: Generative Models for Matter Engineering), and numerous other applications. These approaches offer excellent opportunities for collaboration in the advancement of our understanding of metabolism, cellular signaling, regulatory mechanisms, and disease (including cancer).
ODE: Ordinary Differential Equations
An ordinary differential equation (ODE) is a differential equation containing one or more functions of one independent variable and the derivatives of those functions. The term ordinary is used in contrast with the term partial differential equation [see my Partial Derivatives entry for a few examples] which may be with respect to more than one independent variable.
Background
Examples
Ordinary Differential Equations and Boolean Networks in Application to Modelling of 6Mercaptopurine Metabolism (Apr 2017). “We consider two approaches to modelling the cell metabolism of 6mercaptopurine, one of the important chemotherapy drugs used for treating acute lymphocytic leukaemia: kinetic ordinary differential equations, and Boolean networks supplied with one controlling node, which takes continual values. We analyse their interplay with respect to taking into account ATP concentration as a key parameter of switching between different pathways. It is shown that the Boolean networks, which allow avoiding the complexity of general kinetic modelling, preserve the possibility of reproducing the principal switching mechanism.”

“There is no supporting material or special data since all equations, parameters used and algorithm are presented in the main text of the work and could be reproduced by anyone. Thus, paper contains complete selfsufficient information and does not need any additional data files.”

“In this work, we have analysed the dynamic behaviour of the metabolic pathways of 6MP with a focus on revealing the key parameter that switches between the two principal ‘branches’, slow and fast. The results of simulations based on the system of ordinary equations indicate that ATP is the desired ‘key player’ in 6MP metabolism. This conclusion is supported by a number of phenomenological observations presented in the modern biomedical literature and allows for quantitative clarification of the underlying processes.
“Based on the results of ODE modelling, we have reformulated the problems in terms of the hybrid Boolean network, which can be considered as a deterministic analogue of the probabilistic Boolean networks. This approach is much simpler in realization since it does not require the knowledge of multiple kinetic parameters but, at the same time, adequately reproduces the key details of the switching principal dynamic regimes as a choice between different possible pathways. Therefore, it can be scaled to a more detailed picture of metabolite interactions in future research of the studied process.”
Computational Modeling of the Metabolic States Regulated by the Kinase Akt (Nov 2012). “… We compared two metabolic states generated by the specific variation of the fluxes regulated by the activity of the PI3K/Akt/mTOR pathway. One state represented the metabolism of a growing cancer cell characterized by aerobic glycolysis and cellular biosynthesis (condition H), while the other (condition L) represented the same metabolic network with a reduced glycolytic rate, a reduced lactic acid production, but a higher MPM, as reported in literature in relation to a lower activity of PI3K/Akt/mTOR (DeBerardinis et al., 2008). Some steps of the metabolic network that link glycolysis and PPP, namely those catalyzed by the G6PDH and TKL enzymes, revealed their importance for the cancer metabolic state. Results from our model may provide insight and assist in the selection of drug targets in anticancer treatments. …”  “The model is available in BioModels database (Li et al., 2010a) with identifier MODEL1210150000.”
The PI3K/Akt/mTOR pathway regulates central carbon metabolism. [Summary](A) PI3K/Akt/mTOR pathway. Signaling through the PI3K/Akt/mTOR pathway begins with the activation of RTKs in response to growth factors, leading to autophosphorylation on tyrosine residues and transphosphorylation of adaptor proteins. The PI3K is responsible for the production of 3phosphoinositide lipid second messengers, including PIP3, which contributes to the activation of many downstream targets, such as PDK1 and mTORC2. Both PDK1 and mTORC2 activate, through phosphorylation in different sites, the serinethreonine protein kinase Akt. Akt regulates multiple functions including cellular metabolism, by promoting cell growth and proliferation through the activation of mTORC1, which also enhances the transcriptional activity of HIF1α. Dashed lines represent the negative regulation of the PI3K/Akt/mTOR pathway by the action of mTORC1 feedback mechanism.(B) The metabolic network with the main reactions of glucose metabolism. The main pathways involved in the glucose metabolism are considered: glycolysis, PPP, the glycogen synthesis and degradation, lactate, and MPM branches. The metabolic targets regulated by PI3K/Akt/mTOR pathway are represented on the network: the PI3K/Akt/mTOR direct regulation is presented in yellow; the PI3K/Akt/mTOR indirect regulation (via HIF1α) is presented in pink; the PI3K/Akt/mTOR direct and indirect regulation is presented in orange. All the PI3K/Akt/mTOR direct and indirect targets considered here are positively regulated, with the only exception of the MPM. Allosteric regulators (modifiers), activators (+), or inhibitors (), are depicted in red. [ ... snip ... ] [Image source. Click image to open in new window.]
Metabolic Flux Analysis and Metabolic Engineering of Microorganisms (Feb 2008). “Recent advances in metabolic flux analysis including genomescale constraintsbased flux analysis and its applications in metabolic engineering are reviewed. Various computational aspects of constraintsbased flux analysis including genomescale stoichiometric models, additional constraints used for the improved accuracy, and several algorithms for identifying the target genes to be manipulated are described. Also, some of the successful applications of metabolic flux analysis in metabolic engineering are reviewed. Finally, we discuss the limitations that need to be overcome to make the results of genomescale flux analysis more realistically represent the real cell metabolism.”
[Image source. Click image to open in new window.]Systems Approaches to Modelling Pathways and Networks (Sep 2011). “It has become commonly accepted that systems approaches to biology are of outstanding importance to gain understanding from the vast amount of data which is presently being generated by advancing highthroughput technologies. The diversity of methods to model pathways and networks has significantly expanded over the past two decades. Modern and traditional approaches are equally important and recent activities aim at integrating the advantages of both. While traditional methods, based on differential equations, are useful to study the dynamics of small systems, modern constraintbased models can be applied to genomescale systems, but are not able to capture dynamic features. Integrating different approaches is important to develop consistent theoretical descriptions encompassing various scales of biological information. The rapid progress of the field of theoretical systems biology, however, demonstrates how our fundamental theoretical understanding of biology is gaining momentum. The scientific community has apparently accepted the challenge to truly understand the principles of life.”
[Image source. Click image to open in new window.]Neural Ordinary Differential Equations (University of Toronto; Vector Institute: Jun 2018, updated Jan 2019; Best Paper award at NeurIPS 2018] code; slides; discussion (reddit) here and here; discussion (The Morning Paper); discussion (Hacker News)]
[Image source. Click image to open in new window.]“We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. The output of the network is computed using a blackbox differential equation solver. These continuousdepth models have constant memory cost, adapt their evaluation strategy to each input, and can explicitly trade numerical precision for speed. We demonstrate these properties in continuousdepth residual networks and continuoustime latent variable models. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions. For training, we show how to scalably backpropagate through any ODE solver, without access to its internal operations. This allows endtoend training of ODEs within larger models.”
 Followon paper: FFJORD: Freeform Continuous Dynamics for Scalable Reversible Generative Models (Oct 2018)
ODE: Ordinary Differential Equations:
Additional Reading
 EnergyBased Analysis of Biomolecular Pathways [local copy]
 Basic Concepts and Principles of Stoichiometric Modeling of Metabolic Networks (Sep 2013)

The Underlying Pathway Structure of Biochemical Reaction Networks (Apr 1998)

[StackExchange: Chemistry] Are Matrices a Good Way to Do Stoichiometry?
 [GNU R] Chemical Reaction Engineering [ [local copy]
 [GNU Octave] Teaching Chemical Equilibria Using Open Source Software OCTAVE (2015)
 [Metatool] Metatool 5.0: Fast and Flexible Elementary Modes Analysis [local copy]
 METATOOL 5.0 for GNU octave and MATLAB [local copy  scripts (gz archive)  Metatool 5.1 for Octave 3.0+ (gz archive)]
 Sample Metabolic Networks and Input Files for Metatool [local copy  sample file (dat file)  E. coli input files (gz file)]
Slides:
 Stoichiometric Network Analysis.pdf [local copy]
 The Stoichiometry of Reactions – Introduction [local copy]
 Structural Network Analysis [local copy]
 Model Checking for ODE Models [local copy]
 Stoichiometric Vector and Matrix [local copy]
 Stoichiometry Matrix
FBA: Flux Balance Analysis
Flux balance analysis (FBA) is a mathematical method for simulating metabolism in genomescale reconstructions of metabolic networks. In comparison to traditional methods of modeling, FBA is less intensive in terms of the input data required for constructing the model. Simulations performed using FBA are computationally inexpensive and can calculate steadystate metabolic fluxes for large models (over 2000 reactions) in a few seconds on modern personal computers.
FBA finds applications in bioprocess engineering to systematically identify modifications to the metabolic networks of microbes used in fermentation processes that improve product yields of industrially important chemicals such as ethanol and succinic acid. It has also been used for the identification of putative drug targets in cancer and pathogens, rational design of culture media, and more recently hostpathogen interactions. The results of FBA can be visualized using flux maps similar to the image on the right, which illustrates the steadystate fluxes carried by reactions in glycolysis. The thickness of the arrows is proportional to the flux through the reaction.
FBA formalizes the system of equations describing the concentration changes in a metabolic network as the dot product of a matrix of the stoichiometric coefficients (the stoichiometric matrix S) and the vector v of the unsolved fluxes. The righthand side of the dot product is a vector of zeros representing the system at steady state. Linear programming is then used to calculate a solution of fluxes corresponding to the steady state.
[Image source. Click image to open in new window.]From Slide 13 in Flux Balance Analysis:

One of the techniques used to analyse the complete metabolic genotype of a microbial strain is flux balance analysis (FBA):
 relies on balancing metabolic fluxes
 is based on the fundamental law of mass conservation
 is performed under steadystate conditions (an example of constraint …)
 requires information only about:
 the stoichiometric of metabolic pathways,
 metabolic demands,
 and a few strain specific parameters
 the stoichiometric of metabolic pathways,
 it does NOT require enzymatic kinetic data
FBA: Flux Balance Analysis:
Additional Reading
 What is flux balance analysis? (Mar 2010)

Klipp, E (2005) Metabolism [Ch. 5 in: Systems Biology in Practice: Concepts, Implementation and Application]
 Software Applications Toward Quantitative Metabolic Flux Analysis and Modeling (Nov 2012) [local copy]

[GitHub] GEMsplice_code: Matlab/Octave code of GEMsplice – Integrating SpliceIsoform Expression into GenomeScale Models Characterizes Breast Cancer Metabolism (Feb 2012) [local copy]
[Image source. Click image to open in new window.]  [slides] Flux Balance Analysis [local copy]
Synthetic Biology
A CRISPR/Cas9Based Central Processing Unit to Program Complex Logic Computation in Human Cells (Mar 2019) [discussion (ScienceDaily.com)]

“Significance. By enabling rational programming of mammalian cell behavior, synthetic biology is driving innovation across biomedical applications. Using Cas9variants as core as proteinbased central processing units (CPUs) that control gene expression in response to singleguide RNAs as genetic software, we have programmed scalable Boolean logic computations such as the half adder in single human cells. Combining orthogonal Cas9variants enabled the design of multicore genetic CPUs that provide parallel arithmetic computations. The Cas9based multicore CPU design may provide opportunities in singlecell mammalian biocomputing to provide biomedical applications.”

“Abstract. Controlling gene expression with sophisticated logic gates has been and remains one of the central aims of synthetic biology. However, conventional implementations of biocomputers use central processing units (CPUs) assembled from multiple proteinbased gene switches, limiting the programming flexibility and complexity that can be achieved within single cells. Here, we introduce a CRISPR/Cas9based core processor that enables different sets of userdefined guide RNA inputs to program a single transcriptional regulator (dCas9KRAB) to perform a wide range of bitwise computations, from simple Boolean logic gates to arithmetic operations such as the half adder. Furthermore, we built a dualcore CPU combining two orthogonal core processors in a single cell. In principle, human cells integrating multiple orthogonal CRISPR/Cas9based core processors could offer enormous computational capacity.”
A Framework for the Modular and Combinatorial Assembly of Synthetic Gene Circuits (Apr 2019)

“Synthetic gene circuits emerge from iterative designbuildtest cycles. Most commonly, the timelimiting step is the circuit construction process. Here, we present a hierarchical cloning scheme based on the widespread Gibson assembly method [see also] and make the set of constructed plasmids freely available. Our twostep modular cloning scheme allows for simple, fast, efficient and accurate assembly of gene circuits and combinatorial circuit libraries in Escherichia coli. The first step involves Gibson assembly of transcriptional units from constituent parts into individual intermediate plasmids. In the second step, these plasmids are digested with specific sets of restriction enzymes. The resulting flanking regions have overlaps that drive a second Gibson assembly into a single plasmid to yield the final circuit. This approach substantially reduces time and sequencing costs associated with gene circuit construction and allows for modular and combinatorial assembly of circuits. We demonstrate the usefulness of our framework by assembling a doubleinverter circuit and a combinatorial library of 3node networks.”
[Image source. Click image to open in new window.]
[Image source. Click image to open in new window.]