[Landing page]

Technical Review

Biomedical Knowledge Discovery in Networks Through Language/Graphical Models and Machine Learning

Last modified: 2018-11-26

Copyright © 2018-present, Victoria A. Stuart

Please cite this work as:

    Stuart, Victoria A. (August 2018) "Biomedical Knowledge Discovery in Networks Through Language/Graphical Models and Machine Learning" https://persagen.com/resources/biokdd-review.html.


The past several years have seen stunning advances in machine learning (ML) and natural language processing (NLP). In this TECHNICAL REVIEW I survey leading approaches to ML and NLP applicable to the biomedical domain, with a particular focus on:

  • construction of structured knowledge stores (commonsense knowledge):

    • textual knowledge stores (ideal for storing reference material);

    • knowledge graphs (ideal for storing relational data);

  • natural language understanding:

    • word embedding (applicable to representation learning, relation extraction/link prediction, language understanding, and knowledge discovery);

    • natural language models (to better understand and leverage natural language and text);

    • natural language inference (recognizing textual entailment: identifying the relationship between a premise and a hypothesis);

    • reading comprehension (ability to process text, understand its meaning, and integrate it with preexisting knowledge);

    • commonsense reasoning (ability to make presumptions about the type and essence of ordinary situations);

    • question answering and recommendation (addressing user inquiry);

  • information overload (including text classification and summarization);

  • transfer learning and multi-task learning (leveraging existing knowledge and models for new tasks);

  • explainable/interpretable models (rendering machine learning decisions/output transparent to human understanding);

  • in silico modeling (using computer models to simulate biochemical, biomolecular, pharmacological and physiological processes).

Particular attention is paid to advances in NLP and ML that are applicable to biomolecular and biomedical studies, and to clinical science.


  • These contents are presented as summary notes, not a research paper: my intent here is to summarize the recent literature relevant to the subject areas indicated in the Table of Contents. While this REVIEW is broad in scope, it is not an exhaustive survey of that literature; the selection of material reflects my personal interests.

  • I use the terms “machine learning” and “neural networks” interchangeably in this REVIEW, although neural networks are strictly a subset of machine learning.

  • For convenience I will often repeat key abbreviation definitions in various subsections. Also, I generally do not pluralize abbreviations (e.g., RNN, not RNNs) unless the singular form would be ambiguous.

  • I will paraphrase references inline, with relevant URLs provided, generally omitting author names and similar details but occasionally mentioning key individuals and dates.

  • Internal references to other parts of this REVIEW (and my Glossary) are presented as green hyperlinks.

  • Less frequently you will encounter orange-brown hyperlinks, which reveal “mouseover” images: supplementary content that I consider important but that stays hidden until you move your cursor over the link.

  • Please refer to my Glossary (and this glossary) for descriptions of some of the terms and concepts used in this REVIEW.