Technical Review

Explainable (Interpretable) Models

Last modified: 2019-03-11

Copyright notice, citation: Copyright
© 2018-present, Victoria A. Stuart



We need to be able to trust and understand how results are generated in NLP- and ML-based models. Recent, relevant discussions include:

The utility of knowledge graphs is outlined in my Why Graphs? subsection. The ability to model heterogeneous information networks (HINs) – complex networks widely represented in biological domains – may encapsulate higher-order interactions that crucially reflect the complex nature among nodes and edges in real world data. At the same time, network motifs may reveal higher order interactions and network semantics in homogeneous networks.

Augmenting KG with external resources such as textual knowledge stores and ontologies is an obvious approach to providing explanatory material for KG. This is particularly important when machine learning approaches are employed in the biomedical and clinical sciences domains, where high precision is imperative for supporting, not distracting, practitioners (What Do We Need to Build Explainable AI Systems for the Medical Domain?), and where it is crucial to underpin machine output with reasons that are verifiable by humans.

Paraphrased from What Do We Need to Build Explainable AI Systems for the Medical Domain? (Dec 2017):

  • “The only way forward seems to be the integration of both knowledge-based and neural approaches to combine the interpretability of the former with the high efficiency of the latter. To this end, there have been attempts to retrofit neural embeddings with information from knowledge bases as well as to project embedding dimensions onto interpretable low-dimensional sub-spaces. More promising, in our opinion, is the use of hybrid distributional models that combine sparse graph-based representations with dense vector representations and link them to lexical resources and knowledge bases.

    “Here a hybrid human-in-the-loop approach can be beneficial, where not only the machine learning models for knowledge extraction are supported and improved over time, the final entity graph becomes larger, cleaner, more precise and thus more usable for domain experts. Contrary to classical automatic machine learning, human-in-the-loop approaches do not operate on predefined training or test sets, but assume that human expert input regarding system improvement is supplied iteratively.”

Regarding the “hybrid human-in-the-loop approach” mentioned in What Do We Need to Build Explainable AI Systems for the Medical Domain?, the Never-Ending Language Learner [NELL; project] occasionally interacts with human trainers, mostly for negative feedback identifying NELL’s incorrect beliefs (see Section 7 and Figs. 6 & 7 in that paper), though this human-machine interaction has been decreasing as NELL gets more accurate.

Knowledge graph models attain state-of-the-art accuracy in knowledge base completion, but their predictions are notoriously hard to interpret. The OpenKE project [code] is an open-source framework for knowledge graph embedding (KGE) based on the TensorFlow machine learning platform. A very interesting, heavily modified fork of the OpenKE GitHub repository is XKE (eXplaining Knowledge Embedding models), which contains implementations of XKE-PRED and XKE-TRUE described in Interpreting Embedding Models of Knowledge Bases: A Pedagogical Approach (ICML 2018). XKE adapts “pedagogical approaches” to interpret embedding models by extracting weighted Horn rules from them.

  • In mathematical logic and logic programming, a Horn clause is a logical formula of a particular rule-like form which gives it useful properties for use in logic programming, formal specification, and model theory.

  • A pedagogical approach is one where – intuitively speaking – a non-interpretable but accurate model is run, and an interpretable model is learned from the output of the non-interpretable one.

  • OpenKE was used in Knowledge Representation Learning: A Quantitative Review (Dec 2018).

State-of-the-art knowledge base completion typically relies on embedding models that map entities and relations into a low-dimensional vector space. The existence of a relation triple is determined by some pre-defined function over these representations. More importantly, embedding models turn a complex space of semantic concepts into a smooth space where gradients can be calculated and followed. One difficulty with embeddings is their poor interpretability for their users.
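
As a minimal sketch of what a “pre-defined function over these representations” can look like, the following scores a triple with a translation-style distance (head + relation should land near tail). The choice of this particular scoring function is an assumption made for illustration, since the text above does not name a specific model, and the entity names and hand-picked 2-d vectors are hypothetical.

```python
import math

# Hypothetical 2-d embeddings; a real model would learn these from data.
entity = {
    "paris": (1.0, 2.0),
    "france": (2.0, 3.0),
    "berlin": (4.0, 1.0),
}
relation = {"capital_of": (1.0, 1.0)}

def translation_score(head, rel, tail):
    """Negative Euclidean distance between (head + rel) and tail.

    Higher (closer to zero) means the triple is judged more plausible.
    """
    h, r, t = entity[head], relation[rel], entity[tail]
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

plausible = translation_score("paris", "capital_of", "france")
implausible = translation_score("berlin", "capital_of", "france")
```

The score is a smooth function of the vectors, which is what allows gradient-based training; it is also why the individual dimensions carry no obvious meaning for a human reader.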

Addressing this challenge, in Interpreting Embedding Models of Knowledge Bases: A Pedagogical Approach [ICML 2018; code] the authors proposed two models:

  • KGE models with predicted features (XKE-PRED), and
  • KGE models with observed features (XKE-TRUE).

  • XKE-PRED treated the embedding model as a black box and assumed no other source of information for building the interpretable model. By changing the original classifier’s inputs and observing its outputs, the pedagogical approach constructed a training set for an interpretable classifier from which explanations were extracted.

  • XKE-TRUE is a variation of XKE-PRED that assumed an external source of knowledge (regarded as ground truth of their relational domain) besides the embedding model, from which XKE-TRUE extracted interpretable features.
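
The general pedagogical pattern described above (probe a black box, then fit an interpretable model to its outputs) can be sketched as follows. This is not the XKE implementation: the black box, the three binary features, and the one-feature rule used as the interpretable model are all hypothetical stand-ins chosen to keep the example short.

```python
from itertools import product

def black_box(x):
    """Stand-in for an opaque model; we only call it, never inspect it."""
    a, b, c = x
    return int(0.9 * a + 0.2 * b - 0.1 * c > 0.5)

# Step 1: vary the inputs and record what the black box outputs.
probes = list(product([0, 1], repeat=3))
labels = [black_box(x) for x in probes]

# Step 2: fit an interpretable model (here a one-feature rule) to those outputs.
def fit_rule(X, y):
    """Return (fidelity, feature index, polarity) of the best single-feature rule."""
    best = None
    for j in range(len(X[0])):
        for polarity in (1, 0):  # rule: predict 1 when feature j equals polarity
            agree = sum(int(x[j] == polarity) == yi for x, yi in zip(X, y))
            fidelity = agree / len(X)
            if best is None or fidelity > best[0]:
                best = (fidelity, j, polarity)
    return best

fidelity, feature, polarity = fit_rule(probes, labels)
```

Here the learned rule is "predict 1 when feature 0 is 1", and its fidelity (agreement with the black box on the probes) is 1.0. Fidelity to the black box, rather than accuracy against ground truth, is the quantity a pedagogical approach optimizes.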

Skipping over additional detail (provided in the paper), the results in Table 3 in that paper are the most interesting with respect to this subsection (Explainable (Interpretable) Models): the application of those models to a Freebase dataset. Five examples were shown (ID #1-5), along with input triples (head-relation-tail), their explanations (reasons: weighted rules and the bias term) with the interpretable classifier’s score (XKE-PRED; XKE-TRUE), and with the labels predicted by the embedding model (trained as a binary classifier: 1 = True; 0 = False). While the results (discussed in Section 4.4) vary – a consequence of an early-stage research effort – the models performed remarkably well in explaining (through weighted paths) the selected, embedded triples.



Knowledge-based Transfer Learning Explanation (Jul 2018) [code] likewise sought to boost machine learning applications in decision making by making more human-centric explanations, relevant to transfer learning (utilizing models developed in one domain as the starting point for training in another domain). The authors proposed an ontology-based knowledge representation and reasoning framework for human-centric transfer learning explanation. They first modeled a learning domain in transfer learning [including the dataset and the prediction task, with OWL (Web Ontology Language) ontologies], for which they complemented the prediction task-related common sense knowledge using an individual matching and external knowledge importing algorithm. The framework further used a correlative reasoning algorithm to infer three kinds of explanatory evidence with different granularities (general factors, particular narrators and core contexts) to explain a positive feature or a negative transfer from one learning domain to another.



Learning Heterogeneous Knowledge Base Embeddings for Explainable Recommendation (Nov 2018) [code] addressed the provision of model-generated explanations in recommender systems in structured KG, using a knowledge base representation learning framework (KBE4ER) to embed heterogeneous entities for recommendation. Based on the embedded knowledge base, a soft matching algorithm was proposed to generate personalized explanations for recommended items. The authors designed a novel explainable collaborative filtering (CF) framework over knowledge graphs [collaborative filtering is also discussed on pp. 10-11 in Implementing Recommendation Algorithms in a Large-Scale Biomedical Science Knowledge Base (Oct 2017)]. The main building block was an integration of traditional CF with the learning of knowledge base embeddings. They first defined the concept of a user-item knowledge graph (illustrated in Fig. 1 in their paper), which encoded knowledge about the user behaviors and item properties as a relational graph structure. The user-item knowledge graph focused on how to depict different types of user behaviors and item properties over heterogeneous entities in a unified framework. Then, they extended the design philosophy of CF to learn over the knowledge graph for personalized recommendation. For each recommended item, they further conducted fuzzy reasoning over the paths in the knowledge graph based on soft matching, to construct personalized explanations.



Some of the methods for constructing KG and knowledge discovery over KG can also provide evidence to support the understanding of new facts. For example:

  • MOLIERE: Automatic Biomedical Hypothesis Generation System (May 2017) [project; code] is a system that can identify connections within biomedical literature. MOLIERE finds the shortest path between two query keywords in the KG, and extends this path to identify a significant set of related abstracts (which, due to the network construction process, share common topics). Topic modeling, performed on these documents using PLDA+, returns a set of plain text topics representing concepts that likely connect the queried keywords, supporting hypothesis generation (for example, on historical findings MOLIERE showed the implicit link between Venlafaxine and HTR1A, and the involvement of DDX3 in Wnt signaling).

  • Finding Streams in Knowledge Graphs to Support Fact Checking (Aug 2017) viewed a knowledge graph as a “flow network” and knowledge as a fluid, abstract commodity. They showed that computational fact checking of a (subject, predicate, object) triple then amounted to finding a “knowledge stream” that emanated from the subject node and flowed toward the object node through paths connecting them. Evaluations revealed that this network-flow model was very effective in discerning true statements from false ones, outperforming existing algorithms on many test cases. Moreover, the model was expressive in its ability to automatically discover useful path patterns and relevant facts that may help human fact checkers corroborate or refute a claim.

  • Statistical relational learning (SRL) based approaches [reviewed by Nickel et al. (Kevin Murphy; Volker Tresp) in A Review of Relational Machine Learning for Knowledge Graphs (Sep 2015)] can be used in conjunction with machine reading and information extraction methods to automatically build KG. In SRL, the representation of an object may contain its relationships to other objects. The data is in the form of a graph, consisting of nodes (entities) and labelled edges (relationships between entities). The main goals of SRL include prediction of missing edges, prediction of properties of nodes, and clustering nodes based on their connectivity patterns. These tasks arise in many settings such as analysis of social networks and biological pathways.
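
The shortest-path step mentioned for MOLIERE in the first item above can be sketched with a breadth-first search. This is a simplification under stated assumptions: the toy graph and its node names are hypothetical, and edges are treated as unweighted, whereas a real literature network would be far larger and may carry edge weights.

```python
from collections import deque

# Hypothetical toy network; node names are placeholders, not real findings.
graph = {
    "keyword_a": ["topic_1", "topic_2"],
    "topic_1": ["keyword_a", "keyword_b"],
    "topic_2": ["keyword_a", "topic_3"],
    "topic_3": ["topic_2", "keyword_b"],
    "keyword_b": ["topic_1", "topic_3"],
}

def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path as a list, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back through the parents
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None

path = shortest_path(graph, "keyword_a", "keyword_b")
```

The intermediate nodes on the returned path are what make the connection inspectable: a reader can check each hop rather than accept an opaque relatedness score.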

An excellent follow-on article (different authors) to Nickel et al., A Review of Relational Machine Learning for Knowledge Graphs (Sep 2015) is On Embeddings as an Alternative Paradigm for Relational Learning (Jul 2018), which systematically compared knowledge graph embedding and logic-based SRL methods on standard relational classification and clustering tasks – including discussion of the Path Ranking Algorithm (PRA). Relation paths can be regarded as bodies of weighted rules (more precisely, Horn clauses), where the weight specifies how predictive the body of the rule is for the head.

  • “A strong advantage of KGEs is their scalability, at the expense of their black-box nature and limited reasoning capabilities. SRL methods are a direct opposite – they can capture very complex reasoning, are interpretable but currently of a limited scalability.”

  • The aim of statistical relational learning is to learn statistical models from relational or graph-structured data. Three of the main statistical relational learning paradigms include weighted rule learning, random walks on graphs, and tensor factorization. These methods were mostly developed and studied in isolation, with few attempts at understanding the relationship among them or combining them. For example, in their survey, [A Review of Relational Machine Learning for Knowledge Graphs (Sep 2015)], Nickel et al. described weighted rules and graph random walks as two separate classes of models for learning from relational data.

  • Kazemi and Poole (citing Nickel et al.’s work) noted that relational models based on weighted rule learning could easily be explained to a broad range of people.

The Path Ranking Algorithm (PRA) was introduced by Ni Lao and William Cohen in Relational Retrieval Using a Combination of Path-Constrained Random Walks (2010) [code], and extended by those authors in Random Walk Inference and Learning in A Large Scale Knowledge Base (2011). PRA is an easily-interpretable extension of the idea of using random walks of bounded lengths for predicting links in multi-relational knowledge graphs. Discussed in A Review of Relational Machine Learning for Knowledge Graphs (Sep 2015), the key idea in PRA is to use these path probabilities as features for predicting the probability of missing edges.

  • “Interpretability. A useful property of PRA is that its model is easily interpretable. In particular, relation paths can be regarded as bodies of weighted rules – more precisely Horn clauses – where the weight specifies how predictive the body of the rule is for the head. For instance, Table VI shows some relation paths along with their weights that have been learned by PRA … to predict which college a person attended, i.e., to predict triples of the form $\small \text{(p, college, c)}$. The first relation path in Table VI can be interpreted as follows: ‘it is likely that a person attended a college if the sports team that drafted the person is from the same college.’ This can be written in the form of a Horn clause as follows: $\small \text{(p, college, c) $\leftarrow$ (p, draftedBy, t) $\land$ (t, school, c)}$. By using a sparsity promoting prior on $\small w_k$, we can perform feature selection, which is equivalent to rule learning.”
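
To make the idea of path probabilities as features concrete, here is a minimal sketch of a path-constrained random walk for the relation path (draftedBy, school) from the Horn clause above. The triples and entity names are hypothetical, and the walk picks uniformly among outgoing edges of the required relation, which is an assumption about the walk rather than a detail taken from the text.

```python
from collections import defaultdict

# Hypothetical (head, relation, tail) triples.
triples = [
    ("alice", "draftedBy", "team_x"),
    ("alice", "draftedBy", "team_y"),
    ("team_x", "school", "college_1"),
    ("team_y", "school", "college_2"),
    ("bob", "draftedBy", "team_x"),
]

edges = defaultdict(list)
for h, r, t in triples:
    edges[(h, r)].append(t)

def path_probability(source, relation_path):
    """Distribution over end nodes of a walk forced to follow relation_path."""
    dist = {source: 1.0}
    for rel in relation_path:
        nxt = defaultdict(float)
        for node, p in dist.items():
            targets = edges.get((node, rel), [])
            for t in targets:  # split the probability mass evenly
                nxt[t] += p / len(targets)
        dist = dict(nxt)
    return dist

path = ("draftedBy", "school")
alice_features = path_probability("alice", path)
bob_features = path_probability("bob", path)
```

Each value is one feature for one candidate triple, for example 0.5 for (alice, college, college_1). A classifier such as logistic regression would then learn one weight per relation path, and that weight is what reads as the strength of the corresponding rule.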



A February 2018 paper by Kazemi and Poole, Bridging Weighted Rules and Graph Random Walks for Statistical Relational Models, finally addressed the relationship between weighted rules and graph random walks. They studied the relationship between the Path Ranking Algorithm (PRA; one of the best known relational learning methods in the graph random walk paradigm) and Relational Logistic Regression (RLR; one of the recent developments in weighted rule learning). Their result improved the explainability of models learned through graph random walks, by providing a weighted rule interpretation for them.

An Interpretable Reasoning Network for Multi-Relation Question Answering (Jun 2018) addressed multi-relation question answering via elaborated analysis on questions and reasoning over multiple fact triples in a knowledge base. Their Interpretable Reasoning Network (IRN) model dynamically decided which part of an input question should be analyzed at each hop, for which the reasoning module predicted a knowledge base relation (relation triple) that corresponded to the current parsed result. More interestingly regarding this subsection, IRN offered traceable and observable intermediate predictions (see their Figure 3), facilitating reasoning analysis and failure diagnosis (thereby also allowing manual manipulation in answer prediction).

The adoption of machine learning in high-stakes applications such as healthcare requires explanations that are comprehensible to the domain user, who often holds the ultimate responsibility for decisions and outcomes. In Teaching Meaningful Explanations (Sep 2018), IBM Research proposed an approach [TED: Teaching Explanations for Decisions] to generate such explanations, in which training data was augmented to include – in addition to features and labels – explanations elicited from domain users. A joint model then learned to produce both labels and explanations from the input features. This simple idea ensured that explanations were tailored to the complexity expectations and domain knowledge of the user. This new approach was particularly well-suited for explaining a machine learning prediction when all of its input features were inherently incomprehensible to humans, even to deep subject matter experts. Evaluations on a chemical odor, a melanoma and other datasets showed that their approach was generalizable across domains and algorithms, demonstrating that meaningful explanations could be reliably taught to machine learning algorithms (in some cases, also improving modeling accuracy).

“For the present discussion, we define an explanation as information provided in addition to an output that can be used to verify the output. In the ideal case, an explanation should enable a human user to independently determine whether the output is correct. The requirements of meaningful information have two implications for explanations:

  1. Complexity match: the complexity of the explanation needs to match the complexity capability of the user. For example, an explanation in equation form may be appropriate for a statistician, but not for a nontechnical person.

  2. Domain match: an explanation needs to be tailored to the domain, incorporating the relevant terms of the domain. For example, an explanation for a medical diagnosis needs to use terms relevant to the physician (or patient) who will be consuming the prediction.”
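
The TED idea of training on features, labels, and user-supplied explanations together can be sketched as below. Fusing the label and the explanation into a single target is one simple way to obtain a joint model and is an assumption made here; the rows, feature meanings, and the 1-nearest-neighbour classifier are hypothetical stand-ins.

```python
# Hypothetical rows: ((debt_ratio, late_payment_rate), label, explanation from a domain user).
train = [
    ((0.9, 0.1), "deny", "debt ratio too high"),
    ((0.8, 0.9), "deny", "too many late payments"),
    ((0.2, 0.1), "approve", "meets all criteria"),
]

def fit(rows):
    """Fuse label and explanation into one target so any classifier can learn both."""
    return [(x, (y, e)) for x, y, e in rows]

def predict(model, x):
    """1-nearest-neighbour on squared Euclidean distance (a stand-in classifier)."""
    _, target = min(
        model,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )
    return target  # a (label, explanation) pair

model = fit(train)
label, explanation = predict(model, (0.85, 0.8))
```

Because the explanation text comes from the training data, it is automatically phrased in the user's own vocabulary, which is how this setup addresses the complexity match and domain match requirements quoted above.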

Given a neural network, we are interested in knowing what features it has learned for making classification decisions. Network interpretation is also crucial for (computer vision) tasks involving humans, like autonomous driving and medical image analysis (and in the NLP domain, clinical diagnosis and recommendation, …). In an interesting approach, Neural Network Interpretation via Fine Grained Textual Summarization (Sep 2018), the authors introduced the novel task of interpreting classification models using fine grained textual summarization: along with (image classification) label prediction, the network generated a sentence explaining its decision. For example, a knowledgeable person looking at a photograph of a bird might say,

    "I think this is an Anna's Hummingbird because it has a straight bill, a rose pink throat and crown. It's not a Broad-tailed Hummingbird because the latter lacks the red crown."

This kind of textual description carries rich semantic information and is easily understandable, illustrating the use of natural language as a logical medium in which to ground the interpretation of deep convolutional models. Tasks that combine text generation and visual explanation include image captioning and visual question answering (VQA). Although this paper addresses those tasks, the method described (which leverages attention, summarization and natural language) could also be applied in the NLP domain (this is addressed in Sections 3 and 4.3 in their paper).



Machine learning models appear to be particularly sensitive to adversarial challenges; for example, note Table 1 and the accompanying text in Semantically Equivalent Adversarial Rules for Debugging NLP Models (Ribeiro et al., 2018).



Those authors also published the heavily-cited LIME algorithm, described in ‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier (Aug 2016) [code; discussion], which supports explaining individual predictions for text classifiers or classifiers that act on tables (numpy arrays of numerical or categorical data) or images. LIME takes a black-box model (such as a neural network, random forest, etc.) and a data sample, and outputs a list of weighted features that contribute most to the classification decision. By going over the same process for all training samples, LIME can generate a transformed version of the original data set that only contains the relevant features.
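
The core mechanism (perturb the sample, query the black box, fit a locally weighted linear model) can be sketched without the LIME library. This is an illustration under stated assumptions, not the library's implementation: the black box, the three-word instance, and the kernel width are hypothetical, and all word-removal patterns are enumerated instead of sampled because the instance is tiny.

```python
import math
from itertools import product

WORDS = ["not", "good", "movie"]  # hypothetical instance to explain

def black_box(mask):
    """Hypothetical opaque classifier: P(positive) when word i is kept (1) or removed (0)."""
    has_not, has_good, _has_movie = mask
    return 0.5 + 0.4 * has_good - 0.6 * has_not * has_good

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def explain(predict, n_features, kernel_width=0.5):
    """Weighted least-squares fit of a linear surrogate around the full instance."""
    samples = list(product([0, 1], repeat=n_features))
    X = [[1.0] + [float(v) for v in z] for z in samples]  # leading 1.0 = intercept
    y = [predict(z) for z in samples]
    # Perturbations with fewer removed words are closer to the instance and weigh more.
    w = [math.exp(-(n_features - sum(z)) / kernel_width) for z in samples]
    d = n_features + 1
    XtWX = [[sum(wi * xi[a] * xi[b] for wi, xi in zip(w, X)) for b in range(d)]
            for a in range(d)]
    XtWy = [sum(wi * xi[a] * yi for wi, xi, yi in zip(w, X, y)) for a in range(d)]
    return solve(XtWX, XtWy)[1:]  # drop the intercept

weights = dict(zip(WORDS, explain(black_box, len(WORDS))))
```

In this toy setup "not" receives a clearly negative weight and "movie" a weight of numerically zero, since the black box ignores it. The weights describe the model only near this one instance, which is the sense in which the explanation is local.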



A more recent approach from the same authors, largely superseding LIME, is described in Anchors: High-Precision Model-Agnostic Explanations (Marco Ribeiro et al., 2018) [code here and here], which introduced a novel model-agnostic system that explained the behavior of complex models with high-precision rules called anchors, representing local, sufficient conditions for predictions. An anchor explanation is a rule that sufficiently anchors the prediction locally – such that changes to the rest of the feature values of the instance do not matter. … The anchor method is able to explain any black box classifier, with two or more classes – all that is required is that the classifier implements a function that takes in raw text or a numpy array and outputs a prediction (integer). They proposed an algorithm to efficiently compute those explanations for any black-box model with high-probability guarantees.
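
What it means for a rule to be a sufficient local condition can be illustrated by measuring the precision of a candidate anchor: hold the anchored features fixed, vary everything else, and count how often the prediction stays the same. This sketch covers only that precision check, not the search procedure or the statistical guarantees from the paper; the classifier and the binary features are hypothetical, and perturbations are enumerated exhaustively instead of sampled.

```python
from itertools import product

def black_box(x):
    """Hypothetical classifier over three binary features."""
    a, b, c = x
    return 1 if (a == 1 and b == 1) else c

def anchor_precision(predict, instance, anchor):
    """Fraction of perturbed inputs, with anchored indices held fixed,
    that receive the same prediction as the original instance."""
    target = predict(instance)
    free = [i for i in range(len(instance)) if i not in anchor]
    hits = total = 0
    for values in product([0, 1], repeat=len(free)):
        x = list(instance)
        for i, v in zip(free, values):
            x[i] = v
        hits += predict(tuple(x)) == target
        total += 1
    return hits / total

instance = (1, 1, 0)
full_anchor = anchor_precision(black_box, instance, {0, 1})
partial_anchor = anchor_precision(black_box, instance, {0})
no_anchor = anchor_precision(black_box, instance, set())
```

Fixing features 0 and 1 gives precision 1.0, so "a = 1 and b = 1" anchors the prediction; fixing only feature 0 gives 0.75, and fixing nothing gives 0.625.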



An interesting extension to LIME is Induction of Non-Monotonic Logic Programs to Explain Boosted Tree Models Using LIME (Nov 2018) [code]. By running LIME on every training sample, one can generate a transformed version of the original data set that contains only the relevant features. The algorithm developed in this paper, UFOLD, learns concise logic programs from that transformed data set, which is built by storing the explanations provided by LIME. Specifically, the authors presented a new inductive logic programming (ILP) algorithm capable of learning non-monotonic logic programs from local explanations of boosted tree models provided by LIME.



In what is likely the strongest work in this domain, A Unified Approach to Interpreting Model Predictions (Scott Lundberg et al., Nov 2017) [code] described a unified approach (SHAP: SHapley Additive exPlanations) to explain the output of any machine learning model. A Shapley value is a solution concept in cooperative game theory, named in honor of Lloyd Shapley, who introduced it in 1953. To each cooperative game it assigns a unique distribution (among the players) of a total surplus generated by the coalition of all players, and it is characterized by a collection of desirable properties. SHAP, which united six existing methods including LIME, assigned an importance value for a particular prediction to each feature, using game theory to guarantee a unique solution that was better aligned with human intuition than existing methods. SHAP is well-discussed in these blog posts, and the project’s very heavily (~4k) starred GitHub repository provides much additional information and examples.

  • [Section 2.4] “Shapley regression values are feature importances for linear models in the presence of multicollinearity. This method requires retraining the model on all feature subsets $\small S \subseteq F$, where $\small F\ $ is the set of all features. It assigns an importance value to each feature that represents the effect on the model prediction of including that feature. … Shapley sampling values are meant to explain any model by: (1) applying sampling approximations to Equation 4, and (2) approximating the effect of removing a variable from the model by integrating over samples from the training dataset. This eliminates the need to retrain the model and allows fewer than $\small 2^{\vert F \vert}$ differences to be computed. …”

  • While the math in this paper appears complex, the (Python) code is rather easily implemented as indicated on the main (README) page on their GitHub repository, and in these two blog posts: here, and here.

  • Note also their follow-on paper, Consistent Individualized Feature Attribution for Tree Ensembles (Feb 2018, updated Mar 2019) [same GitHub repository as A Unified Approach to Interpreting Model Predictions, above].
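
For a handful of features, Shapley values can be computed exactly by averaging each feature's marginal contribution over all orderings, which makes the game-theoretic definition concrete. This sketch is not the SHAP library's algorithm: the model, the instance, and the choice to represent an "absent" feature by a fixed baseline value are assumptions made for illustration, and the cost grows factorially with the number of features.

```python
from itertools import permutations

FEATURES = ["x1", "x2", "x3"]
instance = {"x1": 1.0, "x2": 1.0, "x3": 1.0}   # hypothetical input to explain
baseline = {"x1": 0.0, "x2": 0.0, "x3": 0.0}   # hypothetical "feature absent" values

def model(x):
    """Hypothetical model with one additive term and one interaction."""
    return 10.0 * x["x1"] + 5.0 * x["x2"] * x["x3"]

def value(coalition):
    """Model output when only the features in the coalition take their real values."""
    x = {f: (instance[f] if f in coalition else baseline[f]) for f in FEATURES}
    return model(x)

def shapley_values(players):
    """Average marginal contribution of each player over all orderings."""
    phi = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = value(coalition)
            coalition.add(p)
            phi[p] += value(coalition) - before
    return {p: total / len(orderings) for p, total in phi.items()}

phi = shapley_values(FEATURES)
```

Here x1 receives 10.0, while x2 and x3 split their interaction evenly at 2.5 each. The attributions sum to the difference between the prediction for the instance and the prediction for the baseline (15.0), which is the additivity property that gives SHAP its name.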



The SHAP approach was updated in a follow-on paper by the same authors, Consistent Individualized Feature Attribution for Tree Ensembles (Scott Lundberg et al., Feb 2018, updated Mar 2019) [same GitHub repository as A Unified Approach to Interpreting Model Predictions, above].

  • “Interpreting predictions from tree ensemble methods such as gradient boosting machines and random forests is important, yet feature attribution for trees is often heuristic and not individualized for each prediction. Here we show that popular feature attribution methods are inconsistent, meaning they can lower a feature’s assigned importance when the true impact of that feature actually increases. This is a fundamental problem that casts doubt on any comparison between features. To address it we turn to recent applications of game theory and develop fast exact tree solutions for SHAP (SHapley Additive exPlanation) values, which are the unique consistent and locally accurate attribution values. We then extend SHAP values to interaction effects and define SHAP interaction values. We propose a rich visualization of individualized feature attributions that improves over classic attribution summaries and partial dependence plots, and a unique ‘supervised’ clustering (clustering based on feature attributions). We demonstrate better agreement with human intuition through a user study, exponential improvements in run time, improved clustering performance, and better identification of influential features. An implementation of our algorithm has also been merged into XGBoost and LightGBM, see this GitHub repository for details.”

Those authors (Scott Lundberg et al.) applied their SHAP model, referred to as Prescience, to the clinical domain in their wonderful paper Explainable Machine-Learning Predictions for the Prevention of Hypoxaemia During Surgery (Oct 2018).

  • Prescience code. GitHub: shap (same SHAP repository as above) | prescience server (~4 years old: Aug 2015) | prescience interface (~4 years old: Aug 2015).

    • “We present an ensemble-model-based machine-learning method, Prescience, that predicts the near-term risk of hypoxaemia during anaesthesia care and explains the patient- and surgery-specific factors that led to that risk (Fig. 1). … To support this we have made the explanation tools initially used in Prescience open-source and have continued to improve and extend them (”).
    • “Modelling, processing and web-interface codes specific to Prescience are available for reference purposes at”

  • “Although anaesthesiologists strive to avoid hypoxaemia during surgery, reliably predicting future intraoperative hypoxaemia is not possible at present. Here, we report the development and testing of a machine-learning-based system that predicts the risk of hypoxaemia and provides explanations of the risk factors in real time during general anaesthesia. The system, which was trained on minute-by-minute data from the electronic medical records of over 50,000 surgeries, improved the performance of anaesthesiologists by providing interpretable hypoxaemia risks and contributing factors. The explanations for the predictions are broadly consistent with the literature and with prior knowledge from anaesthesiologists. Our results suggest that if anaesthesiologists currently anticipate 15% of hypoxaemia events, with the assistance of this system they could anticipate 30%, a large portion of which may benefit from early intervention because they are associated with modifiable factors. The system can help improve the clinical understanding of hypoxaemia risk during anaesthesia care by providing general insights into the exact changes in risk induced by certain characteristics of the patient or procedure.”




Note here that I excised Fig. 4c; refer to the paper for the full figure.



Global Explanations of Neural Networks: Mapping the Landscape of Predictions (Feb 2019) [code] presented an approach for generating global attributions called GAM, which explained the landscape of neural network predictions across subpopulations. GAM augmented global explanations with the proportion of samples that each attribution best explained and specified which samples were described by each attribution. Global explanations also had tunable granularity to detect more or fewer subpopulations. They demonstrated that GAM’s global explanations yielded the known feature importances of simulated data, matched feature weights of interpretable statistical models on real data, and were intuitive to practitioners through user studies. With more transparent predictions, GAM could help ensure neural network decisions were generated for the right reasons.

  • This paper also discusses LIME.



GNN Explainer: A Tool for Post-hoc Explanation of Graph Neural Networks (Jure Leskovec and colleagues at Stanford University: Mar 2019) [“All code for GNNExplainer and the baselines, and their parameter settings will be made public.”] proposed GNNExplainer, a general model-agnostic approach for providing interpretable explanations for predictions of any GNN-based model on any graph-based machine learning task (node and graph classification, link prediction). In order to explain a given node’s predicted label, GNNExplainer provides a local interpretation by highlighting relevant features as well as an important subgraph structure by identifying the edges that are most relevant to the prediction. Additionally, the model provides single-instance explanations when given a single prediction as well as multi-instance explanations that aim to explain predictions for an entire class of instances/nodes. GNNExplainer was formalized as an optimization task that maximized the mutual information between the prediction of the full model and the prediction of a simplified explainer model. On synthetic data GNNExplainer was able to highlight relevant topological structures from noisy graphs. GNNExplainer also provided a better understanding of pretrained models on real-world tasks.

