Victoria Stuart, Ph.D.:
- Research: 1980 - 2008
- Research: 2008 - Present
- Research Areas and Approaches
- B.Sc., Honours Biochemistry (Polynucleotide Chemistry), 1983, Dalhousie University
- M.Sc., Occupational Hygiene (Genotoxicity), 1995, University of British Columbia
- Ph.D., Biology (Molecular Genetics), 2000, University of Victoria
- Postdoctoral (Molecular Genetics), 2001-2008, National Institute of Environmental Health Sciences
My knowledge and experience span
- biochemistry; cell biology; metabolic pathways; cellular signaling pathways and networks
- molecular genetics and genomics, including cancer biology
with more recent programming experience including
- Linux super-user
- relational databases (PostgreSQL)
- graphical knowledge stores (knowledge graphs)
- natural language processing / understanding
- machine learning.
[See my curriculum vitae for additional detail.]
I have a lifelong interest in molecular genetics, with a focus on functional genomics: the phenotypic expression of the information encoded in our genomes.
Central to these interests is how information is encoded, retrieved and utilized.
Research: 1980 - 2008
Biochemistry; Biology; Molecular Genetics
- polynucleotide chemistry
- site-directed mutagenesis
- DNA repair pathways
- microbial genetics | transgenic rodent models
- spontaneous mutations (ageing)
- dietary mutagens and carcinogens
- DNA damage and repair
- mitochondrial genetics
- gene expression profiles
- DNA, protein interactomes
Research: 2008 - Present
One of my longtime Aims is building a high-quality relational knowledge store using information extracted from PubMed and other biomedical data (metabolome; metabolic networks and pathways; cellular signaling networks; …) for use in recommendation, summarization, question answering, and biomedical knowledge discovery.
My research at NIEHS (Durham, N.C.) involved genetic and bioinformatic analyses of DNA damage, repair and metabolism in yeast. Upon my return to Vancouver I continued to focus on bioinformatic approaches to leveraging molecular genomics data for a better understanding of metabolism, molecular genetics, functional genomics and clinical science.
To better address this Vision, I acquired expertise in Python (a powerful general-purpose programming language), relational databases, textual knowledge stores, web programming, natural language processing (NLP), and graphical models.
Coincident with my focus on NLP-based methods (late 2015) were breakthrough advances in machine learning (ML) – particularly in the computer vision domain, including the use of pretrained models and transfer learning. This period also saw stunning advances in other ML domains, including NLP, reinforcement learning, deep neural network architectures, generative adversarial models, ML platforms, etc.
Consequently, for a period of ~1.5+ years (2015-2017) I fully immersed myself in the machine learning domain including the theoretical background, installing and debugging major ML platforms (Theano; Caffe; Torch7; TensorFlow; …), etc. During that period I also implemented various personal, self-taught ML projects.
The emergence of pretrained language models in 2018 provided stunning advances and unparalleled opportunities in NLP and language understanding – including, for example, the processing of out-of-vocabulary words, domain adaptation (transfer and multitask learning), and syntactic analyses. These language models greatly facilitated ML-based advances in NLP, supplanting traditional NLP approaches, which tend to be cumbersome, domain-specific, and rules-based.
2018 also saw significant advances in graph-based machine learning (embeddings, representations, attention, convolutions; knowledge graph completion; etc.) that facilitate large-scale link prediction (relation extraction) and latent knowledge discovery. Likewise, recent advances in graph signal processing can be used to infer global properties based on sampling, noise reduction, etc.
My next steps include retrieving a high-quality subset of PubMed and PubMed Central and extracting relationships from those documents for incorporation into my knowledge graph. Knowledge graphs complement Solr and Postgres through their utility in visualizing complex datasets, establishing and mining relationships, and running rapid, complex queries.
- addressing information overload through classification, attentional models, and summarization; and
- question answering and recommendation.
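As a minimal sketch of how extracted relationships might be loaded into a graph-like store and queried (this is not a description of the pipeline actually used), the Python below indexes subject-predicate-object triples in both directions. The `TripleStore` class, its method names, the predicate labels, and the three example triples are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical triples, in the shape a relation-extraction step might emit.
TRIPLES = [
    ("hexokinase", "catalyzes", "glucose phosphorylation"),
    ("glucose", "substrate_of", "hexokinase"),
    ("glucose-6-phosphate", "product_of", "hexokinase"),
]


class TripleStore:
    """Toy in-memory store indexing triples by subject and by object."""

    def __init__(self):
        self._by_subject = defaultdict(set)
        self._by_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._by_subject[subject].add((predicate, obj))
        self._by_object[obj].add((subject, predicate))

    def outgoing(self, subject):
        """Return sorted (predicate, object) pairs for a subject."""
        return sorted(self._by_subject.get(subject, ()))

    def incoming(self, obj):
        """Return sorted (subject, predicate) pairs pointing at an object."""
        return sorted(self._by_object.get(obj, ()))


store = TripleStore()
for triple in TRIPLES:
    store.add(*triple)

# Which entities are related to hexokinase, and how?
print(store.incoming("hexokinase"))
```

A dedicated graph database would replace this class in practice; the sketch only shows why storing relationships as explicit edges makes "what is connected to X?" queries direct.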
I have also begun the construction of some basic metabolic pathways in a graphical model, with the intention of adding additional cellular signaling pathways (and other data) relevant to human disease.
- network and pathways analyses;
- grounding of metabolic and cellular signaling pathways to external knowledge sources; and
- in silico modeling (e.g., modeling the effects of genomic variants or dysregulated pathways and networks, and modeling therapeutic interventions).
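The pathway analyses and in silico modeling listed above can be illustrated with a small sketch, under strong simplifying assumptions: the first three steps of glycolysis are treated as a directed, irreversible graph, and "dysregulation" is modeled simply by disabling an enzyme and checking whether a route still exists. The `find_route` function and its `disabled` parameter are hypothetical names introduced here.

```python
from collections import deque

# (substrate, enzyme, product) for the first steps of glycolysis.
# Assumption: every step is treated as one-directional, which is a simplification.
REACTIONS = [
    ("glucose", "hexokinase", "glucose-6-phosphate"),
    ("glucose-6-phosphate", "phosphoglucose isomerase", "fructose-6-phosphate"),
    ("fructose-6-phosphate", "phosphofructokinase-1", "fructose-1,6-bisphosphate"),
]


def find_route(reactions, start, goal, disabled=frozenset()):
    """Breadth-first search from start to goal, skipping disabled enzymes.

    Returns the list of metabolites along the route, or None if no route exists.
    """
    graph = {}
    for substrate, enzyme, product in reactions:
        if enzyme not in disabled:
            graph.setdefault(substrate, []).append(product)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


intact = find_route(REACTIONS, "glucose", "fructose-1,6-bisphosphate")
blocked = find_route(
    REACTIONS,
    "glucose",
    "fructose-1,6-bisphosphate",
    disabled={"phosphoglucose isomerase"},
)
print(intact)
print(blocked)
```

Reachability is the crudest possible model of a pathway defect; quantitative modeling would need kinetics and concentrations, which this sketch deliberately omits.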
Another area of interest is the creation and leveraging of multi-view and hyperbolic embeddings in knowledge graphs, enabling the encoding of various signals within the same graph. Graph signal processing methods are especially relevant for this task.
- comparing metabolic/signaling networks in healthy/diseased patients; and
- temporal views of metabolism and metabolites, to list a couple of examples.
In all cases, advanced NLP and ML methods will be used (as appropriate) to further explore textual and graphical knowledge stores for knowledge discovery.
Research Areas and Approaches
|Research Areas||Programmatic Approaches|
|Information retrieval, preprocessing||
Linux scripts [example-1; example-2];
Natural language processing:
Named entity recognition:
Semantic role labeling:
Word sense disambiguation;
|Textual knowledge store||Apache Solr|
|Relational knowledge store||
RDBMS: PostgreSQL (PSQL);
Comma separated files
|Graphical knowledge store||
PostgreSQL (PSQL) [example];
Comma separated files;
Statistical relational learning;
Knowledge graph embedding
Document, text classification;
|Textual information extraction||
Python: TF-IDF | RAKE | TextRank;
NLP: named entity recognition | word sense disambiguation;
NLP: information, relation extraction (dependency / syntactic parsing; noun phrase chunking);
Natural language processing;
|Graphical information extraction||
Knowledge graph completion (link prediction);
Graph signal processing
|Information retrieval | extraction||
Natural language processing;
|Natural language understanding||
Memory based architectures;
Question answering & Reading comprehension;
Natural language inference
|In silico modeling||
Network, pathways analyses
Backend: Apache Solr;
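Of the approaches listed in the table above, TF-IDF is simple enough to sketch in a few lines. The version below is a plain-Python illustration, assuming whitespace tokenization, term frequency normalized by document length, and an unsmoothed logarithmic inverse document frequency; library implementations typically differ in smoothing and normalization. The `tf_idf` function and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    Assumptions: lowercase whitespace tokenization, tf = count / doc length,
    idf = log(N / document frequency) with no smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


docs = [
    "dna repair in yeast",
    "dna damage in yeast",
    "metabolic pathways in human disease",
]
scores = tf_idf(docs)

# The highest-scoring term is the one most specific to the first document.
top_term = max(scores[0], key=scores[0].get)
print(top_term)
```

Terms that occur in every document (here, "in") score zero, which is what makes the measure useful for picking out document-specific keywords.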
As suggested in my Vision, there are numerous applications of this work – including but not limited to:
- topic modeling
- summarization | attention-based information retrieval
- active / directed learning | question answering
- automatic text understanding
- cognitive computing
- statistical inference (Bayesian …)
- personal agents / assistants
- advanced clustering methods: vector space models (VSM); knowledge graph traversals; …
- advanced user interfaces | visualizations
- dynamic network models | in silico modeling; e.g.:
- temporal fluctuations in metabolite concentrations (health; disease)
- effect on pathways due to defects in catalysis or signaling
- effects on pathways due to genetic / epigenetic variations
- identification of therapeutic targets
- personalized, precision medicine
- preventative medicine
Integral to many of these aims is the application of NLP, ML and graphical models; e.g.:
- CNN | LSTM | Bi-LSTM
- attentional mechanisms; memory mechanisms
- word embedding, VSM; language models
- generative adversarial models applied to metabolic modeling
- text, natural language understanding
- analyses of KG to predict previously unknown treatment and causative relations between biomedical entities
- in silico network modeling + ML (one informs the other)
- note (e.g.) that the Allen Institute | Google DeepMind | others are hugely invested in the application of ML to better understanding the brain, cognition, dynamic memory network modeling …
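To make the idea of predicting previously unknown relations in a knowledge graph concrete, here is a minimal sketch using the common-neighbors heuristic: two unlinked entities that share many neighbors are scored as candidate links. This is a deliberately simple stand-in for the embedding-based methods mentioned above, not an implementation of them, and the entity names (`drug_A`, `disease_X`, and so on) are placeholders rather than real biomedical data.

```python
from itertools import combinations


def common_neighbor_scores(edges):
    """Score each unlinked node pair by its number of shared neighbors."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    scores = {}
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # already linked, nothing to predict
        shared = neighbors[a] & neighbors[b]
        if shared:
            scores[(a, b)] = len(shared)
    return scores


# Placeholder graph: undirected "associated with" edges.
edges = [
    ("drug_A", "protein_1"),
    ("drug_A", "protein_2"),
    ("disease_X", "protein_1"),
    ("disease_X", "protein_2"),
    ("disease_Y", "protein_2"),
]

scores = common_neighbor_scores(edges)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, score)
```

In this toy graph `drug_A` and `disease_X` share two protein neighbors, so that pair ranks among the strongest candidates for a missing link; any such candidate is a hypothesis to check, not a finding.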