- B.Sc., Honours Biochemistry (Polynucleotide Chemistry), 1983, Dalhousie University
- M.Sc., Occupational Hygiene (Genotoxicity), 1995, University of British Columbia
- Ph.D., Biology (Molecular Genetics), 2000, University of Victoria
- P.D.F., Molecular Genetics, 2001-2008, National Institute of Environmental Health Sciences
I have a lifelong interest in molecular genetics, with a focus on functional genomics: the phenotypic expression of the information encoded in our genomes.
Central to those interests is how information is encoded, retrieved and utilized.
My interests span biology, genetics, genomics, pathways, networks and bioinformatics.
Methods: 1980 - 2008
Basic, benchside research
- polynucleotide chemistry
- microbial genetics | transgenic rodents
- early bioinformatics (expression profiles; interactomes)
Methods: 2008 - present
Computational analyses: when I returned to Vancouver from NIEHS in 2008, I shifted my focus to bioinformatics to better understand and leverage bio-encoded information.
In parallel with my consulting work (for the U.S. DoD), I began acquiring the skills and methods needed to address my overarching Vision: leveraging biomedical data for human health and knowledge discovery.
The following material briefly summarizes my approach and rationale.
- linux scripting | R | Python
- ML: machine learning
- NLP: natural language processing
- information storage
- information retrieval | extraction
- data analyses | clustering | visualization
One of my longtime Aims is building a high-quality knowledge base.
As an example, consider issues surrounding the retrieval of PubMed articles and other biomedical data (metabolome; cellular signaling; …).
- information retrieval, preprocessing: bash scripts
- RDBMS: PostgreSQL, PSQL
- information extraction:
- Python: TF-IDF; RAKE; TextRank
- NLP: NER (named entity recognition); disambiguation
- NLP: information / relation extraction (dependency / syntactic parsing; NP (noun phrase) chunking)
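As an illustration of the first extraction method listed above, here is a minimal, standard-library-only sketch of TF-IDF keyword ranking over a few abstracts. It is not my production pipeline: the tokenizer, the tiny stopword list, the function names and the example sentences are all hypothetical, chosen only to show how term frequency is weighted by inverse document frequency.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "and", "by", "in", "is", "of", "the"}

def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower())
            if t and t not in STOPWORDS]

def tfidf_keywords(docs, top_n=3):
    """Return the top_n TF-IDF-weighted terms for each document in docs."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    results = []
    for toks in tokenized:
        tf = Counter(toks)
        # TF = relative frequency in this document; IDF = log(N / df).
        scores = {term: (count / len(toks)) * math.log(n_docs / df[term])
                  for term, count in tf.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        results.append([term for term, _ in ranked[:top_n]])
    return results

abstracts = [
    "Glucose is phosphorylated by hexokinase in glycolysis.",
    "Hexokinase deficiency affects glycolysis in red blood cells.",
    "Insulin signaling regulates glucose uptake.",
]
print(tfidf_keywords(abstracts))
```

Terms shared across documents (e.g. "glucose", "hexokinase") are down-weighted by the IDF factor, which is why TF-IDF is a reasonable baseline before graph-based rankers such as TextRank or RAKE.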
Coincident with my focus on NLP-based methods (late 2015) were breakthrough advances in ML, including ML-based NLP.
At that time, the state of the art in NLP relied on cumbersome, domain-specific, rules-based approaches.
Accordingly, I spent more than a year immersing myself in ML background, theory and hands-on programming: see my Personal Projects page for some examples.
Satisfied with that initial training, late in 2017 I shifted my focus back to my overall strategy.
Since that time, I have largely focused on the following activities:
- information retrieval / storage: PostgreSQL, PSQL
- graphical models: Neo4J, Cypher
I use Postgres as my primary data store. Step by step, I have loaded all human genomic and metabolome data (from NCBI and HMDB, respectively).
My next step is to retrieve a high-quality (HQ) subset of PubMed / PMC, index those data, and extract relationships for incorporation into my knowledge graph.
Knowledge graphs complement Postgres: they are well suited to visualizing complex datasets and to answering complex queries rapidly.
As a proof of concept, I have begun constructing some basic metabolic pathways in Neo4j, with the intention of also adding cellular signaling pathways (and other data) relevant to human disease.
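To make the pathway-as-graph idea concrete, the sketch below models an abbreviated glycolysis fragment as a directed graph (metabolite nodes, enzyme-labelled edges) and answers a "how does A reach B?" query by breadth-first search. It is a hypothetical in-memory stand-in for the Neo4j graph, not my actual schema or Cypher queries. The `PATHWAY` dictionary, `shortest_route` and its `blocked_enzymes` parameter are illustrative names. Blocking an enzyme's edges is only a crude proxy for a catalytic defect.

```python
from collections import deque

# Hypothetical, abbreviated glycolysis fragment (plus the first step of the
# pentose phosphate pathway): substrate -> list of (product, enzyme) edges.
PATHWAY = {
    "glucose": [("glucose-6-phosphate", "hexokinase")],
    "glucose-6-phosphate": [
        ("fructose-6-phosphate", "glucose-6-phosphate isomerase"),
        ("6-phosphogluconolactone", "glucose-6-phosphate dehydrogenase"),
    ],
    "fructose-6-phosphate": [("fructose-1,6-bisphosphate", "phosphofructokinase-1")],
    "fructose-1,6-bisphosphate": [
        ("glyceraldehyde-3-phosphate", "aldolase"),
        ("dihydroxyacetone phosphate", "aldolase"),
    ],
    "dihydroxyacetone phosphate": [("glyceraldehyde-3-phosphate", "triose-phosphate isomerase")],
    "glyceraldehyde-3-phosphate": [("1,3-bisphosphoglycerate", "glyceraldehyde-3-phosphate dehydrogenase")],
    "1,3-bisphosphoglycerate": [("3-phosphoglycerate", "phosphoglycerate kinase")],
    "3-phosphoglycerate": [("2-phosphoglycerate", "phosphoglycerate mutase")],
    "2-phosphoglycerate": [("phosphoenolpyruvate", "enolase")],
    "phosphoenolpyruvate": [("pyruvate", "pyruvate kinase")],
}

def shortest_route(graph, start, goal, blocked_enzymes=frozenset()):
    """Breadth-first search for the shortest substrate -> product route.

    Edges catalysed by an enzyme in blocked_enzymes are skipped.
    Returns a list of (substrate, enzyme, product) steps, or None if
    goal cannot be reached from start.
    """
    queue = deque([start])
    came_from = {start: None}  # node -> (previous node, enzyme)
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for product, enzyme in graph.get(node, []):
            if enzyme in blocked_enzymes or product in came_from:
                continue
            came_from[product] = (node, enzyme)
            queue.append(product)
    if goal not in came_from:
        return None
    # Walk back from goal to start to recover the route.
    steps = []
    node = goal
    while came_from[node] is not None:
        prev, enzyme = came_from[node]
        steps.append((prev, enzyme, node))
        node = prev
    return list(reversed(steps))

for substrate, enzyme, product in shortest_route(PATHWAY, "glucose", "pyruvate"):
    print(f"{substrate} --[{enzyme}]--> {product}")
```

In Neo4j the same question would be posed as a Cypher path query over `:Metabolite` nodes. The in-memory version simply keeps the example self-contained. Passing `blocked_enzymes={"phosphofructokinase-1"}` returns `None`, because in this fragment no alternative route to pyruvate exists.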
As suggested in my Vision, there are numerous applications of this work:
- topic modeling
- summarization | attention-based information retrieval
- active / directed learning | Question-Answering
- automatic text understanding
- cognitive computing
- statistical inference (Bayesian …)
- personal agents / assistants
- advanced clustering methods: VSM; knowledge graph traversals; …
- advanced user interfaces | visualizations
- dynamic network models | in silico modeling; e.g.:
- temporal fluctuations in metabolite concentrations (health; disease)
- effects on pathways due to defects in catalysis or signaling
- effects on pathways due to genetic / epigenetic variations
- identification of therapeutic targets
- personalized, precision medicine
- preventative medicine
Integral to many of those aims is the application of ML; e.g.:
- CNN | LSTM | bi-LSTM | VSM applied to NLP
- generative adversarial models applied to metabolic modeling
- bi-LSTM, attentional methods applied to summarization
- adversarial training | recurrent CNN | RNN | bi-LSTM … for relation extraction, classification
- bAbI tasks and other approaches to automatic text understanding
- analyses of knowledge graphs (KG) to predict previously unknown treatment and causal relations between biomedical entities
- in silico network modeling + ML (one informs the other)
- note, for example, that the Allen Institute, Google DeepMind and others are heavily invested in applying ML to better understand the brain, cognition, dynamic memory network modeling …