What is an ontology?

Source Persagen.com
Author Dr. Victoria A. Stuart, Ph.D.
Created 2021-04-16
Summary Glossary of key terms for Persagen.com

Background

An entity is something that exists as itself, as a subject or as an object, actually or potentially, concretely or abstractly, physically or not. It need not be of material existence.

A named entity is a real-world object - such as a person, location, organization, product, etc. - that can be denoted with a proper name. It can be abstract or have a physical existence. Examples of named entities include Barack Obama, New York City, Volkswagen Golf, or anything else that can be named. Named entities can simply be viewed as entity instances (e.g., New York City is an instance of a city).

According to Wikipedia, "ontology" is the branch of philosophy that studies concepts such as existence, being, becoming, and reality. It includes the questions of how entities are grouped into basic categories and which of these entities exist on the most fundamental level. Ontology is sometimes referred to as the science of being and belongs to the major branch of philosophy known as metaphysics.

In computer science and information science, an ontology encompasses a representation, formal naming, and definition of the categories, properties, and relations between the concepts, data, and entities that substantiate one, many, or all domains of discourse. More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of concepts and categories that represent the subject.

Eureka! - the Persagen ontology

Eureka, from the Ancient Greek εὕρηκα - "I have found (it)" - is an interjection used to celebrate a discovery or invention.

The Persagen ontology, Eureka!, classifies entities in a grounded hierarchical data structure. "Grounded" means that all entries stem from a common root (ROOT) and extend through the nested classification to the LEAF nodes, each of which is an entity (idea or concept; thing; named entity). This data structure provides several key attributes.

  • Anything in the universe may be easily and definitively categorized.

  • Being grounded, the relationship among entities is readily apparent.

  • Named entities may be uniquely identified, and thus disambiguated for both humans and machines (machine learning, especially natural language processing).

  • A greater understanding of a domain may be attained by examining similar (locally categorized) entities in the ontology. Ontologies facilitate the discovery and visualization of relationships not previously recognized or understood.

  • While some entities may appear under two or more ontological classifications, cross-referencing among ontology entries again facilitates a broader understanding of the subject matter.

  • Here is an example of a grounded ontological structure (illustration only; see also the D3.js visualization demo, below).
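The attributes listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Eureka! implementation: the category names and their nesting are hypothetical (only the named entities come from the text above), and the helper functions `find_paths` and `shared_ancestry` are invented for this example.

```python
# Hypothetical miniature ontology. The categories and nesting are illustrative
# only and are NOT taken from Eureka!. Every entry is grounded at "ROOT".
ONTOLOGY = {
    "ROOT": {
        "Geography": {
            "Cities": {
                "New York City": {},
            },
        },
        "People": {
            "Politicians": {
                "Barack Obama": {},
            },
            "Authors": {
                # The same entity filed under a second classification.
                "Barack Obama": {},
            },
        },
        "Products": {
            "Automobiles": {
                "Volkswagen Golf": {},
            },
        },
    }
}


def find_paths(tree, target, prefix=()):
    """Return every path from the root to a node named `target`."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        if name == target:
            paths.append(path)
        # Keep descending: the same name may occur elsewhere in the hierarchy.
        paths.extend(find_paths(children, target, path))
    return paths


def shared_ancestry(path_a, path_b):
    """Return the leading part of two paths that both have in common."""
    shared = []
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared.append(a)
    return tuple(shared)


if __name__ == "__main__":
    for path in find_paths(ONTOLOGY, "Barack Obama"):
        print(" > ".join(path))
```

In this sketch, the full path from ROOT serves as the unique identifier of an entity, a lookup that returns more than one path signals a cross-referenced entry, and the shared leading portion of two paths makes their relationship explicit.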

    Eureka! - Data structure

    Eureka! is a living document - continually updated and refined.

    I have devoted considerable time and resources to the development, curation and maintenance of Eureka!. At the outset, I maintained Eureka! as a flat text file in Vim, which allows facile editing and sorting.

    For example, here is a representative listing (entries here edited for brevity).

    Eureka! currently (2022-07) consists of approximately 17,330 lines (entries). While it is easy to manage Eureka! in Vim (a text editor) as a flat file, a flat file is not the ideal data structure for these data. The ideal web- and JavaScript-friendly data structure is JSON in some form (possibly JSONB, in PostgreSQL). JSON allows the facile embedding of metadata, which is also used extensively at Persagen.com for data annotation, information retrieval and processing. JSON also facilitates the incorporation of relationships (e.g. parent-child nodes) and the representation of hyperdimensional data (analogous to mathematical tensors, the basis of Google's TensorFlow).
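To make the flat-file versus JSON trade-off concrete, here is a minimal sketch of one possible conversion. It rests on loudly stated assumptions: the flat-file layout (one entry per line, levels separated by ">") is invented for this example because the actual Eureka! listing is not shown here, and the node fields `name`, `metadata` and `children`, the `note` metadata key, and the function names are likewise hypothetical.

```python
import json

# ASSUMPTION: one entry per line, hierarchy levels separated by ">".
# This layout is hypothetical; it is not the actual Eureka! file format.
FLAT_LINES = [
    "ROOT > Geography > Cities > New York City",
    "ROOT > People > Politicians > Barack Obama",
    "ROOT > Products > Automobiles > Volkswagen Golf",
]


def new_node(name):
    """Create a node with room for metadata and child nodes (hypothetical schema)."""
    return {"name": name, "metadata": {}, "children": []}


def build_tree(lines, sep=">"):
    """Fold flat, delimited entries into one nested, JSON-serializable tree."""
    root = new_node("ROOT")
    for line in lines:
        parts = [p.strip() for p in line.split(sep) if p.strip()]
        if not parts:
            continue  # skip blank lines
        if parts[0] != root["name"]:
            raise ValueError(f"entry is not grounded at ROOT: {line!r}")
        node = root
        for name in parts[1:]:
            # Reuse an existing child with this name, otherwise create it.
            child = next((c for c in node["children"] if c["name"] == name), None)
            if child is None:
                child = new_node(name)
                node["children"].append(child)
            node = child
    return root


tree = build_tree(FLAT_LINES)
# Metadata can be attached to any node; the "note" key is illustrative only.
tree["metadata"]["note"] = "common root of all entries"
as_json = json.dumps(tree, indent=2)

if __name__ == "__main__":
    print(as_json)
```

The parent-child relationship is carried by the nesting itself, and each node has a place for metadata, which a flat line of text does not offer without further conventions.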

    The downside is that JSON is considerably more difficult to edit and interact with manually. Resolving this technical challenge is a key focus of Persagen's data engineering. In the meantime, indexing Eureka! in Apache Solr provides a facile user interface for querying those data.

    Eureka! - Data visualization

    An earlier draft version of Persagen explored a D3.js visualization of Eureka!. Given the challenges above and the need to press forward on other areas of development (Persagen is a solo effort), JavaScript (JSON)-based visualizations of Eureka! await further study and exploration.

    [demo] Ontology D3.js visualization with search, pan, zoom

    Alternatives to the D3.js visualization above include displaying those data as a relation graph (nodes plus edges). Platforms under consideration for that approach include NetworkX, Cytoscape (possibly Cytoscape.js), tensors / TensorFlow TensorBoard, and custom solutions.
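As a rough sketch of what "nodes plus edges" could look like, the following flattens a nested hierarchy into an edge list using only the Python standard library. The sample tree, its category names, and the function `to_edges` are hypothetical, and the sketch assumes that node names are unique; an entity filed under several classifications would need path-based identifiers instead.

```python
# Hypothetical sample hierarchy; the categories are illustrative, not from Eureka!.
TREE = {
    "ROOT": {
        "Geography": {"Cities": {"New York City": {}}},
        "Products": {"Automobiles": {"Volkswagen Golf": {}}},
    }
}


def to_edges(tree, parent=None):
    """Flatten a nested dict into (parent, child) pairs, depth first."""
    edges = []
    for name, children in tree.items():
        if parent is not None:
            edges.append((parent, name))
        edges.extend(to_edges(children, name))
    return edges


edges = to_edges(TREE)
# ASSUMPTION: names are unique, so a name can double as a node identifier.
nodes = sorted({name for edge in edges for name in edge})

# Adjacency view: each node mapped to its direct children.
adjacency = {name: [] for name in nodes}
for source, target in edges:
    adjacency[source].append(target)

if __name__ == "__main__":
    for source, target in edges:
        print(f"{source} -> {target}")
```

An edge list of (source, target) pairs is a common interchange shape for graph tools; for instance, NetworkX's `DiGraph.add_edges_from` accepts an iterable of such pairs, so the output of this sketch could serve as a starting point for that platform.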