

Graph Databases [Neo4j]

In many domains, data is defined as much by the relationships between objects as by the objects themselves. Relational databases (RDBMS) store highly structured data, but they do not store relationships explicitly: connections are expressed through foreign keys and reconstructed at query time with JOINs. Graph databases, by contrast, store relationships and connections as first-class entities.
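To make the contrast concrete, here is a toy sketch (hypothetical data and names, not how either kind of engine is actually implemented) of the same "who are Alice's friends?" question answered two ways: by joining against a separate table of foreign-key pairs, and by following references that each node stores directly.

```python
# Hypothetical data: people and friendships, modeled two ways.

# Relational style: the relationship is implicit, reconstructed by
# joining a separate "friendships" table of foreign-key pairs.
people = {1: "Alice", 2: "Bob", 3: "Carol"}
friendships = [(1, 2), (1, 3)]  # (person_id, friend_id) rows

def friends_via_join(person_id):
    # Scan the join table for matching rows, then look up each friend's name.
    return [people[f] for p, f in friendships if p == person_id]

# Graph style: each node keeps direct references to its neighbors,
# so the connection is stored data and traversal is a single hop.
class Node:
    def __init__(self, name):
        self.name = name
        self.friends = []  # outgoing connections stored on the node itself

alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
alice.friends.extend([bob, carol])

def friends_via_traversal(node):
    return [friend.name for friend in node.friends]

print(friends_via_join(1))           # ['Bob', 'Carol']
print(friends_via_traversal(alice))  # ['Bob', 'Carol']
```

Both return the same answer; the difference is where the relationship lives. In the first version it has to be recomputed from table rows, in the second it is part of the stored structure.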

Graph databases excel at managing highly connected data and complex queries.


The graph data model is simpler than that of many other databases, and graph databases can be used in OLTP systems, where they provide features such as transactional integrity and operational availability.

Graph databases vs. RDBMS


The Property Graph Model

(Neo4j is a property graph database.)

If you’ve ever worked with an object model or an entity-relationship diagram (ERD) for a relational database, the labeled property graph model will seem familiar. The property graph contains connected entities (nodes), which can hold any number of attributes (key-value pairs). Nodes can be tagged with labels representing their different roles in your domain. In addition to contextualizing node and relationship properties, labels may also serve to attach metadata (index or constraint information) to certain nodes.
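As a rough illustration, a node in the property graph can be modeled as a set of labels plus a map of key-value properties. The `Node` class and `nodes_with_label` helper below are hypothetical names for this sketch, and the sample values are merely in the style of the movies demo; selecting by label mirrors how a label narrows a pattern to one role in the domain.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Labels describe the node's roles in the domain, e.g. {"Person"}.
    labels: set[str] = field(default_factory=set)
    # Properties are arbitrary key-value pairs.
    properties: dict[str, object] = field(default_factory=dict)

nodes = [
    Node({"Person"}, {"name": "Keanu Reeves", "born": 1964}),
    Node({"Movie"}, {"title": "The Matrix", "released": 1999}),
]

def nodes_with_label(nodes, label):
    """Return every node tagged with the given label."""
    return [n for n in nodes if label in n.labels]

print([n.properties["name"] for n in nodes_with_label(nodes, "Person")])
# ['Keanu Reeves']
```

Because labels are a set, one node can carry several roles at once without duplicating its properties.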

Relationships provide directed, named, semantically relevant connections between two node entities. A relationship always has a direction, a type, a start node, and an end node. Like nodes, relationships can have any number of properties. In most cases, relationships have quantitative properties, such as weights, costs, distances, ratings, time intervals, or strengths. As relationships are stored efficiently, two nodes can share any number or type of relationships without sacrificing performance. Note that although relationships are directed, they can always be navigated in either direction.
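A minimal sketch of those four required parts (type, start node, end node, and therefore a direction) plus optional properties follows. The `Relationship` class, the `neighbors` helper and the `ROAD`/`distance` sample data are all hypothetical; the point is that the stored direction can be respected ("out", "in") or ignored ("both") when traversing.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare nodes by identity, not by value
class Node:
    name: str

@dataclass(eq=False)
class Relationship:
    rel_type: str   # the relationship type, e.g. "ROAD"
    start: Node     # node the arrow leaves
    end: Node       # node the arrow points to
    properties: dict = field(default_factory=dict)

def neighbors(rels, node, direction="both"):
    """Return nodes reachable from `node` in one hop.

    direction: "out" follows start->end, "in" follows end->start,
    "both" ignores the stored direction.
    """
    found = []
    for r in rels:
        if direction in ("out", "both") and r.start is node:
            found.append(r.end)
        if direction in ("in", "both") and r.end is node:
            found.append(r.start)
    return found

a, b = Node("A"), Node("B")
rels = [Relationship("ROAD", a, b, {"distance": 12.5})]

print([n.name for n in neighbors(rels, b, "in")])   # ['A']
print([n.name for n in neighbors(rels, b, "out")])  # []
print([n.name for n in neighbors(rels, b)])         # ['A']
```

The relationship is stored once, with one direction, yet it can be reached from either endpoint.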

There is one core consistent rule in a graph database: “No broken links.” Since a relationship always has a start and end node, you can’t delete a node without also deleting its associated relationships. You can also always assume that an existing relationship will never point to a non-existing endpoint.
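The "no broken links" rule can be sketched as a guard on deletion. In Neo4j itself, a plain Cypher DELETE on a node that still has relationships fails, while DETACH DELETE removes the node together with its relationships; the `detach` flag in this toy `Graph` class (a hypothetical name, not Neo4j's API) imitates that distinction.

```python
class Graph:
    """Toy graph that enforces the "no broken links" rule."""

    def __init__(self):
        self.nodes = set()
        self.rels = []  # (start, rel_type, end) tuples

    def add_node(self, node):
        self.nodes.add(node)

    def add_rel(self, start, rel_type, end):
        # A relationship may only connect nodes that already exist.
        if start not in self.nodes or end not in self.nodes:
            raise ValueError("both endpoints must exist")
        self.rels.append((start, rel_type, end))

    def delete_node(self, node, detach=False):
        attached = [r for r in self.rels if node in (r[0], r[2])]
        if attached and not detach:
            # Refuse: removing the node would leave dangling relationships.
            raise ValueError(f"{node!r} still has {len(attached)} relationship(s)")
        # Remove every relationship touching the node, then the node itself.
        self.rels = [r for r in self.rels if node not in (r[0], r[2])]
        self.nodes.discard(node)

g = Graph()
for name in ("alice", "bob"):
    g.add_node(name)
g.add_rel("alice", "KNOWS", "bob")

try:
    g.delete_node("bob")
except ValueError as err:
    print(err)  # 'bob' still has 1 relationship(s)

g.delete_node("bob", detach=True)
print(g.nodes, g.rels)  # {'alice'} []
```

Either way, no relationship is ever left pointing at a node that no longer exists.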

Property graph model


Cypher Query Language

RDBMS such as PostgreSQL use SQL for queries. Neo4j uses its own query language, Cypher, which has since been adopted beyond Neo4j through the openCypher project.

SQL vs. Cypher query

Neo4j provides a very easy-to-read and informative Introduction to Cypher: a declarative, SQL-inspired language for describing patterns in graphs visually using an ASCII-art syntax. Cypher allows us to state what we want to select, insert, update or delete from our graph data without requiring us to describe exactly how to do it.

Cypher query pattern

As shown above, Cypher uses ASCII-art-like syntax to represent patterns. We surround nodes with parentheses, which look like circles, e.g. (node). If we later want to refer to the node, we’ll give it a variable like (p) for person or (t) for thing. In real-world queries, we’ll probably use longer, more expressive variable names like (person) or (thing). If the node is not relevant to your question, you can also use empty parentheses ().

Usually, the relevant labels of the node are provided to distinguish between entities and optimize execution, like (p:Person). We might use a pattern like (person:Person)-->(thing:Thing) so we can refer to them later, for example, to access properties like person.name and thing.quality.
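To show what such a pattern asks for, here is a hypothetical in-memory analogue of MATCH (person:Person)-->(thing:Thing) RETURN person.name, thing.quality. The data and the `match` helper are invented for this sketch; it only checks labels and arrow direction, whereas a real query planner does far more.

```python
# Hypothetical data: each node has a set of labels and a property map.
nodes = {
    "p1": {"labels": {"Person"}, "props": {"name": "Ada"}},
    "t1": {"labels": {"Thing"}, "props": {"quality": "shiny"}},
    "t2": {"labels": {"Thing"}, "props": {"quality": "dull"}},
}
# Directed relationships as (start, type, end); t2 points *at* p1.
rels = [("p1", "OWNS", "t1"), ("t2", "NEXT_TO", "p1")]

def match(start_label, end_label):
    """Yield (start, end) node pairs for (a:start_label)-->(b:end_label)."""
    for start, _rel_type, end in rels:
        if start_label in nodes[start]["labels"] and end_label in nodes[end]["labels"]:
            yield nodes[start], nodes[end]

# Roughly: MATCH (person:Person)-->(thing:Thing) RETURN person.name, thing.quality
for person, thing in match("Person", "Thing"):
    print(person["props"]["name"], thing["props"]["quality"])  # Ada shiny
```

The "dull" thing is not returned: its relationship points from the Thing to the Person, so it does not fit the arrow in the pattern.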

The more general structure is (here $value is a query parameter, passed to the database separately from the query text):

MATCH (node:Label) RETURN node.property

MATCH (node1:Label1)-->(node2:Label2)
WHERE node1.propertyA = $value
RETURN node2.propertyA, node2.propertyB
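The three clauses can be read as three steps: match the pattern, filter with WHERE, project with RETURN. The sketch below walks through those steps over hypothetical in-memory data, with the parameter supplied as a separate dict rather than spliced into the query text. It is an illustration of the semantics under those assumptions, not how Neo4j executes a query.

```python
# Hypothetical nodes: each has a set of labels and a property map.
nodes = {
    1: {"labels": {"Label1"}, "props": {"propertyA": "x"}},
    2: {"labels": {"Label2"}, "props": {"propertyA": "y", "propertyB": 42}},
    3: {"labels": {"Label1"}, "props": {"propertyA": "z"}},
    4: {"labels": {"Label2"}, "props": {"propertyA": "w"}},
}
rels = [(1, 2), (3, 4)]  # directed (start, end) pairs

def run(params):
    """In-memory analogue of:
    MATCH (node1:Label1)-->(node2:Label2)
    WHERE node1.propertyA = $value
    RETURN node2.propertyA, node2.propertyB
    """
    rows = []
    for start, end in rels:
        n1, n2 = nodes[start], nodes[end]
        if "Label1" not in n1["labels"] or "Label2" not in n2["labels"]:
            continue  # MATCH: pattern does not fit
        if n1["props"].get("propertyA") != params["value"]:
            continue  # WHERE: filter on the parameter
        # RETURN: a missing property comes back as None, much like Cypher's null.
        rows.append((n2["props"].get("propertyA"), n2["props"].get("propertyB")))
    return rows

print(run({"value": "x"}))  # [('y', 42)]
print(run({"value": "z"}))  # [('w', None)]
```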

To access information about a relationship, we can assign it a variable for later reference. The variable is placed in front of the colon, as in -[rel:KNOWS]->, or stands alone, as in -[rel]->.

General syntax:

MATCH (n1:Label1)-[rel:TYPE]->(n2:Label2)
WHERE rel.property > $value
RETURN rel.property, type(rel)
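Binding the relationship to a variable is what makes rel.property and type(rel) available in WHERE and RETURN. The hypothetical sketch below filters relationships on a `since` property and returns it alongside the type; passing a `rel_type` mimics -[rel:TYPE]->, while leaving it as None mimics the untyped -[rel]->. The data and function name are illustrative only.

```python
# Hypothetical relationships, each with a type and a property map.
rels = [
    {"start": "alice", "type": "KNOWS", "end": "bob", "props": {"since": 2010}},
    {"start": "alice", "type": "WORKS_WITH", "end": "carol", "props": {"since": 2018}},
    {"start": "bob", "type": "KNOWS", "end": "carol", "props": {}},
]

def match_rels(value, rel_type=None):
    """In-memory analogue of
    MATCH (n1)-[rel:TYPE]->(n2) WHERE rel.since > $value RETURN rel.since, type(rel)
    With rel_type=None the pattern behaves like -[rel]-> (any type).
    """
    rows = []
    for rel in rels:
        if rel_type is not None and rel["type"] != rel_type:
            continue
        since = rel["props"].get("since")
        # In Cypher, comparing a missing (null) property yields null,
        # so the row is filtered out; None is skipped here to match.
        if since is not None and since > value:
            rows.append((since, rel["type"]))
    return rows

print(match_rels(2015))                    # [(2018, 'WORKS_WITH')]
print(match_rels(2000, rel_type="KNOWS"))  # [(2010, 'KNOWS')]
```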

Neo4j Browser:

Neo4j Browser - movies demo