Hands-On Graph Analytics with Neo4j

Creating nodes and relationships

Unlike other query languages for graph databases such as Gremlin (https://tinkerpop.apache.org/gremlin.html) or AQL (ArangoDB), Cypher was built to have a syntax similar to SQL, in order to ease the transition for developers and data scientists used to the structured query language.

Like many tools in the Neo4j universe, Cypher takes its name from the 1999 movie The Matrix, in which Neo is the main character. Cypher and Apoc are also characters from this movie.

Managing databases with Neo4j Desktop

It is assumed you already have experience with Neo4j Desktop. This is the easiest tool to manage your Neo4j graphs, the installed plugins, and applications. I recommend creating a new project for this book, in which we are going to create several databases. In the following screenshot, I have created a project named Hands-On-Graph-Analytics-with-Neo4j, containing two databases: Test graph and USA:

Throughout this book, we will use Neo4j Browser, which is an application installed by default in Neo4j Desktop. Within this application, you can write and execute Cypher queries, but also visualize the results in different formats: a visual graph, JSON, or tabular data.

Creating a node

The simplest instruction to create a node is the following:

CREATE ()

It creates a node, without a label or a relationship.

We can recognize the () pattern that is used to identify nodes within Cypher. Every time you want to refer to a node, you will have to use parentheses.

We can check the content of the database after this statement with a simple MATCH query that will return all nodes of the graph:

MATCH (n)
RETURN n

Here, we are selecting nodes (because of the use of () ), and we are also giving a name, an alias, to these nodes: n. Thanks to this alias, we can refer to those nodes in later parts of the query, here only in the RETURN statement.

The result of this query is shown here:

OK, great. But a single node with no label or properties is not sufficient for most use cases. If we want to assign a label to a node when creating it, here is the syntax to use:

CREATE (:Label)

That's already better! Now, let's create properties when creating the node:

CREATE (:Label {property1: "value", property2: 13})
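A node can also carry several labels at once. The following is a minimal sketch, in which Person, Employee, name, and age are hypothetical names chosen for illustration only. Binding the node to the alias n lets us return its labels and properties straight away to check what was created:

```cypher
// Person and Employee are hypothetical labels; name and age are hypothetical properties
CREATE (n:Person:Employee {name: "Alice", age: 42})
// labels(n) returns the list of labels attached to the node
RETURN labels(n), n.name, n.age
```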

We'll see in later sections how to modify an existing node: adding or removing labels, and adding, updating, or deleting properties.

Selecting nodes

We've already seen the simplest query, which selects all nodes in the database:

MATCH (n)
RETURN n

Be careful: if your database is large, this query is likely to make your browser or application crash. It is better to do what you would do in SQL and add a LIMIT clause:

MATCH (n)
RETURN n
LIMIT 10
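If you are not sure how large the database is, one option is to count the nodes before returning any of them. This sketch uses the count() aggregation function; the alias nodeCount is an arbitrary name picked for this example:

```cypher
// Returns a single row with the total number of nodes, whatever their labels
MATCH (n)
RETURN count(n) AS nodeCount
```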

Let's try to be more specific in the data we want to select (filter) and the properties we need in the RETURN statement.

Filtering

Usually, we don't want to select all nodes of the database, but only those matching some criteria. For instance, we might want to retrieve only the nodes with a given label. In that case, we'd use this:

MATCH (n:Label)
RETURN n

Or, if you want to select only nodes with a given property, use this:

MATCH (n {id: 1})
RETURN n

The WHERE clause is also useful for filtering nodes. It allows more complex conditions than the {} notation, which can only test for equality. We can, for instance, use inequality comparisons (greater than >, less than <, greater than or equal to >=, or less than or equal to <=), and also combine conditions with Boolean operators such as AND and OR:

MATCH (n:Label)
WHERE n.property > 1 AND n.otherProperty <= 0.8
RETURN n
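To make the link between the two notations explicit, here is a short sketch that reuses the placeholder names from the queries above. The first two queries should select the same nodes, while the third shows how parentheses can group conditions when AND and OR are mixed. The three queries are meant to be run one at a time:

```cypher
// Equality filter written with the {} notation
MATCH (n:Label {id: 1})
RETURN n

// The same filter written with WHERE
MATCH (n:Label)
WHERE n.id = 1
RETURN n

// Grouping conditions explicitly when combining AND and OR
MATCH (n:Label)
WHERE n.id = 1 OR (n.property > 1 AND n.otherProperty <= 0.8)
RETURN n
```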

It might be surprising that, when selecting nodes, the browser also displays the relationships between them even though we have not asked for them. This comes from a setting in Neo4j Browser, whose default behavior is to enable the visualization of node connections. This can be disabled by unchecking the Connect result nodes setting.

Returning properties

So far, we've returned the whole node, with all the properties associated with it. If your application only needs some properties of the matched nodes, you can reduce the size of the result set by listing the properties to return, as in the following query:

MATCH (n)
RETURN n.property1, n.property2

With this syntax, we don't have access to the graph output in Neo4j Browser anymore, as it cannot access the node object, but we have a much simpler table output.
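As in SQL, the returned columns can be renamed with AS, which tends to make the table output easier to read. In this sketch, firstValue and secondValue are arbitrary aliases chosen for illustration:

```cypher
MATCH (n:Label)
// Rename the columns of the table output
RETURN n.property1 AS firstValue, n.property2 AS secondValue
LIMIT 10
```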

Creating a relationship

In order to create a relationship, we have to tell Neo4j about its start and end nodes, meaning both nodes must either be created in the same query or already exist in the database. There are two possible solutions:

  • Create nodes and the relationship(s) between them in one pass:
CREATE (n:Label {id: 1})
CREATE (m:Label {id: 2})
CREATE (n)-[:RELATED_TO]->(m)
  • Create the nodes (if they don't already exist):
CREATE (:Label {id: 3})
CREATE (:Label {id: 4})

And then create the relationship. In that case, since the relationship is created in a separate query, with its own variable scope, we first need to MATCH the nodes of interest:

MATCH (a {id: 3})
MATCH (b {id: 4})
CREATE (a)-[:RELATED_TO]->(b)
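Relationships can hold properties too, using the same {} notation as nodes. The following sketch assumes the two nodes with id 3 and 4 created above; since and weight are hypothetical property names. Note that CREATE always adds a new relationship, so running this after the previous query leaves two RELATED_TO relationships between the same pair of nodes:

```cypher
// Restricting the MATCH to :Label avoids picking up unrelated nodes that share the same id
MATCH (a:Label {id: 3})
MATCH (b:Label {id: 4})
// since and weight are hypothetical relationship properties
CREATE (a)-[r:RELATED_TO {since: 2020, weight: 0.5}]->(b)
RETURN type(r), r.since, r.weight
```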

While nodes are identified with parentheses, (), relationships are characterized by square brackets, [].

If we check the content of our graph after the first query, here is the result:

Reminder: while specifying a node label when creating a node is not mandatory, relationships must have a type. The following query is invalid: CREATE (n)-[]->(m). It leads to the following Neo.ClientError.Statement.SyntaxError:

Exactly one relationship type must be specified for CREATE. Did you forget to prefix your relationship type with a : (line 3, column 11 (offset: 60))?

Selecting relationships

It would be tempting to write a query such as the following one, similar to the one we wrote for nodes but with square brackets, [], instead of parentheses, ():

MATCH [r]
RETURN r

But this query results in an error. Relationships cannot be retrieved in the same way as nodes. If you want to see the relationship properties in a simple way, you can use either of the following syntaxes:

// no filtering
MATCH ()-[r]-()
RETURN r

// filtering on relationship type
MATCH ()-[r:REL_TYPE]-()
RETURN r

// filtering on relationship property and returning a subset of its properties
MATCH ()-[r]-()
WHERE r.property > 10
RETURN r.property
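One thing to keep in mind with the undirected pattern ()-[r]-() used above: it matches each relationship once per direction, so every relationship between two distinct nodes shows up twice in the result. A sketch of a directed variant, which returns each relationship once along with its type and end nodes, could look like this (startId, relType, and endId are arbitrary aliases):

```cypher
// The arrow fixes the direction, so each relationship is matched only once
MATCH (a)-[r:RELATED_TO]->(b)
// type(r) returns the relationship type as a string
RETURN a.id AS startId, type(r) AS relType, b.id AS endId
```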

We will see how this works in detail in the Pattern matching and data retrieval section later on.

The MERGE keyword

The Cypher documentation describes the behavior of the MERGE command very well:

MERGE either matches existing nodes and binds them, or it creates new data and binds that. It’s like a combination of MATCH and CREATE that additionally allows you to specify what happens if the data was matched or created.

Let's see an example:

MERGE (n:Label {id: 1})
ON CREATE SET n.timestamp_created = timestamp()
ON MATCH SET n.timestamp_last_update = timestamp()

Here, we are trying to access a node with the label Label and a single property, id, with a value of 1. If such a node already exists in the graph, the subsequent operations are performed on that node, and the statement is equivalent to a MATCH. However, if no node with the label Label and id=1 exists, it is created, hence the parallel with the CREATE statement.

The two other optional statements are also important:

  • ON CREATE SET will be executed if and only if the node was not found in the database and a creation process had to be performed.
  • ON MATCH SET will only be executed if the node already exists in the graph.

In this example, I use those two statements to remember when the node was created and when it was last seen in such a query.
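MERGE works on relationships as well. The sketch below assumes the two nodes with id 3 and 4 created earlier in this section. If a RELATED_TO relationship from a to b already exists, it is matched and returned. Otherwise, one is created. Running the query several times therefore does not add duplicates, unlike CREATE:

```cypher
MATCH (a:Label {id: 3})
MATCH (b:Label {id: 4})
// Match the relationship if it exists, create it otherwise
MERGE (a)-[r:RELATED_TO]->(b)
ON CREATE SET r.timestamp_created = timestamp()
RETURN r
```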

You are now able to create nodes and relationships, assigning label(s) and properties to them. The next section will be dedicated to other kinds of CRUD operations that can be performed on these objects: update and delete.