To calculate aggregated data, Cypher offers aggregation, much like SQL’s GROUP BY
.
Aggregate functions take multiple input values and calculate an aggregated value from them. Examples are AVG
that
calculate the average of multiple numeric values, or MIN
that finds the smallest numeric value in a set of values.
Aggregation can be done over all the matching sub graphs, or it can be further divided by introducing key values. These are non-aggregate expressions, that are used to group the values going into the aggregate functions.
So, if the return statement looks something like this:
RETURN n, count(*)
We have two return expressions — n
, and count(*)
. The first, n
, is no aggregate function, and so it will be the
grouping key. The latter, count(*)
is an aggregate expression. So the matching subgraphs will be divided into
different buckets, depending on the grouping key. The aggregate function will then run on these buckets, calculating
the aggregate values.
If you want to use aggregations to sort your result set, the aggregation must be included in the RETURN
to be used
in your ORDER BY
.
The last piece of the puzzle is the DISTINCT
keyword. It is used to make all values unique before running them through
an aggregate function.
An example might be helpful:
Query.
START me=node(1) MATCH me-->friend-->friend_of_friend RETURN count(distinct friend_of_friend), count(friend_of_friend)
In this example we are trying to find all our friends of friends, and count them. The first aggregate function, count(distinct friend_of_friend)
, will only see a friend_of_friend
once — DISTINCT
removes the duplicates. The latter aggregate function, count(friend_of_friend)
, might very well see the same friend_of_friend
multiple times. Since there is no real data in this case, an empty result is returned. See the sections below for real data.
Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} START me=node(1) MATCH me-->friend-->friend_of_friend RETURN count(distinct friend_of_friend), count(friend_of_friend)
The following examples are assuming the example graph structure below.
COUNT
is used to count the number of rows. COUNT
can be used in two forms — COUNT(*)
which just counts the number of matching rows, and COUNT(<identifier>)
, which counts
the number of non-null
values in <identifier>
.
To count the number of nodes, for example the number of nodes connected to one node, you can use count(*)
.
Query.
START n=node(2) MATCH (n)-->(x) RETURN n, count(*)
This returns the start node and the count of related nodes.
Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2) match (n)-->(x) return n, count(*)
To count the groups of relationship types, return the types and count them with count(*)
.
Query.
START n=node(2) MATCH (n)-[r]->() RETURN type(r), count(*)
The relationship types and their group count is returned by the query.
Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2) match (n)-[r]->() return type(r), count(*)
Instead of counting the number of results with count(*)
, it might be more expressive to include the name of the identifier you care about.
Query.
START n=node(2) MATCH (n)-->(x) RETURN count(x)
The example query returns the number of connected nodes from the start node.
Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2) match (n)-->(x) return count(x)
You can count the non-null
values by using count(<identifier>)
.
Query.
START n=node(2,3,4,1) RETURN count(n.property?)
The count of related nodes with the property
property set is returned by the query.
Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2,3,4,1) return count(n.property?)
The SUM
aggregation function simply sums all the numeric values it encounters. Nulls are silently dropped. This is an example of how you can use SUM
.
Query.
START n=node(2,3,4) RETURN sum(n.property)
This returns the sum of all the values in the property property
.
Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2,3,4) return sum(n.property)
AVG
calculates the average of a numeric column.
Query.
START n=node(2,3,4) RETURN avg(n.property)
The average of all the values in the property property
is returned by the example query.
Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2,3,4) return avg(n.property)
PERCENTILE_DISC
calculates the percentile of a given value over a group, with a percentile from 0.0 to 1.0. It uses a rounding method, returning the nearest value to the percentile. For interpolated values, see PERCENTILE_CONT.
Query.
START n=node(2,3,4) RETURN percentile_disc(n.property, 0.5)
The 50th percentile of the values in the property property
is returned by the example query. In this case, 0.5 is the median, or 50th percentile.
Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2,3,4) return percentile_disc(n.property, 0.5)
PERCENTILE_CONT
calculates the percentile of a given value over a group, with a percentile from 0.0 to 1.0. It uses a linear interpolation method, calculating a weighted average between two values, if the desired percentile lies between them. For nearest values using a rounding method, see PERCENTILE_DISC.
Query.
START n=node(2,3,4) RETURN percentile_cont(n.property, 0.4)
The 40th percentile of the values in the property property
is returned by the example query, calculated with a weighted average.
Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2,3,4) return percentile_cont(n.property, 0.4)
MAX
find the largets value in a numeric column.
Query.
START n=node(2,3,4) RETURN max(n.property)
The largest of all the values in the property property
is returned.
Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2,3,4) return max(n.property)
MIN
takes a numeric property as input, and returns the smallest value in that column.
Query.
START n=node(2,3,4) RETURN min(n.property)
This returns the smallest of all the values in the property property
.
Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2,3,4) return min(n.property)
COLLECT
collects all the values into a list. It will ignore null values,
Query.
START n=node(2,3,4,1) RETURN collect(n.property?)
Returns a single row, with all the values collected.
Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start n=node(2,3,4,1) return collect(n.property?)
All aggregation functions also take the DISTINCT
modifier, which removes duplicates from the values.
So, to count the number of unique eye colors from nodes related to a
, this query can be used:
Query.
START a=node(2) MATCH a-->b RETURN count(distinct b.eyes)
Returns the number of eye colors.
Try this query live. (1) {"eyes":"brown","name":"D"} (2) {"name":"A","property":13} (3) {"eyes":"blue","name":"B","property":33} (4) {"eyes":"blue","name":"C","property":44} (2)-[:KNOWS]->(3) {} (2)-[:KNOWS]->(4) {} (2)-[:KNOWS]->(1) {} start a=node(2) match a-->b return count(distinct b.eyes)
Copyright © 2013 Neo Technology