Review the metagraph, and see the types of nodes and relationships we're going to be working with.

[source,cypher]
----
CALL db.schema.visualization()
----

Notice the different ways that Persons can be related to each other. There is a general 'KNOWS' relationship, as well as more specific relationship types: FAMILY_REL (related to), KNOWS_LW (lives with), KNOWS_PHONE (has a related phone call), and KNOWS_SN (social network).

Let's have a look at the types of crimes in the graph, and the number of times each occurred:

[source,cypher]
----
MATCH (c:Crime)
RETURN c.type AS crime_type, count(c) AS total
ORDER BY count(c) DESC
----

You should see that 'Violence and sexual offences' was the highest category of crimes for the month, with weapons offences being the category with the lowest count.

Let's also look at the top locations in the graph where crimes have been recorded:

[source,cypher]
----
MATCH (l:Location)<-[:OCCURRED_AT]-(:Crime)
RETURN l.address AS address, l.postcode AS postcode, count(l) AS total
ORDER BY count(l) DESC
LIMIT 15
----

You should see several obvious public places and institutions with high numbers of associated crimes: Piccadilly (the area near the main rail station in Manchester), a Shopping Area (and a nearby Prison), etc. Towards the bottom of the list there are some residential-looking addresses with surprisingly high counts (e.g. 35 crimes at both 182 Waterson Avenue and 43 Walker's Croft).

== General Queries
=== Crimes near a particular address

The popular UK television drama 'Coronation Street' is set in a fictional Manchester-area neighbourhood. There's a Coronation Street address in the graph (1 Coronation Street, home of the Barlow family in the show). Using the longitude and latitude properties of our Location nodes, we can do a distance-based search to find crimes within 500 metres of this address.

[source,cypher]
----
MATCH (l:Location {address: '1 Coronation Street', postcode: 'M5 3RW'})
WITH point(l) AS corrie
MATCH (x:Location)-[:HAS_POSTCODE]->(p:PostCode),
// ...
WHERE distance < 500
RETURN x.address AS address, p.code AS postcode, count(c) AS crime_total, collect(distinct(c.type)) AS crime_type, distance
ORDER BY distance
LIMIT 10
----
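
The 500-metre filter is a great-circle distance test. The Python sketch below illustrates it outside Neo4j by applying the haversine formula to latitude/longitude pairs. It assumes a spherical Earth with a mean radius of 6,371 km. Neo4j's own distance function for WGS-84 points may use a different radius, so its figures can differ slightly. The `within_radius` helper and the coordinates are hypothetical.

[source,python]
----
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two (latitude, longitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(centre, locations, radius_m=500.0):
    """Return (address, distance) pairs closer than radius_m, nearest first."""
    lat0, lon0 = centre
    hits = []
    for address, (lat, lon) in locations.items():
        d = haversine_m(lat0, lon0, lat, lon)
        if d < radius_m:  # same role as WHERE distance < 500
            hits.append((address, d))
    return sorted(hits, key=lambda pair: pair[1])  # ORDER BY distance


# Hypothetical coordinates, for illustration only.
corrie = (53.4800, -2.2700)
locations = {
    "A": (53.4810, -2.2700),  # roughly 111 m north of corrie
    "B": (53.4800, -2.2800),  # roughly 660 m west of corrie
}
print(within_radius(corrie, locations))
----

Only location "A" passes the filter. At this latitude a difference of 0.01 degrees of longitude is already more than 500 metres.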

== General Queries
=== Crimes investigated by Inspector Morse

Another popular UK television drama is 'Inspector Morse'. There's also an Inspector Morse in our graph - let's see what Crimes he is investigating.

[source,cypher]
----
MATCH (o:Officer {rank: 'Chief Inspector', surname: 'Morse'})<-[i:INVESTIGATED_BY]-(c:Crime)
RETURN *
----

You should see quite a number of Crime nodes connected by the INVESTIGATED_BY relationship to the Inspector Morse node. Take a few minutes to click on some of them to expand the graph and see what other nodes are related to some of these Crimes.

== Crime Investigation
=== Crimes under investigation by Officer Larive

Let's say we are interested in the crimes that are under investigation by Police Constable Devy Larive (Badge Number 26-5234182).

[source,cypher]
----
MATCH (c:Crime {last_outcome: 'Under investigation'})-[i:INVESTIGATED_BY]->(o:Officer {badge_no: '26-5234182', surname: 'Larive'})
RETURN *
----

We can see that Police Constable Larive is investigating a number of crimes at the moment. In particular, PC Larive is investigating three Drugs Crimes. Double-clicking on these three Drugs Crimes shows us:

== Crime Investigation
=== Shortest path between persons related to crimes

Let's see if the two Persons - Jack Powell and Raymond Walker - associated with these three Drugs Crimes are somehow connected in the graph. We'll look for all of the shortest paths between them of 3 or fewer hops along all types of 'KNOWS' relationships. We can ignore the direction of the relationships in this query, as we're not interested in which direction they point.

[source,cypher]
----
MATCH (c:Crime {last_outcome: 'Under investigation', type: 'Drugs'})-[:INVESTIGATED_BY]->(:Officer {badge_no: '26-5234182'}),
(c)<-[:PARTY_TO]-(p:Person)
// DISTINCT: a person party to several of these crimes is listed only once
WITH COLLECT(DISTINCT p) AS persons
UNWIND persons AS p1
UNWIND persons AS p2
WITH * WHERE id(p1) < id(p2)
MATCH path = allshortestpaths((p1)-[:KNOWS|KNOWS_LW|KNOWS_SN|FAMILY_REL|KNOWS_PHONE*..3]-(p2))
RETURN path
----
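
To show what `allShortestPaths` with `*..3` does, here is a minimal Python sketch on a hypothetical undirected graph. It runs a breadth-first search (BFS) from one person and stops expanding at `max_hops`. It keeps every predecessor that lies one level closer to the start, then walks those predecessors back from the target to list all minimum-length paths. Taking `combinations` of the list of persons plays the role of `id(p1) < id(p2)`, so each unordered pair is tried once. All names are placeholders.

[source,python]
----
from collections import deque
from itertools import combinations


def undirected(edges):
    """Build a symmetric adjacency dict, since the query ignores direction."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def all_shortest_paths(graph, source, target, max_hops):
    """All minimum-length paths from source to target with at most max_hops edges."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit (*..max_hops)
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                preds[nxt].append(node)  # another shortest route into nxt
    if target not in dist:
        return []  # no path within max_hops

    def build(node):
        if node == source:
            return [[source]]
        return [path + [node] for p in preds[node] for path in build(p)]

    return build(target)


# Hypothetical graph: "a" reaches "d" via "b" or via "c", and "e" hangs off "d".
g = undirected([("a", "b"), ("b", "d"), ("a", "c"), ("c", "d"), ("d", "e")])
for p1, p2 in combinations(["a", "d", "e"], 2):
    print(p1, p2, sorted(all_shortest_paths(g, p1, p2, max_hops=3)))
----

One difference to keep in mind is that Cypher paths are made of relationships. Two relationship types between the same two people (say, KNOWS and FAMILY_REL) therefore give distinct paths in Neo4j, whereas this node-level sketch collapses them into one.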

It turns out they are part of what looks like a social group. Two of Raymond's family relations (his father Phillip and sister Kathleen) know Alan Ward, who is the brother of Jack Powell. Raymond's father Phillip also lives with Jack's father Brian. We know that Raymond is under investigation for production of cannabis and that Jack is under investigation on two separate charges of possession of cannabis with intent to supply. Given that they also seem to be part of the same social group, we can speculate that they may know each other and that Jack is getting his cannabis from Raymond.

== Crime Investigation
=== Other related people associated with drugs crimes

To build an even stronger case, let's look at the social networks of Jack Powell and Raymond Walker, and see if anyone else within 3 hops of them along 'KNOWS' relationships is also related to a Drugs Crime.

[source,cypher]
----
MATCH path = (:Officer {badge_no: '26-5234182'})<-[:INVESTIGATED_BY]-(:Crime {type: 'Drugs'})<-[:PARTY_TO]-(:Person)-[:KNOWS*..3]-(:Person)-[:PARTY_TO]->(:Crime {type: 'Drugs'})
RETURN path
----

This query reveals an interesting and somewhat dense social network, including family relations and people who live with one another. Reviewing the graph we can see:

Now we can explore a series of queries to simulate research on 'vulnerable' or 'at risk' individuals in the graph. This might be especially important in a social services or child protection use case. Here we define a 'vulnerable person' as someone who is not themselves associated with a crime, but who knows many people who are. Run the query below to generate a list of the Top 5 most vulnerable people in the graph.

=== Top 5 vulnerable people in the graph

[source,cypher]
----
MATCH (p:Person)-[:KNOWS]-(friend)-[:PARTY_TO]->(:Crime)
WHERE NOT (p:Person)-[:PARTY_TO]->(:Crime)
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, count(distinct friend) AS dangerousFriends
ORDER BY dangerousFriends DESC
LIMIT 5
----
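
The vulnerability ranking is a filtered, de-duplicated neighbour count. The Python sketch below restates it under simple assumptions. `knows` is a list of person pairs, with direction ignored as in `-[:KNOWS]-`. `offenders` is a hypothetical set of people with at least one `PARTY_TO` relationship to a Crime. The names are placeholders.

[source,python]
----
from collections import Counter


def top_vulnerable(people, knows, offenders, limit=5):
    """Rank people not linked to a crime by how many crime-linked friends they have."""
    friends = {}
    for a, b in knows:  # -[:KNOWS]- ignores direction
        friends.setdefault(a, set()).add(b)
        friends.setdefault(b, set()).add(a)
    counts = Counter()
    for p in people:
        if p in offenders:  # WHERE NOT (p)-[:PARTY_TO]->(:Crime)
            continue
        n = len(friends.get(p, set()) & offenders)  # count(distinct friend)
        if n:  # the MATCH only yields people with at least one such friend
            counts[p] = n
    return counts.most_common(limit)  # ORDER BY dangerousFriends DESC LIMIT 5


people = ["ann", "bob", "cat", "dan", "eve"]
knows = [("ann", "bob"), ("ann", "cat"), ("dan", "bob"), ("bob", "cat")]
offenders = {"bob", "cat"}
print(top_vulnerable(people, knows, offenders))
----

Using a set intersection gives the same effect as `count(distinct friend)`. A friend reached through several matching rows is still counted once.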

We will be referring to this list of vulnerable people throughout the next few steps, so you may want to keep the results handy (try using the tack icon to pin them to the top).

== Vulnerable Persons Investigation
=== Friends of Friends

Using Cypher, it's then easy to widen the search to a larger social circle. A small change to the query, replacing `[:KNOWS]` with the variable-length pattern `[:KNOWS*1..2]`, lets us count not only friends who are associated with crimes, but also 'friends of friends' who are associated with crimes.

[source,cypher]
----
MATCH (p:Person)-[:KNOWS*1..2]-(friend)-[:PARTY_TO]->(:Crime)
WHERE NOT (p:Person)-[:PARTY_TO]->(:Crime)
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, count(distinct friend) AS dangerousFriends
ORDER BY dangerousFriends DESC
LIMIT 5
----

Try modifying the query to look at 'friends of friends of friends' (3 'KNOWS' relationships out, i.e. `[:KNOWS*1..3]`) and see how that changes the results.
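
Moving from `[:KNOWS]` to `[:KNOWS*1..2]` or `[:KNOWS*1..3]` amounts to counting crime-linked people anywhere within k hops. The hedged Python sketch below expresses that idea with a breadth-first search capped at `max_hops`. The graph, names and helper functions are hypothetical. Cypher enumerates every matching path and relies on `count(distinct friend)` to remove duplicates, whereas the search visits each person only once. The two should therefore agree on the counts.

[source,python]
----
from collections import deque


def within_hops(graph, start, max_hops):
    """People reachable from start in 1..max_hops steps (start itself excluded)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # hop limit reached, do not expand further
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return set(seen) - {start}


def dangerous_within(graph, offenders, people, max_hops):
    """Count crime-linked people within max_hops for everyone not linked to a crime."""
    result = {}
    for p in people:
        if p in offenders:
            continue
        n = len(within_hops(graph, p, max_hops) & offenders)
        if n:
            result[p] = n
    return result


# Hypothetical chain ann - bob - cat - dan, where cat and dan are crime-linked.
graph = {"ann": {"bob"}, "bob": {"ann", "cat"}, "cat": {"bob", "dan"}, "dan": {"cat"}}
offenders = {"cat", "dan"}
people = list(graph)
for k in (1, 2, 3):
    print(k, dangerous_within(graph, offenders, people, k))
----

Raising `max_hops` can only add people to each reachable set. Counts never drop as the circle widens, and people further from the crime-linked group start to appear.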

== Vulnerable Persons Investigation
=== Exploring a Vulnerable Person's graph

Let's explore the graph for the top result from our original Vulnerable Persons results (which, hopefully, you've pinned in a previous step).

[source,cypher]
----
MATCH path = (:Location)<-[:CURRENT_ADDRESS]-(:Person {nhs_no: '804-54-6976', surname: 'Freeman'})-[:KNOWS]-(:Person)-[:PARTY_TO]->(:Crime)
RETURN path
----

We can see that Anne Freeman has 8 dangerous friends. Using her ID, this query shows us the graph of these friends, which we can navigate and explore.

Now that we've seen Anne Freeman's social circle, it would be good to know whether any of her dangerous friends is actually local to her (in her area, or neighbourhood).

[source,cypher]
----
MATCH (anne:Person {nhs_no: '804-54-6976', surname: 'Freeman'})-[k:KNOWS]-(friend)-[pt:PARTY_TO]->(c:Crime),
// ...
----

We can see that only her friend Craig, whom she knows through social networks, lives in the same Area (SK1) as Anne. Craig has been associated with two Public Order offences.

== Vulnerable Persons Investigation
=== Looking for connections between Vulnerable Persons

Going back to the list of vulnerable people, let's see if any of them are connected. This query takes the results of the vulnerable people query and looks for shortest paths of 'KNOWS' relationships that connect them.

[source,cypher]
----
MATCH (p:Person)-[:KNOWS]-(friend)-[:PARTY_TO]->(:Crime)
WHERE NOT (p:Person)-[:PARTY_TO]->(:Crime)
WITH p, count(distinct friend) AS dangerousFriends
// ...
UNWIND people AS p2
WITH * WHERE id(p1) <> id(p2)
MATCH path = shortestpath((p1)-[:KNOWS*]-(p2))
RETURN path
----

It turns out they are connected by paths of different lengths, and some of them are connected through more than one path.

We're finished now with the original list of vulnerable people and those results can be closed or unpinned.

== Vulnerable Persons Investigation
=== Looking for Dangerous Family Friends

We can now write another query looking for vulnerable or at-risk individuals, but this time based on their family relationships rather than their direct social relationships. We'll look for people who are not themselves party to a crime, whose relative is not party to a crime either, but whose relative has dangerous friends.

[source,cypher]
----
MATCH (p:Person)-[:FAMILY_REL]-(relative)-[:KNOWS]-(famFriend)-[:PARTY_TO]->(:Crime)
WHERE NOT (p:Person)-[:PARTY_TO]->(:Crime) AND
NOT (relative)-[:PARTY_TO]->(:Crime)
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, count(DISTINCT famFriend) AS DangerousFamilyFriends
ORDER BY DangerousFamilyFriends DESC
LIMIT 5
----
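
The family query chains three hops: person to relative via `FAMILY_REL`, relative to friend via `KNOWS`, and friend to crime via `PARTY_TO`. It excludes any match where the person or the relative is party to a crime. The hypothetical Python below restates those filters with adjacency dictionaries. `family` and `knows` map each person to a set of people and are assumed symmetric, and `offenders` is the set of people linked to a crime. All names are placeholders.

[source,python]
----
from collections import Counter


def dangerous_family_friends(family, knows, offenders, limit=5):
    """Count distinct crime-linked friends of each person's crime-free relatives."""
    counts = Counter()
    for person, relatives in family.items():
        if person in offenders:
            continue  # WHERE NOT (p)-[:PARTY_TO]->(:Crime)
        fam_friends = set()
        for relative in relatives:
            if relative in offenders:
                continue  # AND NOT (relative)-[:PARTY_TO]->(:Crime)
            fam_friends |= knows.get(relative, set()) & offenders
        if fam_friends:
            counts[person] = len(fam_friends)  # count(DISTINCT famFriend)
    return counts.most_common(limit)


family = {"kid": {"parent"}, "parent": {"kid"}, "teen": {"aunt"}, "aunt": {"teen"}}
knows = {
    "parent": {"x", "y", "kid"}, "kid": {"parent"},
    "x": {"parent", "aunt"}, "y": {"parent"}, "aunt": {"x"},
}
offenders = {"x", "y"}
print(dangerous_family_friends(family, knows, offenders))
----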

You should see 5 people who have family members with dangerous friends.

== Vulnerable Persons Investigation
=== Looking for Dangerous Family Friends

The previous query returned a good set of at-risk individuals. However, it's probably not specific enough. It would be more interesting to see this list with an additional requirement: that the vulnerable individuals live with the relative who has dangerous friends.

[source,cypher]
----
MATCH (p:Person)-[:FAMILY_REL]-(relative)-[:KNOWS]-(famFriend)-[:PARTY_TO]->(:Crime),
// ...
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, count(DISTINCT famFriend) AS DangerousFamilyFriends
ORDER BY DangerousFamilyFriends DESC
LIMIT 5
----
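
The full pattern of the refined query is not shown above. The following Python sketch therefore assumes, as the later Kimberly Alexander query does, that 'lives with' means both people have a `CURRENT_ADDRESS` relationship to the same Location. `address` is a hypothetical mapping from person to location identifier, and all names are placeholders.

[source,python]
----
from collections import Counter


def dangerous_family_friends_at_home(family, knows, offenders, address, limit=5):
    """Like the family count, but only relatives sharing the person's address count."""
    counts = Counter()
    for person, relatives in family.items():
        if person in offenders or person not in address:
            continue
        fam_friends = set()
        for relative in relatives:
            if relative in offenders:
                continue
            if address.get(relative) != address[person]:
                continue  # assumed: relative must live at the same Location
            fam_friends |= knows.get(relative, set()) & offenders
        if fam_friends:
            counts[person] = len(fam_friends)
    return counts.most_common(limit)


family = {"kid": {"parent", "aunt"}, "parent": {"kid"}, "aunt": {"kid"}}
knows = {"parent": {"x"}, "aunt": {"y", "z"}, "x": {"parent"}, "y": {"aunt"}, "z": {"aunt"}}
offenders = {"x", "y", "z"}
address = {"kid": "home1", "parent": "home1", "aunt": "home2"}
print(dangerous_family_friends_at_home(family, knows, offenders, address))
----

Without the address check, `kid` would score 3 (x, y and z). Requiring a shared Location drops the aunt's friends, which mirrors the way the refined query returned a shorter list.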

This version of the query returns only 2 people, but the one with the highest number of dangerous family friends (Kimberly Alexander) is the same person who topped the previous query's results.

== Vulnerable Persons Investigation
=== Exploring a Vulnerable Person's graph

We can view Kimberly's graph and see that Kimberly (age 12) lives with her mother Bonnie at 53 Ridge Grove. Bonnie has several friends who are related to a number of crimes of varying types. There's a high chance that Kimberly is being exposed to these people, potentially putting her at risk.

[source,cypher]
----
MATCH path = (relative:Person)-[:CURRENT_ADDRESS]->(:Location)<-[:CURRENT_ADDRESS]-(:Person {nhs_no: '548-59-5017', surname: 'Alexander'})-[:FAMILY_REL]-(relative)-[:KNOWS]-(:Person)-[:PARTY_TO]->(:Crime)
RETURN path
----

== Graph Algorithms
=== Triangle Count

Run the following query to identify Person nodes in our graph who are members of the highest number of triangles.

[source,cypher]
----
// ...
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, triangles
ORDER BY triangles DESC
LIMIT 10;
----
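
Whatever procedure computes it, a node's triangle count is the number of pairs of its neighbours that are also connected to each other. The plain-Python sketch below illustrates that definition on a hypothetical undirected graph. It is not a reimplementation of the library procedure. Adjacency is stored as sets, so duplicate and reverse-direction edges are counted once.

[source,python]
----
from itertools import combinations


def triangle_counts(edges):
    """Number of triangles each node belongs to, treating edges as undirected."""
    nbrs = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    # A triangle through `node` is a pair of its neighbours that know each other.
    return {
        node: sum(1 for u, v in combinations(ns, 2) if v in nbrs[u])
        for node, ns in nbrs.items()
    }


# Hypothetical graph: a square a-b-c-d with one diagonal a-c (two triangles).
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
print(triangle_counts(edges))
----

Nodes on the shared diagonal (a and c) sit in both triangles. This is why people at the heart of a tight cluster score highest.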

== Algorithms
=== Triangle Count

We can take a look at the graph for one of the sets of triangles that was returned: Deborah Ford, who belongs to ten triangles.

We can see that Patricia Carr knows both Deborah Ford and Jonathan Hunt, and both Deborah and Jonathan know Peter Bryant, Harry Lopez, and Phillip Perry. We might therefore infer that Patricia knows Peter, Harry, and Phillip as well.

[source,cypher]
----
MATCH path = (p1:Person {nhs_no: '838-45-9343', surname: 'Ford'})-[:KNOWS]-(p2)-[:KNOWS]-(p3)-[:KNOWS]-(p1)
RETURN path
----
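
The inference about Patricia is a form of common-neighbour reasoning: two people who are not directly linked but share several acquaintances are more likely to know each other. The guide does not compute this score. The Python below is a hypothetical illustration of the common-neighbours link-prediction heuristic, which ranks unlinked pairs by how many neighbours they share. The placeholder graph only copies the shape of the Patricia/Deborah/Jonathan example.

[source,python]
----
from itertools import combinations


def common_neighbour_scores(edges):
    """Score every unlinked pair by the number of neighbours they share."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    scores = {}
    for u, v in combinations(sorted(nbrs), 2):
        if v in nbrs[u]:
            continue  # already connected, nothing to infer
        shared = len(nbrs[u] & nbrs[v])
        if shared:
            scores[(u, v)] = shared
    # Highest score first; ties keep alphabetical pair order (sort is stable).
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))


# "p" knows "d" and "j"; "d" and "j" both know "x" and "y".
edges = [("p", "d"), ("p", "j"), ("d", "x"), ("j", "x"), ("d", "y"), ("j", "y")]
print(common_neighbour_scores(edges))
----

Here `p` shares two neighbours with both `x` and `y`, matching the Patricia inference. The top-scoring pair, however, is `d` and `j`, which shows the heuristic is a ranking signal rather than proof.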

== Algorithms
=== Triangle Count on a Subgraph

The previous triangle count was interesting, but we ran it against the entire graph. We can also run the same algorithm on a subgraph, for instance one containing only people who are associated with crimes. This returns a different set of triangles, made up only of crime-associated people who appear in communities or clusters.

[source,cypher]
----
CALL gds.triangleCount.stream(
  {nodeQuery: 'MATCH (p:Person) WHERE exists { (p)-[:PARTY_TO]->(:Crime) } RETURN id(p) AS id',
   relationshipQuery: 'MATCH (p1:Person)-[:KNOWS]-(p2:Person) RETURN id(p1) AS source, id(p2) AS target'})
YIELD nodeId, triangleCount AS triangles
MATCH (p:Person)
WHERE ID(p) = nodeId AND
triangles > 0
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, triangles
ORDER BY triangles DESC
LIMIT 5;
----
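
Restricting the node set changes which triangles exist: a triangle survives only if all three of its members pass the filter. The Python sketch below mimics that effect by keeping only edges whose endpoints are both in a hypothetical `offenders` set before counting, then dropping nodes with zero triangles as `triangles > 0` does. It illustrates the idea only and does not reproduce how GDS builds its projection.

[source,python]
----
from itertools import combinations


def triangle_counts(edges, keep=None):
    """Per-node triangle counts, optionally on the subgraph induced by `keep`."""
    nbrs = {}
    for a, b in edges:
        if a == b:
            continue
        if keep is not None and (a not in keep or b not in keep):
            continue  # drop edges that leave the subgraph
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    counts = {n: sum(1 for u, v in combinations(ns, 2) if v in nbrs[u])
              for n, ns in nbrs.items()}
    return {n: c for n, c in counts.items() if c > 0}  # WHERE triangles > 0


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "a")]
offenders = {"a", "b", "c"}
print(triangle_counts(edges))             # whole graph: triangles a-b-c and a-c-d
print(triangle_counts(edges, offenders))  # crime-linked people only: a-b-c
----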

== Algorithms
=== Triangle Count on a Subgraph

Looking at the triangles associated with one of the top results from the previous query (Phillip Williamson) reveals an interesting group of people who know each other, are related to each other, and/or live with each other. The names look familiar from our earlier Drugs investigation - we have quite a group of potential criminals here. In addition to the Drugs Crimes, there are a lot of Vehicle Crimes associated with this social group; perhaps this is a gang that specialises in car theft. It's interesting to note that the algorithm surfaced, without any starting point, a group we previously had to search for explicitly (our Drugs search started from a specific Officer and specific Persons).

[source,cypher]
----
MATCH (p1:Person {nhs_no: '337-28-4424', surname: 'Williamson'})-[k1:KNOWS]-(p2)-[k2:KNOWS]-(p3)-[k3:KNOWS]-(p1)
WITH *
MATCH (person)-[pt:PARTY_TO]->(crime) WHERE person IN [p1, p2, p3]
RETURN *
----

== Algorithms
=== Betweenness Centrality

The betweenness algorithm measures centrality, one way of identifying the most important nodes in a graph. It gives higher scores to nodes that sit on the shortest paths between many other pairs of nodes. Using this measure, we can see which people are potentially important in the graph: they lie on the shortest paths between the largest number of other people via the 'KNOWS' relationship (ignoring relationship direction, which isn't very important here). Information and resources tend to flow along the shortest paths in a graph, so this is a good way of identifying central nodes, or 'bridge' nodes between communities.

[source,cypher]
----
// ...
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, toInteger(centrality) AS score
ORDER BY centrality DESC
LIMIT 10;
----
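
The betweenness centrality of a node v is defined over all pairs (s, t) with s != v != t. For each pair, take the fraction of shortest s-t paths that pass through v, then sum these fractions. The standard way to compute it on an unweighted graph is Brandes' algorithm. The Python sketch below implements it for an undirected graph and halves the totals because each unordered pair is visited from both ends. The procedure used in the query may normalise or approximate differently, so treat this as an explanation of the measure rather than a reproduction of its scores. The example graphs are hypothetical.

[source,python]
----
from collections import deque


def undirected(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def betweenness(graph):
    """Brandes' algorithm for an unweighted, undirected graph of adjacency sets."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: c / 2 for v, c in score.items()}  # undirected: each pair seen twice


# Two small clusters {a, b} and {c, d} joined only through the bridge node h.
bridge = undirected([("a", "b"), ("a", "h"), ("b", "h"), ("h", "c"), ("h", "d"), ("c", "d")])
print(betweenness(bridge))
----

Only `h` scores (4.0, one for each cross-cluster pair). This is the 'bridge' pattern the paragraph describes. In the query, `toInteger(centrality)` simply truncates such scores for display.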

== Algorithms
=== Betweenness Centrality

We can explore the graph for the top result from the previous query (Annie Duncan) out to 3 levels and see how well connected she is. She does appear to sit between several clusters/communities at the edge of this graph. Looking farther out than 3 hops returns even more results, but they would be harder to visualise and take longer to draw on the screen.

[source,cypher]
----
MATCH path = (:Person {nhs_no: '863-96-9468', surname: 'Duncan'})-[:KNOWS*..3]-(:Person)
RETURN path
----

== End of the guide

This was a simplified demo, and a real POLE model populated with actual police data would be much richer and more complicated. However, this was a good way to explore some POLE data modelling and queries in a semi-real-world setting.

To make the demo easier to follow, we used 'NHS Number' as a simulated unique identifier for Person nodes. In a real-life scenario we probably wouldn't have one consistent identifier, and would instead query the graph using a wide range of identifiers, matching criteria, query methods, etc.