Commit 73f87b8

Fixes in guide
1 parent d3ec7c6 commit 73f87b8

1 file changed

+78
-11
lines changed

documentation/pole.adoc

@@ -64,7 +64,9 @@ __All scenarios and persons portrayed in this demo are fictitious. Any similari
6464
Review the metagraph, and see the types of nodes and relationships we're going to be working with.
6565

6666
[source,cypher]
67+
----
6768
call db.schema.visualization()
69+
----
6870

6971
Notice the different ways that Persons can be related to each other. There is a general 'KNOWS' relationship, as well as more specific relationship types: FAMILY_REL (related to), KNOWS_LW (lives with), KNOWS_PHONE (has a related phone call), and KNOWS_SN (social network).
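
To get a feel for how often each of these relationship types occurs, a quick tally can help. The query below is a minimal sketch, assuming we want to count every relationship stored between two Person nodes once, in its stored direction.

[source,cypher]
----
// Count relationships between Person nodes, grouped by relationship type.
// The directed pattern visits each stored relationship exactly once.
MATCH (:Person)-[r]->(:Person)
RETURN type(r) AS relationship_type, count(r) AS total
ORDER BY total DESC
----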
7072

@@ -75,9 +77,11 @@ Notice also that Location is associated to both Postcode and Area. In the UK, P
7577

7678
Let's have a look at the types of crimes in the graph, and the number of times each occurred:
7779
[source,cypher]
80+
----
7881
MATCH (c:Crime)
7982
RETURN c.type AS crime_type, count(c) AS total
8083
ORDER BY count(c) DESC
84+
----
8185

8286
You should see that 'Violence and sexual offences' was the highest category of crimes for the month, with weapons offences being the category with the lowest count.
8387

@@ -87,18 +91,22 @@ You should see that 'Violence and sexual offences' was the highest category of c
8791
Let's also look at the top locations in the graph where crimes have been recorded:
8892

8993
[source,cypher]
94+
----
9095
MATCH (l:Location)<-[:OCCURRED_AT]-(:Crime)
9196
RETURN l.address AS address, l.postcode AS postcode, count(l) AS total
9297
ORDER BY count(l) DESC
9398
LIMIT 15
99+
----
94100

95101
You should see several obvious public places and institutions with high numbers of associated crimes - Piccadilly (the area near the main rail station in Manchester), a Shopping Area (and a nearby Prison), etc. There are also some residential-looking addresses towards the bottom of the list with fairly high numbers (for example, 35 crimes at both 182 Waterson Avenue and 43 Walker's Croft).
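
If one of the residential-looking addresses catches your eye, you can break its crimes down by type. This is a sketch only: it assumes the address is stored exactly as '182 Waterson Avenue', and it reuses the OCCURRED_AT relationship from the query above.

[source,cypher]
----
// Break down the crimes recorded at a single address by crime type.
// Assumption: the address property holds exactly this string.
MATCH (l:Location {address: '182 Waterson Avenue'})<-[:OCCURRED_AT]-(c:Crime)
RETURN c.type AS crime_type, count(c) AS total
ORDER BY total DESC
----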
96102

97103
== General Queries
98104
=== Crimes near a particular address
105+
99106
The popular UK television drama 'Coronation Street' is set in a fictional Manchester-area neighbourhood. There's a Coronation Street address in the graph (1 Coronation Street, home of the Barlow family in the show). Using the longitude and latitude properties in our Location nodes we can do a distance-based search to find crimes that are within 500 metres of this address.
100107

101108
[source,cypher]
109+
----
102110
MATCH (l:Location {address: '1 Coronation Street', postcode: 'M5 3RW'})
103111
WITH point(l) AS corrie
104112
MATCH (x:Location)-[:HAS_POSTCODE]->(p:PostCode),
@@ -108,22 +116,31 @@ WHERE distance < 500
108116
RETURN x.address AS address, p.code AS postcode, count(c) AS crime_total, collect(distinct(c.type)) AS crime_type, distance
109117
ORDER BY distance
110118
LIMIT 10
119+
----
111120

112121
== General Queries
113122
=== Crimes investigated by Inspector Morse
123+
114124
Another popular UK television drama is 'Inspector Morse'. There's also an Inspector Morse in our graph - let's see what Crimes he is investigating.
125+
115126
[source,cypher]
127+
----
116128
MATCH (o:Officer {rank: 'Chief Inspector', surname: 'Morse'})<-[i:INVESTIGATED_BY]-(c:Crime)
117129
RETURN *
130+
----
118131

119132
You should see quite a number of Crime nodes connected by the INVESTIGATED_BY relationship to the Inspector Morse node. Take a few minutes to click on some of them to expand the graph and see what other nodes are related to some of these Crimes.
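
If you prefer a table to a graph view, the same pattern can be summarised by crime type. This is a minimal sketch that reuses the Officer properties from the query above and assumes that a count per type is a useful summary.

[source,cypher]
----
// Summarise the crimes investigated by Chief Inspector Morse by type.
MATCH (:Officer {rank: 'Chief Inspector', surname: 'Morse'})<-[:INVESTIGATED_BY]-(c:Crime)
RETURN c.type AS crime_type, count(c) AS total
ORDER BY total DESC
----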
120133

121134
== Crime Investigation
122135
=== Crimes under investigation by Officer Larive
136+
123137
Let's say we are interested in the crimes that are under investigation by Police Constable Devy Larive (Badge Number 26-5234182).
138+
124139
[source,cypher]
140+
----
125141
MATCH (c:Crime {last_outcome: 'Under investigation'})-[i:INVESTIGATED_BY]->(o:Officer {badge_no: '26-5234182', surname: 'Larive'})
126142
RETURN *
143+
----
127144

128145
We can see Police Constable Larive is investigating a number of crimes at the moment. In particular we can see that PC Larive is investigating three Drugs Crimes. Double clicking on these three Drugs crimes shows us:
129146

@@ -135,9 +152,11 @@ We could click on these nodes and manually explore the graph to get more informa
135152

136153
== Crime Investigation
137154
=== Shortest path between persons related to crimes
155+
138156
Let's see if the two Persons - Jack Powell and Raymond Walker - associated with these three Drugs Crimes are somehow connected in the graph. We'll look for all of the shortest paths between them of 3 or fewer hops along all types of 'KNOWS' relationships. We can ignore the direction of the relationships in this query, as we're not interested in which direction they point.
139157

140158
[source,cypher]
159+
----
141160
MATCH (c:Crime {last_outcome: 'Under investigation', type: 'Drugs'})-[:INVESTIGATED_BY]->(:Officer {badge_no: '26-5234182'}),
142161
(c)<-[:PARTY_TO]-(p:Person)
143162
WITH COLLECT(p) AS persons
@@ -146,16 +165,20 @@ UNWIND persons AS p2
146165
WITH * WHERE id(p1) < id(p2)
147166
MATCH path = allshortestpaths((p1)-[:KNOWS|KNOWS_LW|KNOWS_SN|FAMILY_REL|KNOWS_PHONE*..3]-(p2))
148167
RETURN path
168+
----
149169

150170
It turns out they are part of what looks like a social group. Two of Raymond's family relations (his father Phillip and sister Kathleen) know Alan Ward, who is the brother of Jack Powell. Raymond's father Phillip also lives with Jack's father Brian. Knowing that Raymond is under investigation for production of cannabis, that Jack is under investigation for two separate charges of possession of cannabis with intent to supply, and that they seem to be part of a social group we can speculate it's possible that they know each other and that Jack is getting his cannabis from Raymond.
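
To read the same paths as a table rather than a picture, you can return the people and relationship types along each path. The sketch below looks the two Persons up directly by name. It assumes that the name property holds the first name ('Jack', 'Raymond') and that each name and surname pair matches only one Person, which may not hold in your copy of the data.

[source,cypher]
----
// Assumption: name/surname identify exactly one Person each.
MATCH (p1:Person {name: 'Jack', surname: 'Powell'}),
      (p2:Person {name: 'Raymond', surname: 'Walker'})
MATCH path = allShortestPaths((p1)-[:KNOWS|KNOWS_LW|KNOWS_SN|FAMILY_REL|KNOWS_PHONE*..3]-(p2))
// List who is on each path and which relationship types connect them.
RETURN length(path) AS hops,
       [n IN nodes(path) | n.name + ' ' + n.surname] AS people,
       [r IN relationships(path) | type(r)] AS relationship_types
----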
151171

152172
== Crime Investigation
153173
=== Other related people associated with drugs crimes
174+
154175
To build an even stronger case let's look at the social networks of Jack Powell and Raymond Walker, and see if anyone else within 3 hops of them along 'KNOWS' relationships is also related to a Drugs Crime.
155176

156177
[source,cypher]
178+
----
157179
MATCH path = (:Officer {badge_no: '26-5234182'})<-[:INVESTIGATED_BY]-(:Crime {type: 'Drugs'})<-[:PARTY_TO]-(:Person)-[:KNOWS*..3]-(:Person)-[:PARTY_TO]->(:Crime {type: 'Drugs'})
158180
RETURN path
181+
----
159182

160183
This query reveals an interesting and somewhat dense social network, including family relations and people who live with one another. Reviewing the graph we can see:
161184

@@ -179,36 +202,45 @@ We might also be able to infer some additional relationships in this graph:
179202
Now we can explore a series of queries to simulate research on 'vulnerable' or 'at risk' individuals in the graph. This might be especially important in a social services or child protection use case. Here we have defined 'vulnerable person' as someone who is not themselves associated to a crime, but who knows many people who are. Run the query below to generate a list of the Top 5 most vulnerable people in the graph.
180203

181204
=== Top 5 vulnerable people in the graph
205+
182206
[source,cypher]
207+
----
183208
MATCH (p:Person)-[:KNOWS]-(friend)-[:PARTY_TO]->(:Crime)
184209
WHERE NOT (p:Person)-[:PARTY_TO]->(:Crime)
185210
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, count(distinct friend) AS dangerousFriends
186211
ORDER BY dangerousFriends DESC
187212
LIMIT 5
213+
----
188214

189215
We will be referring to this list of Vulnerable people throughout the next few steps, so you may want to keep the results handy (try using the tack icon to pin them to the top).
190216

191217
== Vulnerable Persons Investigation
192218
=== Friends of Friends
219+
193220
Using Cypher it's then very easy to explore the graph out through a wider social circle. A small change to the query allows us to see not only friends of individuals who are associated with crimes, but also 'friends of friends' who are associated with crimes as well.
194221

195222
[source,cypher]
223+
----
196224
MATCH (p:Person)-[:KNOWS*1..2]-(friend)-[:PARTY_TO]->(:Crime)
197225
WHERE NOT (p:Person)-[:PARTY_TO]->(:Crime)
198226
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, count(distinct friend) AS dangerousFriends
199227
ORDER BY dangerousFriends DESC
200228
LIMIT 5
229+
----
201230

202231
Try modifying the query to look at 'friends of friends of friends' (3 'KNOWS' relationships out) and see how that changes the results.
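
One way to write that variation is sketched below. The only change from the previous query is the upper bound of the variable-length pattern, which goes from 2 to 3.

[source,cypher]
----
// Friends, friends of friends, and friends of friends of friends (1 to 3 hops).
MATCH (p:Person)-[:KNOWS*1..3]-(friend)-[:PARTY_TO]->(:Crime)
WHERE NOT (p:Person)-[:PARTY_TO]->(:Crime)
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, count(distinct friend) AS dangerousFriends
ORDER BY dangerousFriends DESC
LIMIT 5
----

Expect this to run more slowly than the 2-hop version, since every extra hop widens the set of paths that must be explored.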
203232

204233

205234
== Vulnerable Persons Investigation
206235
=== Exploring a Vulnerable Person's graph
236+
207237
Let's explore the graph for the top result from our original Vulnerable Persons results (which, hopefully, you've pinned in a previous step).
208238

209239
[source,cypher]
240+
----
210241
MATCH path = (:Location)<-[:CURRENT_ADDRESS]-(:Person {nhs_no: '804-54-6976', surname: 'Freeman'})-[:KNOWS]-(:Person)-[:PARTY_TO]->(:Crime)
211242
RETURN path
243+
----
212244

213245
We can see that Anne Freeman has 8 dangerous friends. Using her ID, this query shows us the graph of these friends, which we can navigate and explore.
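
The same pattern can also be returned as a table, which makes it easier to see who the dangerous friends are and what they are associated with. This is a minimal sketch that reuses Anne's identifier from the query above. The column names are illustrative.

[source,cypher]
----
// List each of Anne's direct friends who is party to a crime,
// with the distinct crime types and the number of crimes.
MATCH (:Person {nhs_no: '804-54-6976', surname: 'Freeman'})-[:KNOWS]-(friend:Person)-[:PARTY_TO]->(c:Crime)
RETURN friend.name AS name, friend.surname AS surname,
       collect(DISTINCT c.type) AS crime_types, count(DISTINCT c) AS crimes
ORDER BY crimes DESC
----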
214246

@@ -221,19 +253,23 @@ You can also try updating this query to show 'friends of friends' or 'friends of
221253
Now that we've seen Anne Freeman's social circle, it would be good to know whether any of her dangerous friends is actually local to her (in her area, or neighbourhood).
222254

223255
[source,cypher]
256+
----
224257
MATCH (anne:Person {nhs_no: '804-54-6976', surname: 'Freeman'})-[k:KNOWS]-(friend)-[pt:PARTY_TO]->(c:Crime),
225258
(anne)-[ca1:CURRENT_ADDRESS]->(aAddress)-[lia1:LOCATION_IN_AREA]->(area),
226259
(friend)-[ca2:CURRENT_ADDRESS]->(fAddress)-[lia2:LOCATION_IN_AREA]->(area)
227260
RETURN *
261+
----
228262

229263

230264
We can see it's only her friend Craig, who she knows through social networks, that lives in the same Area (SK1) as Anne. Craig has been associated with two Public Order offences.
231265

232266
== Vulnerable Persons Investigation
233267
=== Looking for connections between Vulnerable Persons
268+
234269
Going back to the list of vulnerable people, let's see if any of them are connected. This query takes the results of the vulnerable people query and looks for paths of 'KNOWS' relationships that connect them.
235270

236271
[source,cypher]
272+
----
237273
MATCH (p:Person)-[:KNOWS]-(friend)-[:PARTY_TO]->(:Crime)
238274
WHERE NOT (p:Person)-[:PARTY_TO]->(:Crime)
239275
WITH p, count(distinct friend) AS dangerousFriends
@@ -245,49 +281,58 @@ UNWIND people AS p2
245281
WITH * WHERE id(p1) <> id(p2)
246282
MATCH path = shortestpath((p1)-[:KNOWS*]-(p2))
247283
RETURN path
284+
----
248285

249286
It turns out there are connections between them, of different lengths. There are actually multiple paths by which some of them are connected.
250287

251288
We're finished now with the original list of vulnerable people and those results can be closed or unpinned.
252289

253290
== Vulnerable Persons Investigation
254291
=== Looking for Dangerous Family Friends
292+
255293
We can now write another query looking for vulnerable or at risk individuals, but this time based on their family relationships rather than their direct social relationships. We'll look for people who are not directly related to a crime, and neither is their relative, but their relative has dangerous friends.
256294

257295
[source,cypher]
296+
----
258297
MATCH (p:Person)-[:FAMILY_REL]-(relative)-[:KNOWS]-(famFriend)-[:PARTY_TO]->(:Crime)
259298
WHERE NOT (p:Person)-[:PARTY_TO]->(:Crime) AND
260299
NOT (relative)-[:PARTY_TO]->(:Crime)
261300
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, count(DISTINCT famFriend) AS DangerousFamilyFriends
262301
ORDER BY DangerousFamilyFriends DESC
263302
LIMIT 5
303+
----
264304

265305
You should see 5 people who have family members with dangerous friends.
266306

267307
== Vulnerable Persons Investigation
268308
=== Looking for Dangerous Family Friends
309+
269310
The previous query returned a good set of at risk individuals. However, it's probably not specific enough - it would be more interesting to see this list with an additional requirement that the vulnerable individuals live with their relative who has dangerous friends.
270311

271312

272313
[source,cypher]
314+
----
273315
MATCH (p:Person)-[:FAMILY_REL]-(relative)-[:KNOWS]-(famFriend)-[:PARTY_TO]->(:Crime),
274316
(p)-[:CURRENT_ADDRESS]->(:Location)<-[:CURRENT_ADDRESS]-(relative)
275317
WHERE NOT (p:Person)-[:PARTY_TO]->(:Crime) AND
276318
NOT (relative)-[:PARTY_TO]->(:Crime)
277319
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, count(DISTINCT famFriend) AS DangerousFamilyFriends
278320
ORDER BY DangerousFamilyFriends DESC
279321
LIMIT 5
322+
----
280323

281324
This version of the query returns only 2 people, but the one with the highest number of dangerous family friends (Kimberly Alexander) is also the top result of the previous query.
282325

283326
== Vulnerable Persons Investigation
284327
=== Exploring a Vulnerable Person's graph
328+
285329
We can view Kimberly's graph, and see that Kimberly (age 12) lives with her mother Bonnie at 53 Ridge Grove. Bonnie has several friends who are related to a number of crimes of varying types. There's a high chance that Kimberly is being exposed to these people, potentially putting her at risk.
286330

287331
[source,cypher]
332+
----
288333
MATCH path = (relative:Person)-[:CURRENT_ADDRESS]->(:Location)<-[:CURRENT_ADDRESS]-(:Person {nhs_no: '548-59-5017', surname: 'Alexander'})-[:FAMILY_REL]-(relative)-[:KNOWS]-(:Person)-[:PARTY_TO]->(:Crime)
289334
RETURN path
290-
335+
----
291336

292337
== Graph Algorithms
293338
=== Triangle Count
@@ -299,74 +344,96 @@ The triangle count algorithm returns 'triangles' of connected nodes - in this ca
299344
Run the following query to identify Person nodes in our graph who are members of the highest number of triangles.
300345

301346
[source,cypher]
302-
CALL algo.triangleCount.stream('Person', 'KNOWS', {concurrency:4})
303-
YIELD nodeId, triangles
347+
----
348+
CALL gds.triangleCount.stream(
349+
{nodeProjection:'Person',
350+
relationshipProjection:{KNOWS:{type:'KNOWS',orientation:'UNDIRECTED'}}})
351+
YIELD nodeId, triangleCount as triangles
304352
MATCH (p:Person)
305353
WHERE ID(p) = nodeId AND
306354
triangles > 0
307355
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, triangles
308356
ORDER BY triangles DESC
309357
LIMIT 10;
358+
----
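
The query above projects the graph anonymously each time it runs. If you plan to run several algorithms over the same projection, a named in-memory graph may be more convenient. The sketch below assumes a GDS 1.x installation, where the projection procedure is called `gds.graph.create`. The graph name 'person-knows' is hypothetical. It also assumes your client accepts multiple statements separated by semicolons.

[source,cypher]
----
// Project Person nodes and undirected KNOWS relationships once.
// 'person-knows' is a made-up name for this example.
CALL gds.graph.create('person-knows', 'Person',
  {KNOWS: {type: 'KNOWS', orientation: 'UNDIRECTED'}});

// Run triangle count against the named graph.
CALL gds.triangleCount.stream('person-knows')
YIELD nodeId, triangleCount
WHERE triangleCount > 0
RETURN gds.util.asNode(nodeId).surname AS surname, triangleCount AS triangles
ORDER BY triangles DESC
LIMIT 10;

// Release the memory held by the projection when finished.
CALL gds.graph.drop('person-knows');
----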
310359

311360
== Algorithms
312361
=== Triangle Count
362+
313363
We can take a look at the graph for one of the sets of triangles that was returned - Deborah Ford, who belongs to ten triangles.
314364

315365
We can see that Patricia Carr knows both Deborah Ford and Jonathan Hunt, and both Deborah and Jonathan know Peter Bryant, Harry Lopez, and Phillip Perry. We might therefore infer that Patricia knows Peter, Harry, and Phillip as well.
316366

317367
[source,cypher]
368+
----
318369
MATCH path = (p1:Person {nhs_no: '838-45-9343', surname: 'Ford'})-[:KNOWS]-(p2)-[:KNOWS]-(p3)-[:KNOWS]-(p1)
319370
RETURN path
371+
----
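
As a sanity check on the algorithm's result, the triangles around a single person can also be counted in plain Cypher. This is a sketch that assumes there are no self-referencing 'KNOWS' relationships. Because the pattern ignores direction, each triangle would otherwise be matched twice (once per ordering of the two other people), so the `id` comparison keeps only one ordering.

[source,cypher]
----
// Count the distinct triangles that include Deborah Ford.
MATCH (p1:Person {nhs_no: '838-45-9343', surname: 'Ford'})-[:KNOWS]-(p2)-[:KNOWS]-(p3)-[:KNOWS]-(p1)
WHERE id(p2) < id(p3)
// DISTINCT guards against parallel KNOWS relationships between the same pair.
WITH DISTINCT p2, p3
RETURN count(*) AS triangles
----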
320372

321373

322374
== Algorithms
323375
=== Triangle Count on a Subgraph
376+
324377
The previous query was interesting, but we ran it against the entire graph. We can use the same algorithm on a sub-graph - for instance, only people who are associated with crimes. This returns a different set of triangles, consisting only of people associated with crimes who appear in communities/clusters.
325378

326379
[source,cypher]
327-
CALL algo.triangleCount.stream('MATCH (p:Person)-[:PARTY_TO]->(c:Crime) RETURN id(p) AS id', 'MATCH (p1:Person)-[:KNOWS]-(p2:Person) RETURN id(p1) AS source, id(p2) AS target', {concurrency:4, graph:'cypher'})
328-
YIELD nodeId, triangles
380+
----
381+
CALL gds.triangleCount.stream(
382+
{nodeQuery:'MATCH (p:Person) WHERE exists { (p)-[:PARTY_TO]->(:Crime) } RETURN id(p) AS id',
383+
relationshipQuery:'MATCH (p1:Person)-[:KNOWS]-(p2:Person) RETURN id(p1) AS source, id(p2) AS target'})
384+
YIELD nodeId, triangleCount as triangles
329385
MATCH (p:Person)
330386
WHERE ID(p) = nodeId AND
331387
triangles > 0
332388
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, triangles
333389
ORDER BY triangles DESC
334390
LIMIT 5;
335-
391+
----
336392

337393
== Algorithms
338394
=== Triangle Count on a Subgraph
395+
339396
Looking at the triangles associated to one of the top results from the previous query (Phillip Williamson) shows an interesting group of people who know each other, are related to each other, and/or live with each other. The names look familiar from our previous Drugs investigation - we have quite a group of potential criminals here. In addition to the Drugs Crimes there are a lot of Vehicle Crimes associated with this social group. Perhaps this is a gang which specialises in car theft. It's interesting to note how the algorithms automatically turned up something we needed to specifically search for earlier (during our Drugs search we had specific Officer and Person starting nodes from our search).
340397

341398
[source,cypher]
399+
----
342400
MATCH (p1:Person {nhs_no: '337-28-4424', surname: 'Williamson'})-[k1:KNOWS]-(p2)-[k2:KNOWS]-(p3)-[k3:KNOWS]-(p1)
343401
WITH *
344402
MATCH (person)-[pt:PARTY_TO]->(crime) WHERE person IN[p1, p2, p3]
345403
RETURN *
404+
----
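
To back up the impression that this group is linked to many Vehicle Crimes, the crimes of everyone in these triangles can be tallied by type. This is a minimal sketch built on the same triangle pattern. It counts each crime once, even if several members of the group are party to it.

[source,cypher]
----
// Collect everyone who shares a triangle with Phillip Williamson...
MATCH (p1:Person {nhs_no: '337-28-4424', surname: 'Williamson'})-[:KNOWS]-(p2)-[:KNOWS]-(p3)-[:KNOWS]-(p1)
UNWIND [p1, p2, p3] AS person
WITH DISTINCT person
// ...then tally the crimes they are party to, by crime type.
MATCH (person)-[:PARTY_TO]->(crime:Crime)
RETURN crime.type AS crime_type, count(DISTINCT crime) AS total
ORDER BY total DESC
----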
346405

347406
== Algorithms
348407
=== Betweenness Centrality
408+
349409
The betweenness algorithm measures centrality in the graph - a way of identifying the most important nodes in a graph. It does this by identifying nodes which sit on the shortest paths between many other nodes and scoring them more highly. Using this measure we can see which people are potentially important in the graph - they sit on the shortest paths between the most other people via the 'KNOWS' relationship (ignoring relationship direction, as it's not very important here). Information and resources tend to flow along the shortest paths in a graph, so this is one good way of identifying central nodes or 'bridge' nodes between communities in the graph.
350410

351411
[source,cypher]
352-
CALL algo.betweenness.stream('Person', 'KNOWS', {direction: 'both'})
353-
YIELD nodeId, centrality
412+
----
413+
CALL gds.betweenness.stream({
414+
nodeProjection:'Person',
415+
relationshipProjection:{KNOWS:{type:'KNOWS',orientation:'UNDIRECTED'}}})
416+
YIELD nodeId, score AS centrality
354417
MATCH (p:Person)
355418
WHERE ID(p) = nodeId
356-
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, toInt(centrality) AS score
419+
RETURN p.name AS name, p.surname AS surname, p.nhs_no AS id, toInteger(centrality) AS score
357420
ORDER BY centrality DESC
358421
LIMIT 10;
359-
422+
----
423+
360424
== Algorithms
361425
=== Betweenness Centrality
426+
362427
We can explore the graph for the top result from the previous query (Annie Duncan) out to 3 levels and see how well connected she is. She does appear to sit between several clusters/communities at the edge of this graph. We get even more results if we look farther out than 3 hops, but the results would be harder to visualise and take longer to draw on the screen.
363428

364429
[source,cypher]
430+
----
365431
MATCH path = (:Person {nhs_no: '863-96-9468', surname: 'Duncan'})-[:KNOWS*..3]-(:Person)
366432
RETURN path
367-
433+
----
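
If the visualisation becomes too crowded, a count gives a quick sense of how far her network reaches. The sketch below assumes we simply want the number of distinct people within 3 'KNOWS' hops. Raising the upper bound shows how quickly the neighbourhood grows without having to draw it.

[source,cypher]
----
// Count the distinct people reachable from Annie Duncan within 3 KNOWS hops.
MATCH (annie:Person {nhs_no: '863-96-9468', surname: 'Duncan'})-[:KNOWS*..3]-(other:Person)
WHERE other <> annie
RETURN count(DISTINCT other) AS people_within_3_hops
----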
368434

369435
== End of the guide
436+
370437
This was a simplified demo, and a real POLE model populated with actual police data would be much more complicated and rich. However, this was a good way to explore some POLE data modelling and queries in a semi-real world way.
371438

372439
To make the demo easier to follow we used 'NHS Number' as a simulated unique identifier for Person nodes, though of course in a real-life scenario we probably wouldn't have one consistent identifier and instead would query the graph using a wide range of identifiers, matching criteria, query methods, etc.
