Prerequisites:
Background
With our soil survey data in the graph, we can start inspecting for trends of interest. As the subject is a fictional environment, we can make some rules by which this ‘economy’ is expected to run. We can check for various metrics and look for findings that don’t meet the expected norms. We will consider such discoveries as anomalous data patterns.
For example, given a pattern like
(n:Hort_Client)<-[:SENT_TO]-(:Soil_Report)<-[:ACTIONS]-(:Contractor)-[:OPERATES_IN]->(m:Region)
Let’s say that any horticultural enterprise must not be found to hire contracting firm from more than one Region. In fact, a contracting license specifies that a Contractor can only operate within its own Region and thus must not allow itself to be drawn into deals that would require it to work cross-border.
Another limit we can place is that a Contractor must not extend its operation across some minimum number of clients. THe idea here is that each Contractor must maintain a minimum of familiarity with a given property and can not sacrifice that aspect of its business in preference to simply maximising its income across as many clients as possible but not retaining any detailed property knowledge…
Exploring specific scenarios where unexpected business activities take place
Business rule: a
Hort_Clientcan hire anyContractorbut only from its own and one region
- Let’s find hort firms that have hired contractors
MATCH (n:Hort_Client)<-[:SENT_TO]-(:Soil_Report)<-[:ACTIONS]-(:Contractor)-[:OPERATES_IN]->(m:Region) WITH n.name as hort_name MATCH path = shortestPath((n1:Hort_Client)-[*1..3]-(m1:Region)) WHERE n1.name = hort_name RETURN DISTINCT path;- Output:
-
Hort_Clientnodes with associatedContractornodes. But who are the culprits ignoring the regulations?
- We can take a tabular solution to pin-pointing the related nodes of interest
MATCH (n:Hort_Client)<-[:SENT_TO]-(:Soil_Report)<-[:ACTIONS]-(:Contractor)-[:OPERATES_IN]->(m:Region) WITH n.name as hort_name, n.client as sorting, count(DISTINCT m.name) as no_regions, collect(DISTINCT m.name) AS region_list WHERE no_regions > 1 RETURN hort_name, no_regions, region_list ORDER BY sorting;- Output:
-
-
╒═══════════╤════════════╤════════════════════════════════════╕ │"hort_name"│"no_regions"│"region_list" │ ╞═══════════╪════════════╪════════════════════════════════════╡ │"hc_157" │3 │["Northbury","Swifford","Eastling"] │ ├───────────┼────────────┼────────────────────────────────────┤ │"hc_171" │3 │["Northbury","Swifford","Westshire"]│ └───────────┴────────────┴────────────────────────────────────┘
-
- Let’s use a more incisive graph approach to show us the suspect relationships
MATCH (n:Hort_Client)<-[:SENT_TO]-(:Soil_Report)<-[:ACTIONS]-(:Contractor)-[:OPERATES_IN]->(m:Region) WITH n.name as hort_name, count(DISTINCT m.name) as no_regions WHERE no_regions > 1 WITH hort_name MATCH path = shortestPath((n1:Hort_Client)-[*1..3]-(m1:Region)) WHERE n1.name = hort_name RETURN DISTINCT path;- Output:
-
- it is hc_157 and hc_171 who’ve done cross-border deals

- it is hc_157 and hc_171 who’ve done cross-border deals
Business rule: a
Contractorcan have no more than XHort_Clients
- See contractors and their clients
MATCH (n:Hort_Client)<-[:SENT_TO]-(:Soil_Report)<-[:ACTIONS]-(c:Contractor) WITH c.name as contractor MATCH path = shortestPath((n1:Hort_Client)<-[*1..2]-(c1:Contractor)) WHERE c1.name = contractor RETURN DISTINCT path;- Output:
-
- Displaying
Contractors and theirHort_Clientnodes. But are there any contractors who contravene the rules?
- Displaying
- We can see the related nodes of interest in a table that shows us contractors who worked for more than X clients. For this sample of records, we’ll set X = 1
MATCH (n:Hort_Client)<-[:SENT_TO]-(:Soil_Report)<-[:ACTIONS]-(c:Contractor) WITH c.name as contractor, c.c_id as sorting, count(DISTINCT n.name) as no_clients, collect(DISTINCT n.name) AS client_list WHERE no_clients > 1 RETURN contractor, no_clients, clients ORDER BY sorting;- Output:
-
-
╒═════════════╤════════════╤═══════════════════╕ │"contractor" │"no_clients"│"client_list" │ ╞═════════════╪════════════╪═══════════════════╡ │"contra_1250"│2 │["hc_160","hc_167"]│ └─────────────┴────────────┴───────────────────┘
-
- A more direct graph view will disclose the suspect activities
MATCH (n:Hort_Client)<-[:SENT_TO]-(:Soil_Report)<-[:ACTIONS]-(c:Contractor) WITH c.name as contractor, count(DISTINCT n.name) as no_clients WHERE no_clients > 1 WITH contractor MATCH path = shortestPath((n1:Hort_Client)<-[*1..2]-(c1:Contractor)) WHERE c1.name = contractor RETURN DISTINCT path;- Output:
-
- it is contra_1250 who is moonshining for another
Hort_Client
- it is contra_1250 who is moonshining for another
We have defined some business rules by which the data must play. We used Cypher queries to figure out where those rules are broken