Cypher is a declarative language for querying and manipulating Neo4j graph databases. Essentially, Cypher is to Neo4j graphs what SQL is to relational database systems. Cypher’s functionality is expanding and improving with every version, so keep your eye on its Neo4j Cypher Refcard resource page[1]
Hort_Client
nodes// import Hort_Client nodes
CREATE INDEX ON :Hort_Client(client);
CREATE INDEX ON :Hort_Client(name);
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///soil_survey.csv" AS line
WITH line LIMIT 10000
MERGE (hc:Hort_Client {client: line.Hort_Client, name: 'hc_' + line.Hort_Client});
Hort_Client
nodesCREATE INDEX
- adds an index for each property of the node. Note that we have two, client
and name
USING PERIODIC COMMIT 1000
- every 1000 records/lines are treated as a single transaction after which those records will be written to diskLOAD CSV WITH HEADERS FROM
- ‘soil_survey.csv’ file has headers that we added manually using sed
LIMIT 10000
- a maximum number of lines that you wish to import. Even if there are more lines in the file, the import will stop at the limit. This is also good for testing, if you don’t want to load that 200M+ record file just yetMERGE
- since there are many instances of the Hort_Client
inside the import file, we only want to create a single unique nodeSoil_Service
nodes ```sql // import Soil_Service nodes CREATE INDEX ON :Soil_Service(ss_id); CREATE INDEX ON :Soil_Service(name);USING PERIODIC COMMIT 1000 LOAD CSV WITH HEADERS FROM “file:///soil_survey.csv” AS line WITH line LIMIT 10000 MERGE (ss:Soil_Service {ss_id: line.Soil_Service, name: ‘_’ + line.Soil_Service});
Also:
: - as above but import `Soil_Service` nodes
3. Create `REQUESTS` relationship between `Hort_Client` and `Soil_Service`
```sql
// Hort_Client-->Soil_Service
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///soil_survey.csv" AS line
WITH line LIMIT 10000
MATCH (hc:Hort_Client {client: line.Hort_Client})
MATCH (ss:Soil_Service {ss_id: line.Soil_Service})
MERGE (hc)-[:REQUESTS]->(ss);
MATCH
- locate Hort_Client
and Soil_Service
nodes whose properties match the values of the current file line being read inMERGE (hc)-[:REQUESTS]->(ss)
- once the two above nodes are found, create a relationship between them, labelled REQUESTS
The resulting Cypher file will be a series of statements that will index node properties, create nodes and build relationships between them.
NB: Ensure you’ve parked your CSV file in ~/neo4j/import
and Neo4j service is running
Here we can explore different aspects of the CSV file about to be imported. You can use the following Cypher queries that you can execute inside your Neo4j Browser.
LOAD CSV WITH HEADERS FROM "file:///soil_survey_sample.csv" AS line
WITH line
RETURN count(line);
╒════════════╕
│"line_count"│
╞════════════╡
│2837 │
└────────────┘
LOAD CSV WITH HEADERS FROM "file:///soil_survey_sample.csv" AS line
WITH line LIMIT 1 RETURN line as fields_and_values;
{
"Solution": "5397",
"Soil_Service": "54593",
"Region": "Northbury",
"Contractor": "1091",
"Soil_Issue": "Erosion",
"Date_Reported": "2007-05-07",
"DaysToAction": "287",
"Date_Actioned": "2008-02-18",
"Locality": "3656",
"Hort_Client": "159"
}
LOAD CSV WITH HEADERS FROM "file:///soil_survey_sample.csv" AS line
WITH line LIMIT 1
RETURN line.Hort_Client as Hort_Client, line.Soil_Service as Soil_Service, line.Soil_Issue as Soil_Issue, line.Solution as Solution, line.Date_Reported as Date_Reported, line.Date_Actioned as Date_Actioned, line.DaysToAction as DaysToAction, line.Contractor as Contractor, line.Locality as Locality, line.Region as Region;
╒═════════════╤══════════════╤════════════╤══════════╤═══════════════╤═══════════════╤══════════════╤════════════╤══════════╤═══════════╕
│"Hort_Client"│"Soil_Service"│"Soil_Issue"│"Solution"│"Date_Reported"│"Date_Actioned"│"DaysToAction"│"Contractor"│"Locality"│"Region" │
╞═════════════╪══════════════╪════════════╪══════════╪═══════════════╪═══════════════╪══════════════╪════════════╪══════════╪═══════════╡
│"159" │"54593" │"Erosion" │"5397" │"2007-05-07" │"2008-02-18" │"287" │"1091" │"3656" │"Northbury"│
└─────────────┴──────────────┴────────────┴──────────┴───────────────┴───────────────┴──────────────┴────────────┴──────────┴───────────┘
LOAD CSV WITH HEADERS FROM "file:///soil_survey_sample.csv" AS line
WITH line
WITH line.Hort_Client as client, line.Soil_Service as service, line.Soil_Issue as issue, line.Solution as solution, line.Locality as locality, line.Region as region
RETURN count(DISTINCT client) as Hort_Client, count(DISTINCT service) as Soil_Service, count(DISTINCT issue) as Soil_Issue, count(DISTINCT solution) as Solution, count(DISTINCT locality) as Locality, count(DISTINCT region) as Region;
╒═════════════╤══════════════╤════════════╤══════════╤══════════╤════════╕
│"Hort_Client"│"Soil_Service"│"Soil_Issue"│"Solution"│"Locality"│"Region"│
╞═════════════╪══════════════╪════════════╪══════════╪══════════╪════════╡
│21 │2670 │12 │255 │1925 │4 │
└─────────────┴──────────────┴────────────┴──────────┴──────────┴────────┘
DaysToAction
LOAD CSV WITH HEADERS FROM "file:///soil_survey_sample.csv" AS line
WITH line
WITH toInt(line.DaysToAction) as action_delay
RETURN min(action_delay) as min_delay, max(action_delay) as max_delay, avg(action_delay) as mean_delay, stDev(action_delay) as std_dev,
percentileDisc(action_delay, 0.25) as _25percentile, percentileDisc(action_delay, 0.5) as _50percentile, percentileDisc(action_delay, 0.75) as _75percentile, percentileDisc(action_delay, 0.9) as _90percentile;
╒═══════════╤═══════════╤══════════════════╤═════════════════╤═══════════════╤═══════════════╤═══════════════╤═══════════════╕
│"min_delay"│"max_delay"│"mean_delay" │"std_dev" │"_25percentile"│"_50percentile"│"_75percentile"│"_90percentile"│
╞═══════════╪═══════════╪══════════════════╪═════════════════╪═══════════════╪═══════════════╪═══════════════╪═══════════════╡
│-140 │365 │103.14698625308418│77.27355038508219│49 │84 │133 │216 │
└───────────┴───────────┴──────────────────┴─────────────────┴───────────────┴───────────────┴───────────────┴───────────────┘
min_delay
indicates an error in data, where Date_Actioned
occurs before Date_Reported
. How about inspecting the data and fixing this error before importing it into the graph?LOAD CSV WITH HEADERS FROM "file:///soil_survey_sample.csv" AS line
WITH line SKIP toInteger(100*rand())+ 1 LIMIT 5
RETURN line.Hort_Client as Hort_Client, line.Soil_Service as Soil_Service, line.Soil_Issue as Soil_Issue, line.Solution as Solution, line.Date_Reported as Date_Reported, line.Date_Actioned as Date_Actioned, line.DaysToAction as DaysToAction, line.Contractor as Contractor, line.Locality as Locality, line.Region as Region;
╒═════════════╤══════════════╤════════════════╤══════════╤═══════════════╤═══════════════╤══════════════╤════════════╤══════════╤═══════════╕
│"Hort_Client"│"Soil_Service"│"Soil_Issue" │"Solution"│"Date_Reported"│"Date_Actioned"│"DaysToAction"│"Contractor"│"Locality"│"Region" │
╞═════════════╪══════════════╪════════════════╪══════════╪═══════════════╪═══════════════╪══════════════╪════════════╪══════════╪═══════════╡
│"170" │"3796" │"HighAlkalinity"│"766" │"2008-02-18" │"2008-07-08" │"141" │"2295" │"2616" │"Eastling" │
├─────────────┼──────────────┼────────────────┼──────────┼───────────────┼───────────────┼──────────────┼────────────┼──────────┼───────────┤
│"170" │"2135" │"Erosion" │"2104" │"2008-02-18" │"2008-11-18" │"274" │"2295" │"1471" │"Eastling" │
├─────────────┼──────────────┼────────────────┼──────────┼───────────────┼───────────────┼──────────────┼────────────┼──────────┼───────────┤
│"159" │"52067" │"HighAlkalinity"│"765" │"2008-02-25" │"2008-05-20" │"85" │"1091" │"3487" │"Northbury"│
├─────────────┼──────────────┼────────────────┼──────────┼───────────────┼───────────────┼──────────────┼────────────┼──────────┼───────────┤
│"159" │"6116" │"HighAlkalinity"│"765" │"2008-02-25" │"2008-06-24" │"120" │"1091" │"409" │"Northbury"│
├─────────────┼──────────────┼────────────────┼──────────┼───────────────┼───────────────┼──────────────┼────────────┼──────────┼───────────┤
│"160" │"6241" │"Erosion" │"8775" │"2008-02-25" │"2008-04-15" │"50" │"1250" │"2777" │"Eastling" │
└─────────────┴──────────────┴────────────────┴──────────┴───────────────┴───────────────┴──────────────┴────────────┴──────────┴───────────┘
You have a background understanding of Cypher and how its statements related to the graph model, plus you have been able to get a preliminary peak at the structure of the CSV file you are about to import