Stitching the Graph: Saving Knowledge to Neo4j
How our documents and entities are stored and visualized as a graph.
So far, we’ve extracted clean text from ecological reports and used AI to identify important entities like organizations, places, and ecological concepts.
But identifying entities is only half the story. Now we need to connect them — to build our knowledge graph.
In this post, we show how we use Neo4j, a graph database, to stitch everything together into a network of knowledge.
What is a Graph Database?
A graph database stores information as nodes (things) and relationships (connections).
Unlike a traditional relational database, which stores data in rows and columns, a graph database lets you explore relationships directly:
(WWF Report) -[:MENTIONS]-> (Amazon rainforest)
(WWF Report) -[:CITES]-> (IPBES 2019 Report)
(WWF Report) -[:MENTIONS]-> (climate resilience)
Neo4j is the most popular open-source graph database. It’s:
- Visual (it ships with a browser-based interface for exploring the graph)
- Easy to query (using a language called Cypher)
- Designed to handle relationships efficiently
What We Store in the Graph
Each document and entity we extract becomes a node, and their connections are modeled as relationships.
Node Types
- :Document represents the report itself
- :Entity represents an extracted item (e.g. “WWF”, “Amazon rainforest”)
Each node has properties like:
(:Document {name: "wwf_amazon_report", lang: "en"})
(:Entity {text: "climate resilience", label: "ECO_CONCEPT"})
Relationship Types
(:Document)-[:MENTIONS]->(:Entity)
(:Document)-[:CITES]->(:Document)
These relationships make the knowledge navigable, not just searchable.
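To make the model concrete, here is a minimal sketch of how the example nodes and relationship above could be created by hand with the official neo4j Python driver. The connection URI and credentials are assumptions for a local setup, not part of the project's code:

from neo4j import GraphDatabase

# Assumed local connection details; adjust to match your own instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # MERGE only creates the nodes and relationship if they don't already exist.
    session.run(
        """
        MERGE (d:Document {name: "wwf_amazon_report", lang: "en"})
        MERGE (e:Entity {text: "climate resilience", label: "ECO_CONCEPT"})
        MERGE (d)-[:MENTIONS]->(e)
        """
    )

driver.close()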
How It Works in Code
After entity tagging, we load the data into Neo4j using a Python script:
python graph_upload.py my-report.entities.json
This script:
- Loads the entity JSON file
- Creates the :Document node
- Creates one :Entity node per unique mention
- Connects them with :MENTIONS relationships (a sketch of this logic follows below)
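For reference, here is a hedged sketch of what such an upload script could look like. The entity JSON layout (a list of objects with text and label fields), the file-naming convention, and the connection details are all assumptions, not the project's actual graph_upload.py:

import json
import sys

from neo4j import GraphDatabase

def upload(path, uri="bolt://localhost:7687", user="neo4j", password="password"):
    # Derive the document name from the file name, e.g. "my-report"
    # from "my-report.entities.json" (an assumed naming convention).
    doc_name = path.split("/")[-1].replace(".entities.json", "")

    # Assumed layout: a JSON list of {"text": ..., "label": ...} objects.
    with open(path) as f:
        entities = json.load(f)

    driver = GraphDatabase.driver(uri, auth=(user, password))
    with driver.session() as session:
        # MERGE keeps the upload idempotent: re-running the script
        # will not create duplicate nodes or relationships.
        session.run("MERGE (:Document {name: $name})", name=doc_name)
        for ent in entities:
            session.run(
                """
                MATCH (d:Document {name: $name})
                MERGE (e:Entity {text: $text, label: $label})
                MERGE (d)-[:MENTIONS]->(e)
                """,
                name=doc_name, text=ent["text"], label=ent["label"],
            )
    driver.close()

if __name__ == "__main__":
    upload(sys.argv[1])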
You can view this graph in Neo4j Browser (typically served at http://localhost:7474):
Use the default login:
- Username: neo4j
- Password: password
And run a query like:
MATCH (d:Document)-[:MENTIONS]->(e:Entity)
RETURN d, e
Try It Yourself
If you’ve run the full pipeline:
make pipeline PDF=/data/input/sample1.pdf
The upload to Neo4j is done automatically. Otherwise, you can run it directly:
docker compose exec worker python graph_upload.py my-report.entities.json
Then open Neo4j in your browser and start exploring the graph!
What Can You Do With It?
Once in the graph, you can:
- Find all documents mentioning a specific concept (see the query sketch after this list)
- Discover what organizations work in similar regions
- Visualize themes and citations across languages
- Prepare data for annotation or deeper AI training
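The first item on that list, for instance, comes down to a single Cypher query. A small example, reusing the assumed connection details from the sketches above:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Find every document that mentions the concept "climate resilience".
    result = session.run(
        """
        MATCH (d:Document)-[:MENTIONS]->(e:Entity {text: $concept})
        RETURN d.name AS document
        """,
        concept="climate resilience",
    )
    for record in result:
        print(record["document"])

driver.close()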
This is where our static data becomes living knowledge.
What’s Next?
Now that our knowledge is structured and searchable, we’ll look at how to export the data to Doccano, a simple annotation tool that lets humans teach the system to improve over time.