Graphs in JSON

It seems to me that what we ought to be starting with, here, is a clear,
unencumbered, useful, standard way of representing graphs in JSON. Linking,
merging, mapping, federation: all this stuff comes later, in layers, over a
solid core graph representation. First we need to do for graphs what CSV
did for tables. We don't *have* to do it in JSON, but I don't see why we
shouldn't.

I also believe a few more things that may be non-obvious and/or
controversial:

- lists are a native logical data construct, and should be integral to a
graph representation
- graphs describe the relationships between concepts; literals describe the
assignment of symbols to those concepts: these are two fundamentally
different frames of reference, and shouldn't be intermingled
- eliminating any uncertainty about the directionality of relationships, for
the consumer of a graph, is worth imposing the assumption/burden of
bi-directional relationship maintenance on the underlying data system

So here's the JSON approach this leads me to. A dataset is a bunch of data
points, and each point is a JSON object like this:

{
  "ID": 102,
  "Name": "Nightwish",
  "Arcs": {
    "Type": [5],
    "Album": [134,167,189,203,214],
    "Genre": [74],
    "MusicBrainz ID": [540]
  }
}

So:
- each point has a numeric (relative) ID
- each point has an optional Name literal (and might have other literals
like a machine-readable Value, alternate languages, etc.)
- each point has a set of arcs
- each arc has a name and an ordered list of target-points, specified by
(relative) ID
- every data point has "Type" among its arcs
- every conceptual entity is represented by a data point; note that this
includes the MusicBrainz ID here: 540 is the ID of the local data-point
representing that external ID, which might in turn look like this:

{
  "ID": 540,
  "Name": "00a9f935-ba93-4fc8-a33a-993abe9c936b",
  "Arcs": {
    "Type": [46],
    "Artist": [102]
  }
}

Requiring this to be a point ensures that it has a type, and a local ID, and
can thus be differentiated, both structurally and individually, from any
other use of that same string.
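
As a rough illustration of the rules above (not Needle's actual code), here is
a minimal Python sketch that checks a point's shape and looks for arcs with no
arc pointing back. The function names check_point and missing_back_links are
hypothetical, and treating "any arc on the target that lists the source" as
the back-link is an assumption, since the inverse arc's name is not part of
the point itself.

```python
# The two points from the text, as Python dicts.
nightwish = {
    "ID": 102,
    "Name": "Nightwish",
    "Arcs": {
        "Type": [5],
        "Album": [134, 167, 189, 203, 214],
        "Genre": [74],
        "MusicBrainz ID": [540],
    },
}
mbid = {
    "ID": 540,
    "Name": "00a9f935-ba93-4fc8-a33a-993abe9c936b",
    "Arcs": {"Type": [46], "Artist": [102]},
}


def check_point(point):
    """Return a list of structural problems in one point (empty if none)."""
    problems = []
    if not isinstance(point.get("ID"), int):
        problems.append("ID must be an integer")
    arcs = point.get("Arcs")
    if not isinstance(arcs, dict):
        problems.append("Arcs must be an object")
        return problems
    if "Type" not in arcs:
        problems.append("missing Type arc")
    for name, targets in arcs.items():
        ok = isinstance(targets, list) and all(isinstance(t, int) for t in targets)
        if not ok:
            problems.append(f"arc {name!r} must be a list of integer IDs")
    return problems


def missing_back_links(points):
    """List (source ID, arc name, target ID) for arcs with no arc pointing back.

    Targets that are not present in `points` are skipped, so this can be run
    on a fragment of a larger dataset.
    """
    by_id = {p["ID"]: p for p in points}
    missing = []
    for p in points:
        for name, targets in p["Arcs"].items():
            for t in targets:
                target = by_id.get(t)
                if target is None:
                    continue  # target lives outside this fragment
                if not any(p["ID"] in back for back in target["Arcs"].values()):
                    missing.append((p["ID"], name, t))
    return missing
```

Run on the two points shown, both checks come back empty; dropping the
"Artist" arc from point 540 would make the second check report the now
one-way "MusicBrainz ID" arc.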

A simple dataset is then just an array of its points. That's it. Now we can
share graphs.
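
To make "just an array of its points" concrete, here is a small Python sketch
of a consumer reading such an array and following arcs by ID. The two type
points (5 and 46), their names, and their own "Type": [1] arcs are assumptions
added for illustration, and arc_names is a hypothetical helper.

```python
import json

# A tiny dataset: a JSON array of points. Points 5 and 46 are assumed
# type points; the original message does not show them.
DATASET = """
[
  {"ID": 102, "Name": "Nightwish",
   "Arcs": {"Type": [5], "Album": [134, 167, 189, 203, 214],
            "Genre": [74], "MusicBrainz ID": [540]}},
  {"ID": 540, "Name": "00a9f935-ba93-4fc8-a33a-993abe9c936b",
   "Arcs": {"Type": [46], "Artist": [102]}},
  {"ID": 5, "Name": "Artist", "Arcs": {"Type": [1]}},
  {"ID": 46, "Name": "MusicBrainz ID", "Arcs": {"Type": [1]}}
]
"""

points = json.loads(DATASET)
by_id = {p["ID"]: p for p in points}  # index points by local ID


def arc_names(point_id, arc):
    """Follow one arc from a point and return the targets' Name literals.

    Targets missing from the dataset, or lacking a Name, are shown as '#<ID>'.
    The order of the arc's target list is preserved.
    """
    targets = by_id[point_id]["Arcs"].get(arc, [])
    return [by_id.get(t, {}).get("Name", f"#{t}") for t in targets]


print(arc_names(102, "Type"))
print(arc_names(102, "MusicBrainz ID"))
```

Because targets are plain local IDs, the consumer needs nothing beyond a JSON
parser and a dictionary to walk the graph.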

A more complex dataset might embed that array in a metadata object with some
other context, like:

- a base IRI, for turning these local IDs into IRIs
- some mappings of these local arc-names (usually scoped by type) to IRIs,
like Artist.Album to <http://musicgeek.com/ontology/1.0/release>

but that's all external to the graph itself, and can be discarded or
replaced by the consumer of the data (e.g., I want to map that "Album" arc
to <http://musicnerd.com/datamodel/collection> instead).
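
One possible shape for such a wrapper, sketched in Python. The key names
"Base", "ArcIRIs" and "Points", the example.org base IRI, and the helpers
point_iri and arc_iri are all hypothetical; only the two ontology IRIs come
from the text above.

```python
# A metadata object wrapping the array of points (key names are assumed).
dataset = {
    "Base": "http://example.org/music/",  # hypothetical base IRI
    "ArcIRIs": {
        # arc names scoped by type name, as in "Artist.Album"
        "Artist.Album": "http://musicgeek.com/ontology/1.0/release",
    },
    "Points": [
        {"ID": 102, "Name": "Nightwish",
         "Arcs": {"Type": [5], "Album": [134, 167, 189, 203, 214]}},
    ],
}


def point_iri(dataset, local_id):
    """Turn a local numeric ID into an IRI by appending it to the base."""
    return f'{dataset["Base"]}{local_id}'


def arc_iri(dataset, type_name, arc_name, overrides=None):
    """Look up the IRI for a type-scoped arc name.

    `overrides` lets the consumer substitute its own mapping; returns None
    if neither the consumer nor the dataset maps this arc.
    """
    key = f"{type_name}.{arc_name}"
    if overrides and key in overrides:
        return overrides[key]
    return dataset["ArcIRIs"].get(key)


# A consumer that prefers its own vocabulary for the same arc:
mine = {"Artist.Album": "http://musicnerd.com/datamodel/collection"}
print(point_iri(dataset, 102))
print(arc_iri(dataset, "Artist", "Album"))
print(arc_iri(dataset, "Artist", "Album", overrides=mine))
```

The points themselves never change here: swapping the mapping only changes how
local names are published, which is the sense in which the metadata is
external to the graph.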



Notes and Disclaimers:

- I take no offense if people are attached to precedents that this approach
discards, and thus consider it unresponsive.
- I'm writing this in my personal capacity as an interested bystander. I'm
not on any standards committees and am not expressing a corporate point of
view.
- But I'm the same person when I go to work, which in my case is at Google
(via the recent acquisition of ITA Software), where the scheme I describe
here is mostly the same as the data-model/JSON-representation used by Needle
(www.needlebase.com), the graph-database project for which I'm the designer.

glenn

Received on Tuesday, 17 May 2011 16:16:40 UTC