Re: Graphs in JSON

glenn mcdonald wrote:
> It seems to me that what we ought to be starting with, here, is a clear,
> unencumbered, useful, standard way of representing graphs in JSON. Linking,
> merging, mapping, federation: all this stuff comes later, in layers, over a
> solid core graph-representation. First we need to do for graphs what CSV
> did for tables. We don't *have* to do it in JSON, but I don't see why we
> shouldn't.
> 
> I also believe a few more things that may be non-obvious and/or
> controversial:
> 
> - lists are a native logical data construct, and should be integral to a
> graph representation
> - graphs describe the relationships between concepts; literals describe the
> assignment of symbols to those concepts: these are two fundamentally
> different frames of reference, and shouldn't be intermingled
> - eliminating any uncertainty about the directionality of relationships, for
> the consumer of a graph, is worth imposing the assumption/burden of
> bi-directional relationship-maintenance on the underlying data system
> 
> So here's the JSON approach this leads me to. A dataset is a bunch of data
> points, and each point is a JSON object like this:
> 
> {
>   "ID": 102,
>   "Name": "Nightwish",
>   "Arcs": {
>     "Type": [5],
>     "Album": [134,167,189,203,214],
>     "Genre": [74],
>     "MusicBrainz ID": [540]
>   }
> }
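
The point format quoted above is straightforward to consume. Here is a minimal Python sketch that loads points and follows an arc by ID. It assumes a dataset is a plain JSON array of such points, as Glenn describes further down, and it pairs the Nightwish point with point 540 from the same message. The names index_points and follow are hypothetical and are not part of Needle or any published API.

```python
import json
from typing import Any

# Sample dataset: the two points quoted in the message, as a JSON array.
# IDs they reference but do not define (5, 46, 74, 134, ...) are absent.
DATASET_JSON = """
[
  {"ID": 102, "Name": "Nightwish",
   "Arcs": {"Type": [5], "Album": [134, 167, 189, 203, 214],
            "Genre": [74], "MusicBrainz ID": [540]}},
  {"ID": 540, "Name": "00a9f935-ba93-4fc8-a33a-993abe9c936b",
   "Arcs": {"Type": [46], "Artist": [102]}}
]
"""

Point = dict[str, Any]


def index_points(points: list[Point]) -> dict[int, Point]:
    """Index points by their local numeric ID."""
    return {p["ID"]: p for p in points}


def follow(index: dict[int, Point], point_id: int, arc: str) -> list[Point]:
    """Return the target points of one arc in list order, skipping
    target IDs that this dataset does not define."""
    targets = index[point_id].get("Arcs", {}).get(arc, [])
    return [index[t] for t in targets if t in index]


index = index_points(json.loads(DATASET_JSON))
print([p["Name"] for p in follow(index, 102, "MusicBrainz ID")])
# ['00a9f935-ba93-4fc8-a33a-993abe9c936b']
print([p["Name"] for p in follow(index, 540, "Artist")])
# ['Nightwish']
```

Skipping undefined target IDs, rather than raising an error, is a choice made by this sketch. The scheme itself does not specify it.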

Just a quick question: how would you handle Arcs that are also points, as
in the following RDF?

   </foo> </bar> "Baz" .
   </bar> x:label "Bar" .

> So:
> - each point has a numeric (relative) ID
> - each point has an optional Name literal (and might have other literals
> like a machine-readable Value, alternate languages, etc.)
> - each point has a set of arcs
> - each arc has a name and an ordered list of target-points, specified by
> (relative) ID
> - every data point has "Type" among its arcs
> - every conceptual entity is represented by a data point; note that this
> includes the MusicBrainz ID here: 540 is the ID of the local data-point
> representing that external ID, which might in turn look like this:
> 
> {
>   "ID": 540,
>   "Name": "00a9f935-ba93-4fc8-a33a-993abe9c936b",
>   "Arcs": {
>     "Type": [46],
>     "Artist": [102]
>   }
> }
> 
> Requiring this to be a point ensures that it has a type, and a local ID, and
> can thus be differentiated, both structurally and individually, from any
> other use of that same string.
> 
> A simple dataset is then just an array of its points. That's it. Now we can
> share graphs.
> 
> A more complex dataset might embed that array in a metadata object with some
> other context, like:
> 
> - a base IRI, for turning these local IDs into IRIs
> - some mappings of these local arc-names (usually scoped by type) to IRIs,
> like Artist.Album to <http://musicgeek.com/ontology/1.0/release>
> 
> but that's all external to the graph itself, and can be discarded or
> replaced by the consumer of the data (e.g., I want to map that "Album" arc
> to <http://musicnerd.com/datamodel/collection> instead).
> 
> 
> 
> Notes and Disclaimers:
> 
> - I take no offense if people are attached to precedents that this approach
> discards, and thus consider it unresponsive.
> - I'm writing this in my personal capacity as an interested bystander. I'm
> not on any standards committees and am not expressing a corporate point of
> view.
> - But I'm the same person when I go to work, which in my case is at Google
> (via the recent acquisition of ITA Software), where the scheme I describe
> here is mostly the same as the data-model/JSON-representation used by Needle
> (www.needlebase.com), the graph-database project for which I'm the designer.
> 
> glenn
> 
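
The rules in the list above are mechanical enough to check in code. What follows is a rough Python sketch built on several assumptions. check_points, to_iri and arc_iri are hypothetical names. The back-arc test only asks that the target list the source under some arc, because the message does not say how inverse arc names such as Album and Artist are paired. IRIs are minted by simple concatenation onto an example base IRI (http://example.org/music/). That is one possible convention, not something the message specifies.

```python
from typing import Any, Optional

Point = dict[str, Any]


def check_points(points: list[Point]) -> list[str]:
    """Check the structural rules from the message; return problems found."""
    problems: list[str] = []
    by_id: dict[int, Point] = {}
    for p in points:
        pid = p.get("ID")
        # Rule: each point has a numeric (relative) ID.
        if not isinstance(pid, int) or isinstance(pid, bool):
            problems.append(f"point without numeric ID: {p!r}")
            continue
        by_id[pid] = p

    for pid, p in by_id.items():
        arcs = p.get("Arcs", {})
        # Rule: every data point has "Type" among its arcs.
        if "Type" not in arcs:
            problems.append(f"{pid}: missing Type arc")
        for name, targets in arcs.items():
            # Rule: each arc has an ordered list of target-point IDs.
            if not isinstance(targets, list):
                problems.append(f"{pid}.{name}: targets are not a list")
                continue
            for t in targets:
                if t not in by_id:
                    problems.append(f"{pid}.{name}: unknown target {t}")
                    continue
                # Loose bi-directionality check (assumption): the target
                # must have *some* arc whose list contains this point.
                back = by_id[t].get("Arcs", {})
                if not any(isinstance(ts, list) and pid in ts
                           for ts in back.values()):
                    problems.append(f"{pid}.{name} -> {t}: no arc back")
    return problems


def to_iri(base_iri: str, point_id: int) -> str:
    # Assumed convention: base IRI followed directly by the local ID.
    return f"{base_iri}{point_id}"


def arc_iri(mappings: dict[str, str], type_name: str, arc: str) -> Optional[str]:
    # Mappings keyed "Type.Arc", as in the Artist.Album example; None means
    # the consumer has no mapping (or has discarded it).
    return mappings.get(f"{type_name}.{arc}")


points = [
    {"ID": 102, "Name": "Nightwish",
     "Arcs": {"Type": [5], "Album": [134, 167, 189, 203, 214],
              "Genre": [74], "MusicBrainz ID": [540]}},
    {"ID": 540, "Name": "00a9f935-ba93-4fc8-a33a-993abe9c936b",
     "Arcs": {"Type": [46], "Artist": [102]}},
]
for problem in check_points(points):
    print(problem)

mappings = {"Artist.Album": "http://musicgeek.com/ontology/1.0/release"}
print(to_iri("http://example.org/music/", 102))  # http://example.org/music/102
print(arc_iri(mappings, "Artist", "Album"))
# http://musicgeek.com/ontology/1.0/release
```

With only two sample points, every type, album and genre ID is reported as an unknown target, while the 102/540 pair passes the back-arc check in both directions. Because the mappings live outside the points, a consumer can swap in its own dictionary (say, pointing Artist.Album at <http://musicnerd.com/datamodel/collection>) without touching the graph.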

Received on Tuesday, 17 May 2011 18:08:40 UTC