- From: glenn mcdonald <glenn@furia.com>
- Date: Tue, 17 May 2011 12:15:52 -0400
- To: Manu Sporny <msporny@digitalbazaar.com>, Linked JSON <public-linked-json@w3.org>
- Message-ID: <BANLkTik_JJX-NouP8B3h=becoRuLOoVgYg@mail.gmail.com>
It seems to me that what we ought to be starting with, here, is a clear, unencumbered, useful, standard way of representing graphs in JSON. Linking, merging, mapping, federation: all this stuff comes later, in layers, over a solid core graph-representation. First we need to do for graphs what CSV did for tables. We don't *have* to do it in JSON, but I don't see why we shouldn't.

I also believe a few more things that may be non-obvious and/or controversial:

- lists are a native logical data construct, and should be integral to a graph representation
- graphs describe the relationships between concepts; literals describe the assignment of symbols to those concepts: these are two fundamentally different frames of reference, and shouldn't be intermingled
- eliminating any uncertainty about the directionality of relationships, for the consumer of a graph, is worth imposing the assumption/burden of bi-directional relationship-maintenance on the underlying data system

So here's the JSON approach this leads me to. A dataset is a bunch of data points, and each point is a JSON object like this:

    {
      "ID": 102,
      "Name": "Nightwish",
      "Arcs": {
        "Type": [5],
        "Album": [134,167,189,203,214],
        "Genre": [74],
        "MusicBrainz ID": [540]
      }
    }

So:

- each point has a numeric (relative) ID
- each point has an optional Name literal (and might have other literals like a machine-readable Value, alternate languages, etc.)
- each point has a set of arcs
- each arc has a name and an ordered list of target-points, specified by (relative) ID
- every data point has "Type" among its arcs
- every conceptual entity is represented by a data point; note that this includes the MusicBrainz ID here: 540 is the ID of the local data-point representing that external ID, which might in turn look like this:

    {
      "ID": 540,
      "Name": "00a9f935-ba93-4fc8-a33a-993abe9c936b",
      "Arcs": {
        "Type": [46],
        "Artist": [102]
      }
    }

Requiring this to be a point ensures that it has a type, and a local ID, and can thus be differentiated, both structurally and individually, from any other use of that same string.

A simple dataset is then just an array of its points. That's it. Now we can share graphs.

A more complex dataset might embed that array in a metadata object with some other context, like:

- a base IRI, for turning these local IDs into IRIs
- some mappings of these local arc-names (usually scoped by type) to IRIs, like Artist.Album to <http://musicgeek.com/ontology/1.0/release>

but that's all external to the graph itself, and can be discarded or replaced by the consumer of the data (e.g., I want to map that "Album" arc to <http://musicnerd.com/datamodel/collection> instead).

Notes and Disclaimers:

- I take no offense if people are attached to precedents that this approach discards, and thus consider it unresponsive.
- I'm writing this in my personal capacity as an interested bystander. I'm not on any standards committees and am not expressing a corporate point of view.
- But I'm the same person when I go to work, which in my case is at Google (via the recent acquisition of ITA Software), where the scheme I describe here is mostly the same as the data-model/JSON-representation used by Needle (www.needlebase.com), the graph-database project for which I'm the designer.

glenn
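For illustration only, the point format described above could be loaded and traversed roughly as follows. This is a sketch under stated assumptions, not Needle's implementation or any proposed spec: the function names, the example.org base IRI, and the decision to include only the two points shown in the message (the other referenced IDs are left out) are all hypothetical.

```python
import json

# A "simple dataset" is just an array of points. Only the two points
# quoted in the message are included; IDs such as 5, 46, 74 and 134
# refer to points that are not part of this sample.
DATASET_JSON = """
[
  {"ID": 102, "Name": "Nightwish",
   "Arcs": {"Type": [5], "Album": [134, 167, 189, 203, 214],
            "Genre": [74], "MusicBrainz ID": [540]}},
  {"ID": 540, "Name": "00a9f935-ba93-4fc8-a33a-993abe9c936b",
   "Arcs": {"Type": [46], "Artist": [102]}}
]
"""


def index_points(points):
    """Map each point's relative ID to the point itself."""
    return {point["ID"]: point for point in points}


def targets(index, point_id, arc_name):
    """Return the ordered target IDs of one arc (empty list if absent)."""
    return index[point_id]["Arcs"].get(arc_name, [])


def missing_types(points):
    """Return IDs of points that lack a non-empty "Type" arc."""
    return [p["ID"] for p in points if not p.get("Arcs", {}).get("Type")]


def has_back_arc(index, source_id, target_id):
    """Check that some arc on the target points back at the source."""
    return any(source_id in ids for ids in index[target_id]["Arcs"].values())


def to_iri(base_iri, point_id):
    """Turn a relative ID into an IRI using a base supplied as metadata.

    The base IRI is external context, so the caller can swap it freely.
    """
    return f"{base_iri}{point_id}"


points = json.loads(DATASET_JSON)
index = index_points(points)
```

With this sample, `targets(index, 102, "MusicBrainz ID")` yields the local point 540, and `has_back_arc(index, 102, 540)` finds the matching "Artist" arc pointing back at 102, which is the bi-directional maintenance the consumer gets to rely on. Keeping `to_iri` outside the point data mirrors the idea that IRI mappings are replaceable context rather than part of the graph.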
Received on Tuesday, 17 May 2011 16:16:40 UTC