RDF/RDFa API: Structuring graph data

We've done some research on JSON-LD during the past few weeks (complete
with a working implementation of a solution for the problem described
below). This work could affect the RDF/RDFa API and how we choose to
expose data via the APIs (for Projections and PropertyGroups). The
original post can be found here:

http://groups.google.com/group/json-ld/browse_thread/thread/7f9c138ab6aa07be

Full text below:

-------------------------------------------------------------------------

We've been doing a bit of research at Digital Bazaar on how best to meld
graph-based object models with the model most developers are familiar
with these days: JSON-based object programming (that is,
associative-array-based object models). We want to enable developers to
use the same data models that they use in JavaScript today, but to work
with arbitrary graph data.

This is an issue that we think is at the heart of why RDF has not caught
on as a general data model - the data is very difficult to work with in
programming languages. There is no native data structure that is easy to
work with without a complex set of APIs.

When a JavaScript author gets JSON-LD from a remote source, the graph
that the JSON-LD expresses can take a number of different but valid
forms. That is, the information expressed by the graph can be identical,
but each graph can be structured differently.

Think of these two statements:

The Q library contains book X.
Book X is contained in the Q library.

The information that is expressed in both sentences is exactly the same,
but the structure of each sentence is different. Structure is very
important when programming. When you write code, you expect the
structure of your data to not change.

However, when we program using graphs, the structure is almost always
unknown, so a mechanism to impose a structure is required in order to
help the programmer be more productive.

The way the graph is represented is entirely dependent on the algorithm
used to normalize and the algorithm used to break cycles in the graph.
Consider the following example, a graph with three top-level objects: a
library, a book and a chapter. The items are related to one another, so
the graph can be expressed in JSON-LD in a number of different ways:

{
   "#":
   {
      "dc": "http://purl.org/dc/elements/1.1/",
      "ex": "http://example.org/vocab#"
   },
   "@":
   [
      {
         "@": "<http://example.org/test#library>",
         "a": "ex:Library",
         "ex:contains": "<http://example.org/test#book>"
      },
      {
         "@": "<http://example.org/test#book>",
         "a": "ex:Book",
         "dc:contributor": "Writer",
         "dc:title": "My Book",
         "ex:contains": "<http://example.org/test#chapter>"
      },
      {
         "@": "<http://example.org/test#chapter>",
         "a": "ex:Chapter",
         "dc:description": "Fun",
         "dc:title": "Chapter One"
      }
   ]
}

The JSON-LD graph above could also be represented like so:

{
   "#":
   {
      "dc": "http://purl.org/dc/elements/1.1/",
      "ex": "http://example.org/vocab#"
   },
   "@": "<http://example.org/test#library>",
   "a": "ex:Library",
   "ex:contains":
   {
      "@": "<http://example.org/test#book>",
      "a": "ex:Book",
      "dc:contributor": "Writer",
      "dc:title": "My Book",
      "ex:contains":
      {
         "@": "<http://example.org/test#chapter>",
         "a": "ex:Chapter",
         "dc:description": "Fun",
         "dc:title": "Chapter One"
      }
   }
}

Both of the examples above express the exact same information, but the
graph structure is very different. If a developer can receive both of
the objects from a remote source, how do they ensure that they only have
to write one code path to deal with both examples?

That is, how can a developer reliably write the following code:

// print all of the books and their corresponding chapters
var library = jsonld.toObject(jsonLdText);
for(var bookIndex = 0; bookIndex < library["ex:contains"].length;
    bookIndex++)
{
   var book = library["ex:contains"][bookIndex];
   var bookTitle = book["dc:title"];
   for(var chapterIndex = 0; chapterIndex < book["ex:contains"].length;
       chapterIndex++)
   {
      var chapter = book["ex:contains"][chapterIndex];
      var chapterTitle = chapter["dc:title"];
      console.log("Book: " + bookTitle + " Chapter: " + chapterTitle);
   }
}
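
To make the problem concrete, here is a minimal sketch of what happens
when one lookup is run against both forms. The two objects are trimmed,
already-parsed versions of the examples above (context omitted, IRIs
written consistently in angle brackets), and firstBookTitle is a
hypothetical helper written only for this illustration; it is not part
of any JSON-LD API.

```javascript
// Form 1: top-level array of nodes, linked by IRI reference.
var flatForm = {
   "@": [
      {
         "@": "<http://example.org/test#library>",
         "a": "ex:Library",
         "ex:contains": "<http://example.org/test#book>"
      },
      {
         "@": "<http://example.org/test#book>",
         "a": "ex:Book",
         "dc:title": "My Book"
      }
   ]
};

// Form 2: the same information, with the book embedded in the library.
var nestedForm = {
   "@": "<http://example.org/test#library>",
   "a": "ex:Library",
   "ex:contains": {
      "@": "<http://example.org/test#book>",
      "a": "ex:Book",
      "dc:title": "My Book"
   }
};

// Hypothetical helper written against the nested form: follow
// ex:contains from the top-level object, then read dc:title.
function firstBookTitle(library) {
   var book = library["ex:contains"];
   return (book && typeof book === "object") ? book["dc:title"] : undefined;
}

var nestedTitle = firstBookTitle(nestedForm);
var flatTitle = firstBookTitle(flatForm);

console.log(nestedTitle); // "My Book"
console.log(flatTitle);   // undefined: the top-level object has no ex:contains
```

The same code path finds the title in one form and nothing in the other,
even though both forms carry identical information.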

The answer boils down to ensuring that the data structure that is built
for the developer from the JSON-LD is framed in a way that makes
property access predictable. That is, the developer provides a structure
that MUST be filled out by the JSON-LD API. The working title for this
mechanism is "Cycle Breaking and Object Framing", since both parts must
work together in order to solve this problem.

The developer would specify a Frame for their language-native object
like the following:

{
   "#": {"ex": "http://example.org/vocab#"},
   "a": "ex:Library",
   "ex:contains":
   {
      "a": "ex:Book",
      "ex:contains":
      {
         "a": "ex:Chapter"
      }
   }
}

The object frame above asserts that the developer expects to receive a
library containing one or more books, each containing one or more
chapters. This ensures that the data is structured in a way that is
predictable, so only one code path is necessary to work with graphs
that can take multiple forms. The API call would look something like
this:

var library = jsonld.toObject(jsonLdText, objectFrame);

The mechanism in the API and the algorithm that is used to perform cycle
breaking and object framing should be formalized in the JSON-LD
specification.
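
As a rough illustration of the idea, and not the algorithm from our
implementation or anything proposed for the specification, the sketch
below frames both input forms into the same shape. All of the function
names (indexSubjects, embed, frameGraph) are hypothetical. It assumes
that every node has an "@" IRI written consistently, that "a" holds a
single type, and that framed properties should always come back as
arrays. Cycles are broken by leaving a plain IRI reference wherever a
node is already being embedded further up the path.

```javascript
// Collect every node into a map keyed by its "@" IRI, replacing
// embedded nodes with plain IRI references.
function indexSubjects(node, subjects) {
   subjects = subjects || {};
   if (Array.isArray(node["@"])) {
      // Top-level form: "@" holds an array of nodes.
      node["@"].forEach(function(n) { indexSubjects(n, subjects); });
      return subjects;
   }
   var entry = subjects[node["@"]] = subjects[node["@"]] || {};
   Object.keys(node).forEach(function(key) {
      if (key === "#") { return; } // skip the context
      var isArray = Array.isArray(node[key]);
      var flat = (isArray ? node[key] : [node[key]]).map(function(v) {
         if (v !== null && typeof v === "object") {
            indexSubjects(v, subjects);
            return v["@"];
         }
         return v;
      });
      entry[key] = isArray ? flat : flat[0];
   });
   return subjects;
}

// Build one output object for the subject at `iri`, following the frame.
function embed(subjects, iri, frame, path) {
   var source = subjects[iri];
   // Unknown subject, type mismatch, or cycle: keep the bare value.
   if (!source || source["a"] !== frame["a"] || path.indexOf(iri) !== -1) {
      return iri;
   }
   var out = {};
   Object.keys(source).forEach(function(key) {
      var subFrame = frame[key];
      if (subFrame !== null && typeof subFrame === "object") {
         var refs = Array.isArray(source[key]) ? source[key] : [source[key]];
         out[key] = refs.map(function(ref) {
            return embed(subjects, ref, subFrame, path.concat(iri));
         });
      } else {
         out[key] = source[key];
      }
   });
   return out;
}

// Return every subject whose type matches the top of the frame.
function frameGraph(input, frame) {
   var subjects = indexSubjects(input);
   return Object.keys(subjects).filter(function(iri) {
      return subjects[iri]["a"] === frame["a"];
   }).map(function(iri) {
      return embed(subjects, iri, frame, []);
   });
}

var flatInput = {"@": [
   {"@": "<http://example.org/test#library>", "a": "ex:Library",
    "ex:contains": "<http://example.org/test#book>"},
   {"@": "<http://example.org/test#book>", "a": "ex:Book",
    "dc:title": "My Book",
    "ex:contains": "<http://example.org/test#chapter>"},
   {"@": "<http://example.org/test#chapter>", "a": "ex:Chapter",
    "dc:title": "Chapter One"}
]};
var nestedInput = {
   "@": "<http://example.org/test#library>", "a": "ex:Library",
   "ex:contains": {
      "@": "<http://example.org/test#book>", "a": "ex:Book",
      "dc:title": "My Book",
      "ex:contains": {
         "@": "<http://example.org/test#chapter>", "a": "ex:Chapter",
         "dc:title": "Chapter One"
      }
   }
};
var objectFrame = {
   "a": "ex:Library",
   "ex:contains": {"a": "ex:Book", "ex:contains": {"a": "ex:Chapter"}}
};

var fromFlat = frameGraph(flatInput, objectFrame)[0];
var fromNested = frameGraph(nestedInput, objectFrame)[0];
var sameShape = JSON.stringify(fromFlat) === JSON.stringify(fromNested);
console.log(sameShape); // true
```

With either result, library["ex:contains"] is an array of books and each
book's "ex:contains" is an array of chapters, which is the shape the
nested loops shown earlier rely on.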

-------------------------------------------------------------------------

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: Linked Data in JSON
http://digitalbazaar.com/2010/10/30/json-ld/

Received on Saturday, 22 January 2011 23:14:33 UTC