Creating JSON from RDF

Hi,

As part of the linked data work the UK government is doing, we're  
looking at how to use the linked data that we have as the basis of  
APIs that are readily usable by developers who really don't want to  
learn about RDF or SPARQL.

One thing that we want to do is provide JSON representations of both  
RDF graphs and SPARQL results. I wanted to run some ideas past this  
group as to how we might do that.

To put this in context, what I think we should aim for is a pure  
publishing format that is optimised for approachability for normal  
developers, *not* an interchange format. RDF/JSON [1] and the SPARQL  
results JSON format [2] aren't entirely satisfactory as far as I'm  
concerned because of the way the objects of statements are represented  
as JSON objects rather than as simple values. I still think we should  
produce them (to wean people on to, and for those using more generic  
tools), but I'd like to think about producing something that is a bit  
more immediately approachable too.

RDFj [3] is closer to what I think is needed here. However, I don't
think there's a need for setting 'context', given that I'm not aiming
for an interchange format; there are no clear rules about how to
generate it from an arbitrary graph (basically there can't be without
some additional configuration); and it's not clear how to deal with
datatypes or languages.

I suppose my first question is whether there are any other JSON-based  
formats that we should be aware of, that we could use or borrow ideas  
from?

Assuming there aren't, I wanted to discuss what generic rules we might  
use, where configuration is necessary and how the configuration might  
be done.

# RDF Graphs #

Let's take as an example:

   <http://www.w3.org/TR/rdf-syntax-grammar>
     dc:title "RDF/XML Syntax Specification (Revised)" ;
     ex:editor [
       ex:fullName "Dave Beckett" ;
       ex:homePage <http://purl.org/net/dajobe/> ;
     ] .

In JSON, I think we'd like to create something like:

   {
     "$": "http://www.w3.org/TR/rdf-syntax-grammar",
     "title": "RDF/XML Syntax Specification (Revised)",
     "editor": {
       "name": "Dave Beckett",
       "homepage": "http://purl.org/net/dajobe/"
     }
   }

Note that the "$" is taken from RDFj. I'm not convinced it's a good  
idea to use this symbol, rather than simply a property called "about"  
or "this" -- any opinions?

Also note that I've made no distinction in the above between a URI and
a literal, while RDFj uses <>s around URIs. My feeling is that
normal developers really don't care about the distinction between a
URI literal and a pointer to a resource, and that they will base the
treatment of the value of a property on the (name of the) property
itself.
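To make the contrast concrete, here is a minimal Python sketch, assuming a toy term model (the `URIRef` and `Literal` classes are hypothetical stand-ins, not any particular RDF library's API). It renders the same object once in an object form, roughly as RDF/JSON [1] does, and once as the bare value proposed here.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical, minimal term model (not rdflib or any other library).
@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


Term = Union[URIRef, Literal]


def rdf_json_object(term: Term) -> dict:
    """Object-style value, approximately as in RDF/JSON: a type plus a value."""
    if isinstance(term, URIRef):
        return {"type": "uri", "value": term.value}
    obj = {"type": "literal", "value": term.value}
    if term.lang is not None:
        obj["lang"] = term.lang
    if term.datatype is not None:
        obj["datatype"] = term.datatype
    return obj


def simple_value(term: Term) -> str:
    """Bare value: URIs and literals both become plain strings.

    Language tags and datatypes are dropped; the consumer decides how to
    treat the value from the JSON property it appears under.
    """
    return term.value


home = URIRef("http://purl.org/net/dajobe/")
print(rdf_json_object(home))  # {'type': 'uri', 'value': 'http://purl.org/net/dajobe/'}
print(simple_value(home))     # http://purl.org/net/dajobe/
```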

So, the first piece of configuration that I think we need here is to  
map properties on to short names that make good JSON identifiers (i.e.
name tokens without hyphens). Given that properties normally have  
lowercaseCamelCase local names, it should be possible to use that as a  
default. If you need something more readable, though, it seems like it  
should be possible to use a property of the property, such as:

   ex:fullName api:jsonName "name" .
   ex:homePage api:jsonName "homepage" .
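A minimal Python sketch of that default, assuming the `api:jsonName` triples have already been read into a dictionary keyed by full property URI. The `ex:` namespace `http://example.org/ns#` is a placeholder, since the example doesn't define it, and the identifier check is one possible reading of "name tokens without hyphens".

```python
from __future__ import annotations

import re

EX = "http://example.org/ns#"  # placeholder for the undefined ex: prefix

# Hypothetical: values of api:jsonName, keyed by property URI.
JSON_NAMES = {
    EX + "fullName": "name",
    EX + "homePage": "homepage",
}

# A plain JavaScript-style identifier: no hyphens, dots or spaces.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def local_name(uri: str) -> str:
    """Return the part of a URI after its last '#' or '/'."""
    return re.split(r"[#/]", uri)[-1]


def json_name(prop: str, overrides: dict[str, str]) -> str:
    """Prefer an explicit api:jsonName, else fall back to the local name."""
    name = overrides.get(prop, local_name(prop))
    if not IDENTIFIER.fullmatch(name):
        raise ValueError(f"{name!r} (from {prop}) is not a usable JSON name")
    return name


print(json_name(EX + "fullName", JSON_NAMES))                         # name
print(json_name("http://purl.org/dc/elements/1.1/title", JSON_NAMES))  # title
```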

However, in any particular graph, there may be properties that have  
been given the same JSON name (or, even more probably, local name). We  
could provide multiple alternative names to choose between,
but any mapping to JSON is going to need to give consistent results  
across a given dataset for people to rely on it as an API, and that  
means the mapping can't be based on what's present in the data. We  
could do something with prefixes, but I have a strong aversion to  
assuming global prefixes.
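The following Python sketch illustrates why names chosen from the data aren't stable: a clash between two properties with the same local name only shows up when both happen to occur in the same graph. The `local_name` rule (text after the last '#' or '/') is an assumption; the Dublin Core URIs are real, but the two graphs are made up.

```python
import re
from collections import defaultdict

DC_ELEMENTS_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_TERMS_TITLE = "http://purl.org/dc/terms/title"


def local_name(uri: str) -> str:
    """Assumed default: the part of a URI after its last '#' or '/'."""
    return re.split(r"[#/]", uri)[-1]


def find_collisions(properties):
    """Map each local name used by more than one property to those properties."""
    by_name = defaultdict(set)
    for prop in properties:
        by_name[local_name(prop)].add(prop)
    return {name: sorted(props) for name, props in by_name.items() if len(props) > 1}


# Two made-up graphs from the same dataset: only the second has both titles.
graph_a = [DC_ELEMENTS_TITLE, "http://xmlns.com/foaf/0.1/name"]
graph_b = [DC_ELEMENTS_TITLE, DC_TERMS_TITLE]

print(find_collisions(graph_a))  # {}
print(find_collisions(graph_b))  # {'title': ['http://purl.org/dc/elements/1.1/title', 'http://purl.org/dc/terms/title']}
```

A mapping that picked names per graph would call `dc:title` "title" in the first graph and something else in the second, which is exactly the inconsistency an API can't afford.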

So I think this means that we need to provide configuration at an API  
level rather than at a global level: something that can be used  
consistently across a particular API to determine the token that's  
used for a given property. For example:

   <> a api:API ;
     api:mapping [
       api:property ex:fullName ;
       api:name "name" ;
     ] , [
       api:property ex:homePage ;
       api:name "homepage" ;
     ] .
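One way an implementation might hold such a configuration, as a Python sketch: each `api:mapping` becomes a small record, and the configuration refuses to let two properties share a JSON name within one API, so lookups give the same answer whatever data is present. The class names, and returning `None` for unmapped properties, are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional

EX = "http://example.org/ns#"  # placeholder for the undefined ex: prefix


@dataclass(frozen=True)
class Mapping:
    prop: str  # api:property (full URI)
    name: str  # api:name (JSON token)


class ApiConfig:
    """Hypothetical per-API lookup from property URI to JSON name."""

    def __init__(self, mappings: Iterable[Mapping]):
        self._name_for: dict[str, str] = {}
        owner: dict[str, str] = {}  # JSON name -> property that claimed it
        for m in mappings:
            if owner.get(m.name, m.prop) != m.prop:
                raise ValueError(
                    f"JSON name {m.name!r} used for {owner[m.name]} and {m.prop}"
                )
            owner[m.name] = m.prop
            self._name_for[m.prop] = m.name

    def name_for(self, prop: str) -> Optional[str]:
        """Return the configured JSON name, or None if the API doesn't map it."""
        return self._name_for.get(prop)


config = ApiConfig([
    Mapping(EX + "fullName", "name"),
    Mapping(EX + "homePage", "homepage"),
])
print(config.name_for(EX + "fullName"))  # name
```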

There are four more areas where I think there's configuration we need  
to think about:

   * multi-valued properties
   * typed and language-specific values
   * nesting objects
   * suppressing properties

## Multi-valued Properties ##

First one first. It seems obvious that if you have a property with  
multiple values, it should turn into a JSON array structure. For  
example:

   [] foaf:name "Anna Wilder" ;
     foaf:nick "wilding", "wilda" ;
     foaf:homepage <http://example.org/about> .

should become something like:

   {
     "name": "Anna Wilder",
     "nick": [ "wilding", "wilda" ],
     "homepage": "http://example.org/about"
   }

The trouble is that if you determine whether something is an array or  
not based on the data that is actually available, you'll get  
situations where the value of a particular JSON property is sometimes  
an array and sometimes a string; that's bad for predictability for the  
people using the API. (RDF/JSON solves this by every value being an  
array, but that's counter-intuitive for normal developers.)

So I think a second API-level configuration that needs to be made is  
to indicate which properties should be arrays and which not:

   <> a api:API ;
     api:mapping [
       api:property foaf:nick ;
       api:name "nick" ;
       api:array true ;
     ] .
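A Python sketch of how an `api:array` flag could make the shape predictable: flagged properties always produce a list, even with a single value, and everything else produces a single value. The name table, the made-up second person "Bob", and the policy of keeping the first value when an unflagged property turns out to have several are assumptions; raising an error would be just as reasonable.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"

NAMES = {FOAF + "name": "name", FOAF + "nick": "nick", FOAF + "homepage": "homepage"}
ARRAY_PROPS = {FOAF + "nick"}  # properties flagged with api:array true


def build_object(pairs, names, array_props):
    """Build one JSON object from (property URI, plain value) pairs."""
    grouped = defaultdict(list)
    for prop, value in pairs:
        grouped[prop].append(value)
    obj = {}
    for prop, values in grouped.items():
        if prop in array_props:
            obj[names[prop]] = values  # always a list, even of length one
        else:
            obj[names[prop]] = values[0]  # assumption: keep the first value
    return obj


anna = build_object(
    [(FOAF + "name", "Anna Wilder"),
     (FOAF + "nick", "wilding"),
     (FOAF + "nick", "wilda"),
     (FOAF + "homepage", "http://example.org/about")],
    NAMES, ARRAY_PROPS)
bob = build_object(
    [(FOAF + "name", "Bob"), (FOAF + "nick", "bobby")], NAMES, ARRAY_PROPS)
print(anna["nick"])  # ['wilding', 'wilda']
print(bob["nick"])   # ['bobby']
```

The second object shows the point of the flag: `nick` is still a list when only one value is present, so client code never has to test which shape it received.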

## Typed Values and Languages ##

Typed values and values with languages are really the same problem. If  
we have something like:

   <http://statistics.data.gov.uk/id/local-authority-district/00PB>
     skos:prefLabel "The County Borough of Bridgend"@en ;
     skos:prefLabel "Pen-y-bont ar Ogwr"@cy ;
     skos:notation "00PB"^^geo:StandardCode ;
     skos:notation "6405"^^transport:LocalAuthorityCode .

then we'd really want the JSON to look something like:

   {
     "$": "http://statistics.data.gov.uk/id/local-authority-district/00PB",
     "name": "The County Borough of Bridgend",
     "welshName": "Pen-y-bont ar Ogwr",
     "onsCode": "00PB",
     "dftCode": "6405"
   }

I think that for this to work, the configuration needs to be able to  
filter values based on language or datatype to determine the JSON  
property name. Something like:

   <> a api:API ;
     api:mapping [
       api:property skos:prefLabel ;
       api:lang "en" ;
       api:name "name" ;
     ] , [
       api:property skos:prefLabel ;
       api:lang "cy" ;
       api:name "welshName" ;
     ] , [
       api:property skos:notation ;
       api:datatype geo:StandardCode ;
       api:name "onsCode" ;
     ] , [
       api:property skos:notation ;
       api:datatype transport:LocalAuthorityCode ;
       api:name "dftCode" ;
     ] .
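A Python sketch of that filtering, assuming literals carry an optional language tag and datatype, that the first matching mapping wins, and that values with no matching mapping are left out. The SKOS namespace is real; the `geo:` and `transport:` namespaces aren't given above, so the URIs used for them are placeholders. Language tags are compared case-insensitively, as BCP 47 tags are; the `"$"` member is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional

SKOS = "http://www.w3.org/2004/02/skos/core#"
GEO = "http://example.org/geo#"              # placeholder for geo:
TRANSPORT = "http://example.org/transport#"  # placeholder for transport:


@dataclass(frozen=True)
class Literal:
    value: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


@dataclass(frozen=True)
class Mapping:
    prop: str
    name: str
    lang: Optional[str] = None      # api:lang
    datatype: Optional[str] = None  # api:datatype

    def matches(self, prop: str, lit: Literal) -> bool:
        if prop != self.prop:
            return False
        if self.lang is not None and (lit.lang or "").lower() != self.lang.lower():
            return False
        if self.datatype is not None and lit.datatype != self.datatype:
            return False
        return True


MAPPINGS = [
    Mapping(SKOS + "prefLabel", "name", lang="en"),
    Mapping(SKOS + "prefLabel", "welshName", lang="cy"),
    Mapping(SKOS + "notation", "onsCode", datatype=GEO + "StandardCode"),
    Mapping(SKOS + "notation", "dftCode", datatype=TRANSPORT + "LocalAuthorityCode"),
]


def to_object(pairs, mappings):
    """Name each (property, literal) pair by the first mapping that matches it."""
    obj = {}
    for prop, lit in pairs:
        for m in mappings:
            if m.matches(prop, lit):
                obj[m.name] = lit.value
                break  # values with no matching mapping are left out
    return obj


bridgend = [
    (SKOS + "prefLabel", Literal("The County Borough of Bridgend", lang="en")),
    (SKOS + "prefLabel", Literal("Pen-y-bont ar Ogwr", lang="cy")),
    (SKOS + "notation", Literal("00PB", datatype=GEO + "StandardCode")),
    (SKOS + "notation", Literal("6405", datatype=TRANSPORT + "LocalAuthorityCode")),
]
print(to_object(bridgend, MAPPINGS))
```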

## Nesting Objects ##

Regarding nested objects, I'm again inclined to view this as a  
configuration option rather than something that is based on the  
available data. For example, if we have:

   <http://example.org/about>
     dc:title "Anna's Homepage"@en ;
     foaf:maker <http://example.org/anna> .

   <http://example.org/anna>
     foaf:name "Anna Wilder" ;
     foaf:homepage <http://example.org/about> .

this could be expressed in JSON as either:

   {
     "$": "http://example.org/about",
     "title": "Anna's Homepage",
     "maker": {
       "$": "http://example.org/anna",
       "name": "Anna Wilder",
       "homepage": "http://example.org/about"
     }
   }

or:

   {
     "$": "http://example.org/anna",
     "name": "Anna Wilder",
     "homepage": {
       "$": "http://example.org/about",
       "title": "Anna's Homepage",
       "maker": "http://example.org/anna"
     }
   }

The one that's required could be indicated through the configuration,  
for example:

   <> a api:API ;
     api:mapping [
       api:property foaf:maker ;
       api:name "maker" ;
       api:embed true ;
     ] .
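A Python sketch of the `api:embed` behaviour, assuming the graph has been loaded as a dictionary from subject URI to (property, object) pairs, with resource objects wrapped in a hypothetical `URIRef` class so they can be told apart from literals. An embedded resource is expanded only if the graph describes it and it isn't already being expanded further up, which breaks the cycle between the page and its maker. Multi-valued properties are ignored here for brevity (a later value would overwrite an earlier one).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class URIRef:
    """Hypothetical marker for objects that are resources, not literals."""
    value: str


FOAF = "http://xmlns.com/foaf/0.1/"
DC = "http://purl.org/dc/elements/1.1/"

GRAPH = {
    "http://example.org/about": [
        (DC + "title", "Anna's Homepage"),
        (FOAF + "maker", URIRef("http://example.org/anna")),
    ],
    "http://example.org/anna": [
        (FOAF + "name", "Anna Wilder"),
        (FOAF + "homepage", URIRef("http://example.org/about")),
    ],
}
NAMES = {DC + "title": "title", FOAF + "maker": "maker",
         FOAF + "name": "name", FOAF + "homepage": "homepage"}


def to_json(subject, graph, names, embed, _open=frozenset()):
    """Render one subject; expand objects of `embed` properties in place."""
    obj = {"$": subject}
    open_now = _open | {subject}  # subjects currently being expanded
    for prop, value in graph.get(subject, []):
        if not isinstance(value, URIRef):
            obj[names[prop]] = value  # literal: plain value
        elif prop in embed and value.value in graph and value.value not in open_now:
            obj[names[prop]] = to_json(value.value, graph, names, embed, open_now)
        else:
            obj[names[prop]] = value.value  # resource: just its URI
    return obj


print(json.dumps(to_json("http://example.org/about", GRAPH, NAMES, {FOAF + "maker"}), indent=2))
```

Starting from `http://example.org/anna` with `{FOAF + "homepage"}` as the embed set gives the second of the two layouts instead; the data stays the same and only the configuration changes.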

## Suppressing Properties ##

The final thought that I had for representing RDF graphs as JSON was
about suppressing properties. Basically I'm thinking that this  
configuration should work on any graph, most likely one generated from  
a DESCRIBE query. That being the case, it's likely that there will be  
properties that repeat information (because, for example, they are a  
super-property of another property). It will make a cleaner JSON API  
if those repeated properties aren't included. So something like:

   <> a api:API ;
     api:mapping [
       api:property admingeo:contains ;
       api:ignore true ;
     ] .
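A Python sketch of `api:ignore`, applied before any naming happens so that ignored properties never need a JSON name. The `admingeo:` namespace isn't given above, so the URI below is a placeholder, as is the more specific sub-property `containsDistrict` used to show the duplicated information.

```python
ADMINGEO = "http://example.org/admingeo#"  # placeholder for admingeo:

IGNORED = {ADMINGEO + "contains"}  # properties flagged with api:ignore true


def suppress(description, ignored):
    """Drop ignored properties from a subject -> [(property, value)] description."""
    return {
        subject: [(p, v) for p, v in pairs if p not in ignored]
        for subject, pairs in description.items()
    }


# Made-up DESCRIBE output: the super-property repeats the sub-property's value.
described = {
    "http://example.org/county": [
        (ADMINGEO + "containsDistrict", "http://example.org/district"),
        (ADMINGEO + "contains", "http://example.org/district"),
    ],
}
print(suppress(described, IGNORED))
```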

# SPARQL Results #

I'm inclined to think that creating JSON representations of SPARQL  
results that are acceptable to normal developers is less important  
than creating JSON representations of RDF graphs, for two reasons:

   1. SPARQL naturally gives short, usable names to the properties in
      JSON objects
   2. You have to be using SPARQL to create them anyway, and if you're
      doing that then you can probably grok the extra complexity of
      having values that are objects

Nevertheless, there are two things that could be done to simplify the  
SPARQL results format for normal developers.

One would be to just return an array of the results, rather than an  
object that contains a results property that contains an object with a  
bindings property that contains an array of the results. People who  
want metadata can always request the standard SPARQL results JSON  
format.
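A minimal Python sketch of the first simplification, using only the standard `json` module: it parses a document in the standard SPARQL results JSON format [2] and returns just the array of bindings. It assumes a SELECT result; ASK results carry a `boolean` member instead of `results`, and this sketch doesn't handle them.

```python
import json


def bindings_array(sparql_results_json: str) -> list:
    """Return results.bindings from a SPARQL SELECT results document."""
    doc = json.loads(sparql_results_json)
    return doc["results"]["bindings"]


example = """{
  "head": {"vars": ["book"]},
  "results": {"bindings": [
    {"book": {"type": "uri", "value": "http://example.org/book/book6"}}
  ]}
}"""
print(bindings_array(example))
```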

The second would be to always return simple values rather than  
objects. For example, rather than:

   {
     "head": {
       "vars": [ "book", "title" ]
     },
     "results": {
       "bindings": [
         {
           "book": {
             "type": "uri",
             "value": "http://example.org/book/book6"
            },
            "title": {
              "type": "literal",
              "value": "Harry Potter and the Half-Blood Prince"
            }
         },
         {
           "book": {
             "type": "uri",
             "value": "http://example.org/book/book5"
           },
           "title": {
             "type": "literal",
             "value": "Harry Potter and the Order of the Phoenix"
           }
         },
         ...
       ]
     }
   }

a normal developer would want to just get:

   [
     {
       "book": "http://example.org/book/book6",
       "title": "Harry Potter and the Half-Blood Prince"
     },
     {
       "book": "http://example.org/book/book5",
       "title": "Harry Potter and the Order of the Phoenix"
     },
     ...
   ]

I don't think we can do any configuration here. It means that  
information about datatypes and languages isn't visible, but (a) I'm  
pretty sure that 80% of the time that doesn't matter, (b) there's  
always the full JSON version if people need it and (c) they could  
write SPARQL queries that used the datatype/language to populate  
different variables/properties if they wanted to.
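The second simplification amounts to one small transformation per row. A Python sketch, taking the bindings array (the value of `results.bindings` in the standard format) and keeping only each term's `value`, so `type`, `datatype` and language (`xml:lang`) information is discarded, as described above. Variables that are unbound in a row are already absent from that row's binding object, so they stay absent.

```python
def simplify(bindings):
    """Replace each RDF term object with its bare string value."""
    return [{var: term["value"] for var, term in row.items()} for row in bindings]


bindings = [
    {"book": {"type": "uri", "value": "http://example.org/book/book6"},
     "title": {"type": "literal", "value": "Harry Potter and the Half-Blood Prince"}},
    {"book": {"type": "uri", "value": "http://example.org/book/book5"},
     "title": {"type": "literal", "value": "Harry Potter and the Order of the Phoenix"}},
]
print(simplify(bindings))
```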

So there you are. I'd really welcome any thoughts or pointers about  
any of this: things I've missed, vocabularies we could reuse, things  
that you've already done along these lines, and so on. Reasons why  
none of this is necessary are fine too, but I'll warn you in advance  
that I'm unlikely to be convinced ;)

Thanks,
Jeni

[1]: http://n2.talis.com/wiki/RDF_JSON_Specification
[2]: http://www.w3.org/TR/rdf-sparql-json-res/
[3]: http://code.google.com/p/ubiquity-rdfa/wiki/Rdfj
-- 
Jeni Tennison
http://www.jenitennison.com

Received on Saturday, 12 December 2009 21:43:21 UTC