Re: Creating JSON from RDF from Dave Reynolds on 2009-12-14 (public-lod@w3.org from December 2009)

From: Dave Reynolds <dave.e.reynolds@googlemail.com>
Date: Mon, 14 Dec 2009 18:28:33 +0000
To: Mark Birbeck <mark.birbeck@webbackplane.com>
CC: Jeni Tennison <jeni@jenitennison.com>, public-lod@w3.org, John Sheridan <John.Sheridan@nationalarchives.gsi.gov.uk>
Message-ID: <4B2683D1.7020106@gmail.com>

Mark Birbeck wrote:

> On Sat, Dec 12, 2009 at 9:42 PM, Jeni Tennison <jeni@jenitennison.com> wrote:

>> One thing that we want to do is provide JSON representations of both RDF
>> graphs and SPARQL results. I wanted to run some ideas past this group as to
>> how we might do that.
> 
> Great again. :)
> 
> In the work I've been doing, I've concluded that in JSON-world, an RDF
> graph should be a JSON object (as explained in RDFj [1], and as you
> seem to concur), but also that SPARQL queries should return RDFj
> objects too.
> 
> In other words, after a lot of playing around I concluded that there
> was nothing to be gained from differentiating between representations
> of graphs, and the results of queries.

OK, but the relationship between the JSON objects you get back from
queries and the source RDF graph may be indirect. In a SPARQL query you
sometimes pull pieces out of various depths of connected objects, but
as a JSON consumer you probably want to think of the results as an
array of simple structures.
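
A minimal sketch of that "array of simple structures" view, assuming the
query results arrive in the W3C SPARQL Query Results JSON layout
(head.vars plus results.bindings). The function name flattenBindings and
the sample data are made up for illustration, not part of any proposal
in this thread:

```javascript
// Sketch only: turns each SPARQL result binding into a plain object
// keyed by variable name. flattenBindings is a hypothetical helper.
function flattenBindings(sparqlJson) {
  return sparqlJson.results.bindings.map(function (binding) {
    var row = {};
    sparqlJson.head.vars.forEach(function (name) {
      // Unbound variables are simply absent from a binding.
      if (binding[name] !== undefined) {
        row[name] = binding[name].value;
      }
    });
    return row;
  });
}

// Made-up sample result set with one unbound variable in the second row.
var sample = {
  head: { vars: ["name", "homepage"] },
  results: {
    bindings: [
      {
        name: { type: "literal", value: "Anna Wilder" },
        homepage: { type: "uri", value: "http://example.org/about" }
      },
      { name: { type: "literal", value: "Dave Beckett" } }
    ]
  }
};

var rows = flattenBindings(sample);
```

Note that this deliberately throws away the type, datatype and language
information carried by each binding, which is exactly the indirection
between the delivered JSON and the source graph.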

>> Note that the "$" is taken from RDFj. I'm not convinced it's a good idea to
>> use this symbol, rather than simply a property called "about" or "this" --
>> any opinions?
> 
> I agree, and in my RDFj description I do say that since '$' is used in
> a lot of Ajax libraries, I should find something else.
> 
> However, in my view, the 'something else' shouldn't look like a
> predicate, so I don't think 'about' or 'this' (or 'id' as someone
> suggests later in the thread), should be used. (Note also that 'id' is
> used in a related but slightly different way by Dojo.)
> 
> Also, the underscore is generally related to bnodes, so it might be
> confusing on quick reads through. (We have a JSON audience and an RDF
> audience, and need to make design decisions with both in mind.)
> 
> I've often thought about the empty string, '@' and other
> possibilities, but haven't had a chance to try them out. E.g., the
> empty string would simply look like this:
> 
>   {
>     "": "http://www.w3.org/TR/rdf-syntax-grammar",
>       "title": "RDF/XML Syntax Specification (Revised)",
>       "editor": {
>         "name": "Dave Beckett",
>         "homepage": "http://purl.org/net/dajobe/"
>       }
>   }
> 
> Since I always tend to indent the predicates in RDFj anyway, just to
> draw attention to them, then the empty string is reasonably visible.
> However, "@" would be even more obvious:
> 
>   {
>     "@": "http://www.w3.org/TR/rdf-syntax-grammar",
>       "title": "RDF/XML Syntax Specification (Revised)",
>       "editor": {
>         "name": "Dave Beckett",
>         "homepage": "http://purl.org/net/dajobe/"
>       }
>   }
> 
> Anyway, it shouldn't be that difficult to come up with something.

Naming discussions are often the hardest and can be non-terminating :)
There is never only one acceptable answer, rarely a really good answer,
and everyone has opinions. Syntax and semantics are much easier than names.

As I've said, I like "id" as in Freebase, I'd go along with "_about"
(though I take your point about bNode confusion) and I'd go along with
"@" (but it makes me think of pointers rather than ids).

>> So, the first piece of configuration that I think we need here is to map
>> properties on to short names...
> 
> That's what 'tokens' in the 'context' object do, in RDFj.

I think there are two separate things here: how the producer of the JSON
maps their RDF to JSON names, and then whether the consumer is able to
inspect that mapping.

For the first of those, I think the proposal is that there be a set of 
default conventions (e.g. as in Exhibit JSON) plus an optional mapping 
spec which enables the names to be improved.

For the second part, I agree with you that a context object in the
delivered JSON would be useful so that those consumers who care can see
the mapping that was applied and even invert it.
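
To make the two parts concrete, here is a rough sketch. Everything in it
is an assumption for illustration: the mapping table, the fallback to the
local name as the default convention, and the use of a "context" key
holding short name to URI pairs. It is not RDFj's actual context layout.

```javascript
// Hypothetical mapping spec: full property URI -> preferred short name.
var mapping = {
  "http://xmlns.com/foaf/0.1/name": "name",
  "http://xmlns.com/foaf/0.1/homepage": "homepage"
};

// Assumed default convention: use the local name after the last '#' or '/'.
function shortName(uri) {
  if (Object.prototype.hasOwnProperty.call(mapping, uri)) {
    return mapping[uri];
  }
  var cut = Math.max(uri.lastIndexOf("#"), uri.lastIndexOf("/"));
  return uri.substring(cut + 1);
}

// Producer side: rename the properties and record the mapping that was
// actually applied in a "context" object shipped with the data.
function toJson(properties) {
  var out = { context: {} };
  Object.keys(properties).forEach(function (uri) {
    var name = shortName(uri);
    out[name] = properties[uri];
    out.context[name] = uri;
  });
  return out;
}

// Consumer side: those who care can invert the mapping to recover the URIs.
function toUris(json) {
  var out = {};
  Object.keys(json).forEach(function (name) {
    if (name !== "context") {
      out[json.context[name]] = json[name];
    }
  });
  return out;
}

var source = {
  "http://xmlns.com/foaf/0.1/name": "Anna Wilder",
  "http://purl.org/dc/terms/title": "Dr"
};
var delivered = toJson(source);
var recovered = toUris(delivered);
```

Consumers who don't care can ignore delivered.context entirely. The
sketch does nothing about two properties that collapse to the same short
name, or about a property whose short name is itself "context".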

>> However, in any particular graph, there may be properties that have been
>> given the same JSON name (or, even more probably, local name). We could
>> provide multiple alternative names that could be chosen between, but any
>> mapping to JSON is going to need to give consistent results across a given
>> dataset for people to rely on it as an API, and that means the mapping can't
>> be based on what's present in the data. We could do something with prefixes,
>> but I have a strong aversion to assuming global prefixes.
> 
> I'm not sure here whether the goal is to map /any/ API to RDF, but if
> it is I think that's a separate problem to the 'JSON as RDF' question.

I think the problem is getting dev-friendly 'JSON out of RDF' rather
than 'JSON as RDF'; being able to invert that to get back the RDF would
be an added bonus rather than a design requirement.

>> ## Multi-valued Properties ##
>>
>> First one first. It seems obvious that if you have a property with multiple
>> values, it should turn into a JSON array structure. For example:
>>
>>  [] foaf:name "Anna Wilder" ;
>>    foaf:nick "wilding", "wilda" ;
>>    foaf:homepage <http://example.org/about> .
>>
>> should become something like:
>>
>>  {
>>    "name": "Anna Wilder",
>>    "nick": [ "wilding", "wilda" ],
>>    "homepage": "http://example.org/about"
>>  }
>>
> 
> Right. For those who haven't read the RDFj proposal [1], this example
> is taken from there (although in my version I have angle brackets on
> the resource -- see above).
> 
> 
>> The trouble is that if you determine whether something is an array or not
>> based on the data that is actually available, you'll get situations where
>> the value of a particular JSON property is sometimes an array and sometimes
>> a string; that's bad for predictability for the people using the API.
>> (RDF/JSON solves this by every value being an array, but that's
>> counter-intuitive for normal developers.)
> 
> I'm not quite fully understanding what the problem is here...sorry about
> that. The difficulty I have is that I can read what you're saying in
> two ways.
> 
> One interpretation is that, given the following JavaScript:
> 
>   {
>     "name": [ "Ivan Herman", "Herman Iván@hu" ]
>   }
> 
> there is no way to tell whether the RDF representation should be two
> triples, where each object is a literal (N3):
> 
>   [
>     foaf:name "Ivan Herman", "Herman Iván@hu"
>   ] .
> 
> or one triple where the object is a JSON array (N3 again):
> 
>   [
>     foaf:name "'Ivan Herman', 'Herman Iván@hu'"^^json:array
>   ] .
> 
> I don't /think/ this is what you are saying, but if it is, I think the
> first case is easily the most useful, and so we should just assume
> that all arrays represent multiple predicates (i.e., it's like the
> comma in N3).

Yes. The potential confusion between JSON rendering of RDF lists (aka 
collections) and multivalued predicates was my point and I don't think 
it's what Jeni was getting at. We seem to have three votes now for going 
for the common case; and some support for including separate information 
in the context object to allow mapping back to lists when anyone cares.

> The second possible interpretation is that, when working with a JSON
> object, developers need to know when a property can hold an array, and
> when it will be a single value.
> 
> If that's what you mean, then I'll flag up the approach I've taken in
> my RDFj processor, which is to *always* test for an array, and then
> operate on a single value; if something is an array, generate a list
> of triples, and if not, take only one item.
> 
> For programmers working with the JSON object it's much the same; if we
> simply say that every value can have either a single value or an
> array, then it's pretty straightforward for them to then deal with
> that. This then gives us great flexibility, since everything can have
> multiple values as appropriate.

Yes. I agree that developers can cope with this inconsistency when they
need to, so long as providers have enough control that they can force
consistency where they need to, to make the consumer's life as easy as
it reasonably can be.
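
A small sketch of both sides, with hypothetical helper names (asArray,
forceArrays) and an assumed provider-supplied list of properties that
should always be delivered as arrays:

```javascript
// Consumer side: always treat a property value as an array, whether the
// provider sent a single value, an array, or nothing at all.
function asArray(value) {
  if (value === undefined) {
    return [];
  }
  return Array.isArray(value) ? value : [value];
}

// Provider side: force the listed properties to be arrays in every
// object, regardless of how many values happen to be in the data.
function forceArrays(obj, multiValued) {
  var out = {};
  Object.keys(obj).forEach(function (key) {
    out[key] = multiValued.indexOf(key) !== -1 ? asArray(obj[key]) : obj[key];
  });
  return out;
}

var person = { name: "Anna Wilder", nick: "wilding" };
var consistent = forceArrays(person, ["nick"]);
```

With that configuration "nick" is an array even when only one nickname
is present, while "name" stays a plain string.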

[Lang tags]
> First, I think over the longer term we could actually get JS authors
> to accept extra data being added to strings, like this:
> 
>   {
>     "$": "http://statistics.data.gov.uk/id/local-authority-district/00PB",
>       "name": [
>         "The County Borough of Bridgend",
>         "Pen-y-bont ar Ogwr@cy"
>       ]
>   }
> 
> You might respond that this is also RDF-like, but I think it's a
> different degree. I think there's a great deal of value in JavaScript
> in being able to indicate what language something is, independent of
> RDFj, or other solutions.
> 
> But also, as described earlier, at a programmatic level, we could
> provide developers with extra properties, like this:
> 
>   var s = "Pen-y-bont ar Ogwr@cy";
> 
>   assert(s === "Pen-y-bont ar Ogwr@cy");
>   assert(s.value === "Pen-y-bont ar Ogwr");
>   assert(s.lang === "cy");

This seems like a good way to go. From a specs point of view I'd
position it that the strings correspond to the lexical form for
rdf:PlainLiteral [1], except that a string with no '@' is legal (and so
JSON "foo" would correspond to the rdf:PlainLiteral "foo@").
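
As a sketch of that reading, assuming the split is made at the last '@'
(as in the rdf:PlainLiteral lexical form) and that a string with no '@'
gets an empty language tag. parsePlain is a hypothetical helper name:

```javascript
// Sketch only: splits a JSON string into value and language tag at the
// last '@'. A string with no '@' is treated as having no language.
function parsePlain(s) {
  var at = s.lastIndexOf("@");
  if (at === -1) {
    return { value: s, lang: "" };
  }
  return { value: s.substring(0, at), lang: s.substring(at + 1) };
}

var welsh = parsePlain("Pen-y-bont ar Ogwr@cy");
var bare = parsePlain("foo");
var explicit = parsePlain("foo@");
```

One cost of relaxing the rule is that a value which itself contains '@'
(an email address, say) has to be written with a trailing '@' to avoid
being split in the wrong place.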

> But also -- and this may be a key difference in our view on a possible
> architecture -- I place all RDFj /received/ into a triple store,
> alongside any other triples, including RDFa-generated ones, and then
> the author retrieves RDFj from this triple store, and consequently can
> structure the data in whatever way they prefer.

Ah, I think this may be a fundamental difference in requirements. I see 
this discussion as being about reaching out to developers who don't yet 
buy the RDF data model but want to use the data we are publishing. Those 
people definitely won't be putting the JSON they receive into a triple 
store.  We want them to be able to just take the JSON structure and work 
with it directly and not have to worry about how that JSON corresponds 
to triples.

> I really agree with you, Jeni, and I really think this whole space is
> incredibly important for semweb applications.
> 
> My feeling is that RDFj is largely there, but that the place where
> most of the issues you raise should be resolved is in a 'JSON query'
> layer.
> 
> I've been working on something I've called jSPARQL, which uses JSON
> objects to express queries, but it needs quite a bit more work to get
> to something that would feel 'comfortable' to a JavaScript programmer
> -- perhaps you'd be interested in helping to get that into shape?

Sounds interesting, and definitely worth looking at Freebase's MQL for
inspiration. However, this seems like a separable issue. We can build
useful REST APIs for getting at slices of the gov data without having to
devise new query languages; the delivered format seems like the more
urgent opportunity.

Cheers,
Dave

[1] http://www.w3.org/TR/rdf-text/
Received on Monday, 14 December 2009 18:29:11 UTC