- From: Mark Birbeck <mark.birbeck@webbackplane.com>
- Date: Sun, 13 Dec 2009 21:14:07 +0000
- To: Jeni Tennison <jeni@jenitennison.com>
- Cc: public-lod@w3.org, John Sheridan <John.Sheridan@nationalarchives.gsi.gov.uk>
Hi Jeni,

On Sat, Dec 12, 2009 at 9:42 PM, Jeni Tennison <jeni@jenitennison.com> wrote:
> Hi,
>
> As part of the linked data work the UK government is doing, we're looking
> at how to use the linked data that we have as the basis of APIs that are
> readily usable by developers who really don't want to learn about RDF or
> SPARQL.

Great.

> One thing that we want to do is provide JSON representations of both RDF
> graphs and SPARQL results. I wanted to run some ideas past this group as
> to how we might do that.

Great again. :)

In the work I've been doing, I've concluded that in JSON-world an RDF graph
should be a JSON object (as explained in RDFj [1], and as you seem to
concur), but also that SPARQL queries should return RDFj objects too. In
other words, after a lot of playing around I concluded that there was
nothing to be gained from differentiating between representations of
graphs and the results of queries.

> To put this in context, what I think we should aim for is a pure
> publishing format that is optimised for approachability for normal
> developers, *not* an interchange format. RDF/JSON and the SPARQL results
> JSON format aren't entirely satisfactory as far as I'm concerned because
> of the way the objects of statements are represented as JSON objects
> rather than as simple values. I still think we should produce them (to
> wean people on to, and for those using more generic tools), but I'd like
> to think about producing something that is a bit more immediately
> approachable too.

+72. :)

I would also put irJSON into this category, which I see is referred to in a
later post.

> RDFj is closer to what I think is needed here.

Good.

> However, I don't think there's a need for setting 'context' given I'm not
> aiming for an interchange format, there are no clear rules about how to
> generate it from an arbitrary graph (basically there can't be without
> some additional configuration) and it's not clear how to deal with
> datatypes or languages.

I probably didn't explain the use of 'context' well enough, but since I
think you do need it, I'll explain it more below.

> I suppose my first question is whether there are any other JSON-based
> formats that we should be aware of, that we could use or borrow ideas
> from?

I also did a thorough search before devising RDFj. In general I found many
'interchange formats', as you call them, but I didn't find anything that
came from the other direction, asking 'how should we interpret JavaScript
data as RDF?'. I think this approach is what makes RDFj different, because
it tries as much as possible to leave the JS alone, and to provide a layer
of /interpretation/.

It's exactly how I approached RDFa -- I began with HTML mark-up, such as
<link>, <a> and <meta>, and asked myself what an RDF interpretation of each
'pattern' would be. (Which incidentally shows that there is plenty more
work that could be done on this in RDFa; what is an RDF interpretation of
@cite, <blockquote> and <img>, for example?)

> Assuming there aren't, I wanted to discuss what generic rules we might
> use, where configuration is necessary and how the configuration might be
> done.

Excellent.

It may sound sacrilegious, but I happen to think that in lots of ways RDFj
is more important than RDFa. Consequently, I was beginning to worry that
everyone was quite happy with the 'interchange formats', and didn't see the
point of discussing a more 'natural' JSON approach!
> # RDF Graphs #
>
> Let's take as an example:
>
>  <http://www.w3.org/TR/rdf-syntax-grammar>
>    dc:title "RDF/XML Syntax Specification (Revised)" ;
>    ex:editor [
>      ex:fullName "Dave Beckett" ;
>      ex:homePage <http://purl.org/net/dajobe/> ;
>    ] .
>
> In JSON, I think we'd like to create something like:
>
>  {
>    "$": "http://www.w3.org/TR/rdf-syntax-grammar",
>    "title": "RDF/XML Syntax Specification (Revised)",
>    "editor": {
>      "name": "Dave Beckett",
>      "homepage": "http://purl.org/net/dajobe/"
>    }
>  }

Definitely.

Key things are that -- other than the subject -- it's very familiar to JS
programmers. In particular, there's no verbose use of 'name' and 'value'
properties to demarcate the predicates and objects, in the way that
'interchange formats' do. Also, there are no explicit bnodes to indicate
that one statement's subject is another's object -- the natural flow of
JavaScript is used instead. These were key design goals for RDFj.

> Note that the "$" is taken from RDFj. I'm not convinced it's a good idea
> to use this symbol, rather than simply a property called "about" or
> "this" -- any opinions?

I agree, and in my RDFj description I do say that since '$' is used in a
lot of Ajax libraries, I should find something else.

However, in my view the 'something else' shouldn't look like a predicate,
so I don't think 'about' or 'this' (or 'id', as someone suggests later in
the thread) should be used. (Note also that 'id' is used in a related but
slightly different way by Dojo.) Also, the underscore is generally
associated with bnodes, so it might be confusing on a quick read through.
(We have a JSON audience and an RDF audience, and need to make design
decisions with both in mind.)

I've often thought about the empty string, '@' and other possibilities, but
haven't had a chance to try them out. E.g., the empty string would simply
look like this:

  {
    "": "http://www.w3.org/TR/rdf-syntax-grammar",
      "title": "RDF/XML Syntax Specification (Revised)",
      "editor": {
        "name": "Dave Beckett",
        "homepage": "http://purl.org/net/dajobe/"
      }
  }

Since I always tend to indent the predicates in RDFj anyway, just to draw
attention to them, the empty string is reasonably visible. However, "@"
would be even more obvious:

  {
    "@": "http://www.w3.org/TR/rdf-syntax-grammar",
      "title": "RDF/XML Syntax Specification (Revised)",
      "editor": {
        "name": "Dave Beckett",
        "homepage": "http://purl.org/net/dajobe/"
      }
  }

Anyway, it shouldn't be that difficult to come up with something.

> Also note that I've made no distinction in the above between a URI and a
> literal, while RDFj uses <>s around literals. My feeling is that normal
> developers really don't care about the distinction between a URI literal
> and a pointer to a resource, and that they will base the treatment of the
> value of a property on the (name of) the property itself.

That's true, but I think we gain a lot by making the distinction. I'd also
suggest that for JS authors it's not a difficult thing to grasp.

Also, it's not just URIs that would use a richer syntax; although it hasn't
yet been implemented in my parser, my plan for RDFj has always been to use
N3-like notation inside the attributes, such as for languages:

  {
    "name": [ "Ivan Herman", "Herman Iván@hu" ]
  }

My thinking was also that an RDFj 'processor' would tweak the objects, to
make some of this RDF metadata available to programmers.
For example:

  var foo = RDFJ.import({
    "name": [ "Ivan Herman", "Herman Iván@hu" ]
  });

  assert(foo.name[0] === "Ivan Herman");
  assert(foo.name[1] === "Herman Iván@hu");
  assert(foo.name[0].value === "Ivan Herman");
  assert(foo.name[1].value === "Herman Iván");
  assert(foo.name[1].lang === "hu");

The same principle would apply to other datatypes and to URIs. (Note that
there is nothing to stop you leaving off the angle brackets, if you know
that you'll manage all of the processing yourself. The point is that RDFj
intends to provide *both* a 'JSON as RDF' technique *and* an 'RDF as JSON'
technique.)

> So, the first piece of configuration that I think we need here is to map
> properties on to short names...

That's what 'tokens' in the 'context' object do, in RDFj.

> ... that make good JSON identifiers (ie name tokens
> without hyphens). Given that properties normally have
> lowercaseCamelCase local names, it should be possible
> to use that as a default.

I don't follow why you have this requirement (no hyphens) -- where does it
come from?

Anyway, in RDFj you don't need to abbreviate the predicates:

  {
    "http://xmlns.com/foaf/0.1/name": "Dave Beckett",
    "http://xmlns.com/foaf/0.1/homepage": "<http://purl.org/net/dajobe/>"
  }

But of course, you can abbreviate them if you want to.

> If you need
> something more readable, though, it seems like it should be possible to
> use a property of the property, such as:
>
>  ex:fullName api:jsonName "name" .
>  ex:homePage api:jsonName "homepage" .

The simplest technique for providing token mappings is to take the mappings
out of the graph. That's what RDF/XML does with namespace prefixes, what N3
does with @prefix, and what RDFj does with the 'context' object:

  {
    context: {
      token: {
        "http://xmlns.com/foaf/0.1/homepage": "homepage",
        "http://xmlns.com/foaf/0.1/name": "name"
      }
    },
    "name": "Dave Beckett",
    "homepage": "<http://purl.org/net/dajobe/>"
  }

Of course, the token being mapped can be anything, and what's quite handy
about this is that JSON objects over which we have no control could still
be converted to RDF, simply by adding a context object. For example, if
some service returned:

  {
    "fullName": "Dave Beckett",
    "url": "<http://purl.org/net/dajobe/>"
  }

then to convert this to RDF we assign it to a variable and add a context:

  var foo = goGetSomeData( url );

  foo.context = {
    token: {
      "http://xmlns.com/foaf/0.1/homepage": "url",
      "http://xmlns.com/foaf/0.1/name": "fullName"
    }
  };

The foo object can now be interpreted as RDF, via RDFj.
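To spell out that interpretation, the triples we'd expect from foo would be
something along these lines (N3), with a bnode subject since foo carries no
subject marker:

  [ foaf:name "Dave Beckett" ;
    foaf:homepage <http://purl.org/net/dajobe/> ] .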
> However, in any particular graph, there may be properties that have been
> given the same JSON name (or, even more probably, local name). We could
> provide multiple alternative names that could be chosen between, but any
> mapping to JSON is going to need to give consistent results across a
> given dataset for people to rely on it as an API, and that means the
> mapping can't be based on what's present in the data. We could do
> something with prefixes, but I have a strong aversion to assuming global
> prefixes.

I'm not sure here whether the goal is to map /any/ API to RDF, but if it
is, I think that's a separate problem to the 'JSON as RDF' question.

In passing, my approach to converting feeds -- for example a Twitter feed
-- into RDF is to make use of named graph support in SPARQL queries, and
then to provide a few triples that describe how a URI that appears in a
SPARQL query -- as a named graph URI -- should be processed to obtain
triples. I call these 'named graph mappers' [2].

There's a lot more that can be done in this area, but the key thing is that
much of the information that you are referring to, which guides the
processing, should in my view be at the query level.

> So I think this means that we need to provide configuration at an API
> level rather than at a global level: something that can be used
> consistently across a particular API to determine the token that's used
> for a given property. For example:
>
>  <> a api:JSON ;
>    api:mapping [
>      api:property ex:fullName ;
>      api:name "name" ;
>    ] , [
>      api:property ex:homePage ;
>      api:name "homepage" ;
>    ] .

The advantage of the RDFj solution (using context.token) is that the
mappings travel with the data, i.e., they are independent of any API.

> There are four more areas where I think there's configuration we need to
> think about:
>
>  * multi-valued properties
>  * typed and language-specific values
>  * nesting objects
>  * suppressing properties
>
> ## Multi-valued Properties ##
>
> First one first. It seems obvious that if you have a property with
> multiple values, it should turn into a JSON array structure. For example:
>
>  [] foaf:name "Anna Wilder" ;
>    foaf:nick "wilding", "wilda" ;
>    foaf:homepage <http://example.org/about> .
>
> should become something like:
>
>  {
>    "name": "Anna Wilder",
>    "nick": [ "wilding", "wilda" ],
>    "homepage": "http://example.org/about"
>  }

Right.

For those who haven't read the RDFj proposal [1], this example is taken
from there (although in my version I have angle brackets on the resource --
see above).

> The trouble is that if you determine whether something is an array or not
> based on the data that is actually available, you'll get situations where
> the value of a particular JSON property is sometimes an array and
> sometimes a string; that's bad for predictability for the people using
> the API. (RDF/JSON solves this by every value being an array, but that's
> counter-intuitive for normal developers.)

I'm not quite sure I understand what the problem is here... sorry about
that. The difficulty I have is that I can read what you're saying in two
ways.

One interpretation is that, given the following JavaScript:

  {
    "name": [ "Ivan Herman", "Herman Iván@hu" ]
  }

there is no way to tell whether the RDF representation should be two
triples, where each object is a literal (N3):

  [ foaf:name "Ivan Herman", "Herman Iván@hu" ] .

or one triple where the object is a JSON array (N3 again):

  [ foaf:name "'Ivan Herman', 'Herman Iván@hu'"^^json:array ] .

I don't /think/ this is what you are saying, but if it is, I think the
first case is easily the most useful, and so we should just assume that all
arrays represent multiple predicates (i.e., it's like the comma in N3).

The second possible interpretation is that, when working with a JSON
object, developers need to know when a property can hold an array, and when
it will be a single value. If that's what you mean, then I'll flag up the
approach I've taken in my RDFj processor, which is to *always* test for an
array; if something is an array, generate one triple per item, and if not,
take just the one value.

For programmers working with the JSON object it's much the same; if we
simply say that every property can hold either a single value or an array,
then it's pretty straightforward for them to deal with that. This then
gives us great flexibility, since everything can have multiple values as
appropriate.
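In code, the test is something like this (a quick sketch; 'toArray' is just
an illustrative helper, not anything in backplanejs):

  // Normalise a property to an array before processing, so that a single
  // value and an array of values go through exactly the same code:
  function toArray(v) {
    return (v instanceof Array) ? v : [ v ];
  }

  var person = {
    "name": [ "Ivan Herman", "Herman Iván@hu" ],
    "nick": "ivan"
  };

  // One triple per item, however the property was written:
  toArray(person.name).length;  // 2
  toArray(person.nick).length;  // 1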
But I realise I could have misunderstood this particular point, so
apologies if so.

> So I think a second API-level configuration that needs to be made is to
> indicate which properties should be arrays and which not:
>
>  <> a api:API ;
>    api:mapping [
>      api:property foaf:nick ;
>      api:name "nick" ;
>      api:array true ;
>    ] .

I think this over-complicates things, and since most JS programmers can
work it out themselves (by testing the type of the data), I'm not sure they
will thank you for the extra information anyway. :)

> ## Typed Values and Languages ##
>
> Typed values and values with languages are really the same problem. If we
> have something like:
>
>  <http://statistics.data.gov.uk/id/local-authority-district/00PB>
>    skos:prefLabel "The County Borough of Bridgend"@en ;
>    skos:prefLabel "Pen-y-bont ar Ogwr"@cy ;
>    skos:notation "00PB"^^geo:StandardCode ;
>    skos:notation "6405"^^transport:LocalAuthorityCode .
>
> then we'd really want the JSON to look something like:
>
>  {
>    "$": "http://statistics.data.gov.uk/id/local-authority-district/00PB",
>    "name": "The County Borough of Bridgend",
>    "welshName": "Pen-y-bont ar Ogwr",
>    "onsCode": "00PB",
>    "dftCode": "6405"
>  }
>
> I think that for this to work, the configuration needs to be able to
> filter values based on language or datatype to determine the JSON
> property name. Something like:
>
>  <> a api:JSON ;
>    api:mapping [
>      api:property skos:prefLabel ;
>      api:lang "en" ;
>      api:name "name" ;
>    ] , [
>      api:property skos:prefLabel ;
>      api:lang "cy" ;
>      api:name "welshName" ;
>    ] , [
>      api:property skos:notation ;
>      api:datatype geo:StandardCode ;
>      api:name "onsCode" ;
>    ] , [
>      api:property skos:notation ;
>      api:datatype transport:LocalAuthorityCode ;
>      api:name "dftCode" ;
>    ] .

Of course there are many ways to skin a cat, so I don't want to rule this
out of court. But to me it's just way too RDF-like.

First, I think over the longer term we could actually get JS authors to
accept extra data being added to strings, like this:

  {
    "$": "http://statistics.data.gov.uk/id/local-authority-district/00PB",
      "name": [
        "The County Borough of Bridgend",
        "Pen-y-bont ar Ogwr@cy"
      ]
  }

You might respond that this is also RDF-like, but I think it's different in
degree. I think there's a great deal of value, in JavaScript, in being able
to indicate what language something is in, independent of RDFj or any other
solution.

But also, as described earlier, at a programmatic level we could provide
developers with extra properties, like this:

  var s = "Pen-y-bont ar Ogwr@cy";

  assert(s === "Pen-y-bont ar Ogwr@cy");
  assert(s.value === "Pen-y-bont ar Ogwr");
  assert(s.lang === "cy");
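One way an RDFj processor could achieve this -- just a sketch, and
'makeLiteral' is a made-up name -- is to hand back augmented String
objects:

  // Build a String *object* whose serialised form carries the language,
  // and attach the parsed parts as extra properties:
  function makeLiteral(text, lang) {
    var s = new String(lang ? text + "@" + lang : text);
    s.value = text;
    s.lang = lang;
    return s;
  }

  var s = makeLiteral("Pen-y-bont ar Ogwr", "cy");

(One caveat: a String object compares with '==' rather than '===', so the
first assert above would need to be relaxed slightly.)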
> ## Nesting Objects ##
>
> Regarding nested objects, I'm again inclined to view this as a
> configuration option rather than something that is based on the available
> data. For example, if we have:
>
>  <http://example.org/about>
>    dc:title "Anna's Homepage"@en ;
>    foaf:maker <http://example.org/anna> .
>
>  <http://example.org/anna>
>    foaf:name "Anna Wilder" ;
>    foaf:homepage <http://example.org/about> .
>
> this could be expressed in JSON as either:
>
>  {
>    "$": "http://example.org/about",
>    "title": "Anna's Homepage",
>    "maker": {
>      "$": "http://example.org/anna",
>      "name": "Anna Wilder",
>      "homepage": "http://example.org/about"
>    }
>  }
>
> or:
>
>  {
>    "$": "http://example.org/anna",
>    "name": "Anna Wilder",
>    "homepage": {
>      "$": "http://example.org/about",
>      "title": "Anna's Homepage",
>      "maker": "http://example.org/anna"
>    }
>  }
>
> The one that's required could be indicated through the configuration, for
> example:
>
>  <> a api:API ;
>    api:mapping [
>      api:property foaf:maker ;
>      api:name "maker" ;
>      api:embed true ;
>    ] .

I realise that the two serialisations of RDFj would not be the same, but
I'm not seeing what difference that would make. Are you thinking that
someone might write some code that relies on the structure of the object,
and then gets thrown by a change in structure?

I guess that's true, but in my work on RDFj I had come to the conclusion
that people would write processors that deal with little blocks of the
data, and then call those as and when. So in your example, if we wrote one
processor that handled the object attached to the predicate 'maker', and
another for the predicate 'homepage', then it shouldn't really matter in
which order the data appeared; the correct processors would just be called.

But also -- and this may be a key difference in our views on a possible
architecture -- I place all RDFj /received/ into a triple store, alongside
any other triples, including RDFa-generated ones; the author then retrieves
RDFj from this triple store, and consequently can structure the data in
whatever way they prefer.

> The final thought that I had for representing RDF graphs as JSON was
> about suppressing properties. Basically I'm thinking that this
> configuration should work on any graph, most likely one generated from a
> DESCRIBE query. That being the case, it's likely that there will be
> properties that repeat information (because, for example, they are a
> super-property of another property). It will make a cleaner JSON API if
> those repeated properties aren't included. So something like:
>
>  <> a api:API ;
>    api:mapping [
>      api:property admingeo:contains ;
>      api:ignore true ;
>    ] .

I think we need to be thinking about how to get closer to SPARQL here,
though. (Actually this point applies to a number of the other circumstances
too.) My fear is that we're either duplicating functionality that can be
provided within a SPARQL query, or we're adding a layer above it, when
actually the information should be expressed at the query layer.

I'm not suggesting that JavaScript authors should have to get involved with
SPARQL. But if we imagine SPARQL recast for JavaScript, then I think it's
there that the kinds of things you want should be described, and not in the
API. Perhaps we should look at the query side, and then move some of your
constraints into that?
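Just to give a flavour of what I mean -- and I should stress that this is a
made-up illustration rather than actual jSPARQL syntax -- your
language-filtering configuration might instead be expressed at the query
level, as something like:

  var q = {
    select: [ "name", "welshName" ],
    where: {
      "$": "<http://statistics.data.gov.uk/id/local-authority-district/00PB>",
      "skos:prefLabel@en": "?name",
      "skos:prefLabel@cy": "?welshName"
    }
  };

The point being that the short names live with the query that creates the
objects, rather than in a separate API configuration.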
> # SPARQL Results #
>
> I'm inclined to think that creating JSON representations of SPARQL
> results that are acceptable to normal developers is less important than
> creating JSON representations of RDF graphs, for two reasons:
>
>  1. SPARQL naturally gives short, usable names to the properties in JSON
>     objects
>  2. You have to be using SPARQL to create them anyway, and if you're
>     doing that then you can probably grok the extra complexity of having
>     values that are objects

I think the two things are inseparable, but that's probably because, as I
say, I put all data into a JavaScript triple store, and then query it with
a SPARQL-ish JavaScript syntax. Currently I get back simple JSON objects
whose properties are named after the variables used in the query, but my
plan is to converge the query results with my RDFj work, so that it's RDFj
in, and RDFj out.

I've said this before, I know, but what's great about this technique is
that the query engine becomes a 'JSON-object creator', and I think this is
a very powerful programming paradigm.

> Nevertheless, there are two things that could be done to simplify the
> SPARQL results format for normal developers.
>
> One would be to just return an array of the results, rather than an
> object that contains a results property that contains an object with a
> bindings property that contains an array of the results. People who want
> metadata can always request the standard SPARQL results JSON format.
>
> The second would be to always return simple values rather than objects.
> For example, rather than:
>
>  {
>    "head": {
>      "vars": [ "book", "title" ]
>    },
>    "results": {
>      "bindings": [
>        {
>          "book": {
>            "type": "uri",
>            "value": "http://example.org/book/book6"
>          },
>          "title": {
>            "type": "literal",
>            "value": "Harry Potter and the Half-Blood Prince"
>          }
>        },
>        {
>          "book": {
>            "type": "uri",
>            "value": "http://example.org/book/book5"
>          },
>          "title": {
>            "type": "literal",
>            "value": "Harry Potter and the Order of the Phoenix"
>          }
>        },
>        ...
>      ]
>    }
>  }
>
> a normal developer would want to just get:
>
>  [{
>    "book": "http://example.org/book/book6",
>    "title": "Harry Potter and the Half-Blood Prince"
>  },{
>    "book": "http://example.org/book/book5",
>    "title": "Harry Potter and the Order of the Phoenix"
>  },
>  ...
>  ]

Yes, that's what I do in my query engine. As I say, this makes querying a
triple store into a 'dynamic object creation' mechanism, and I'm convinced
that JS programmers will grok this pretty easily. But also, since the input
to the triple store can be RDFj, and this output can also be RDFj, it makes
it very easy to move data around.
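For what it's worth, the flattening itself is straightforward -- something
along these lines (a sketch, not my actual engine code):

  // Reduce the standard SPARQL results JSON to a simple array of objects,
  // keeping just the value of each binding:
  function simplify(sparqlJson) {
    var bindings = sparqlJson.results.bindings;
    var out = [];
    for (var i = 0; i < bindings.length; i++) {
      var row = {};
      for (var name in bindings[i]) {
        row[name] = bindings[i][name].value;
      }
      out.push(row);
    }
    return out;
  }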
> I don't think we can do any configuration here. It means that information
> about datatypes and languages isn't visible...

I think it can be, using the techniques I've explained above.

> ... but (a) I'm pretty sure that
> 80% of the time that doesn't matter, (b) there's always the full JSON
> version if people need it and (c) they could write SPARQL queries that
> used the datatype/language to populate different variables/properties if
> they wanted to.

I think we should strive to preserve all of the information.

> So there you are. I'd really welcome any thoughts or pointers about any
> of this: things I've missed, vocabularies we could reuse, things that
> you've already done along these lines, and so on. Reasons why none of
> this is necessary are fine too, but I'll warn you in advance that I'm
> unlikely to be convinced ;)

+94!

I really agree with you, Jeni, and I really think this whole space is
incredibly important for semweb applications. My feeling is that RDFj is
largely there, but that the place where most of the issues you raise should
be resolved is in a 'JSON query' layer.

I've been working on something I've called jSPARQL, which uses JSON objects
to express queries, but it needs quite a bit more work to get to something
that would feel 'comfortable' to a JavaScript programmer -- perhaps you'd
be interested in helping to get that into shape?

Regards,

Mark

[1] <http://code.google.com/p/backplanejs/wiki/Rdfj>
[2] <http://code.google.com/p/backplanejs/wiki/CreateNamedGraphMapper>

--
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street, London,
EC2A 4RR)
Received on Sunday, 13 December 2009 21:14:44 UTC