Re: Creating JSON from RDF

Richard,

My opinion, based on the reactions that I've seen from enthusiastic,  
hard-working developers who just want to get things done, is that we  
(the data.gov.uk project in particular, linked data in general) are  
not providing them what they need.

We can sit around and wait for other people to provide the simple,  
light-weight interfaces that those developers demand, or we can do it  
ourselves. I can predict with near certainty that if we do not do it  
ourselves, these developers will not use the linked data that we  
produce: they will download the original source data which is also  
being made available to them, and use that.

We, here, on this list, understand the potential power of using linked  
data. The developers who want to use the data don't. (And the  
publishers producing the data don't.) We simply can't say "but they  
can just build tools", "they can just use SPARQL". They are not going  
to build bridges to us. We have to build bridges to them.

My opinion.

Jeni

On 14 Dec 2009, at 09:23, Richard Light wrote:

> In message <C74BADC3.20683%t.hammond@nature.com>, "Hammond, Tony"
> <t.hammond@nature.com> writes:
>>
>> Normal developers will always want simple.
>
> Surely what normal developers actually want are simple commands  
> whereby data can be streamed in, and become available  
> programmatically within their chosen development environment,  
> without any further effort on their part?
>
> Personally I don't see how providing a format which is easier for  
> humans to read helps to achieve this.  Do normal developers like  
> writing text parsers so much?
>
> Give 'em RDF and tell them to develop better toolsets ...
>
> Come to that, RDF-to-JSON conversion could be a downstream service  
> that someone else offers.  You don't have to do it all.
>
> Richard
>
>> On 12/12/09 21:42, "Jeni Tennison" <jeni@jenitennison.com> wrote:
>>
>>> Hi,
>>>
>>> As part of the linked data work the UK government is doing, we're
>>> looking at how to use the linked data that we have as the basis of
>>> APIs that are readily usable by developers who really don't want to
>>> learn about RDF or SPARQL.
>>>
>>> One thing that we want to do is provide JSON representations of both
>>> RDF graphs and SPARQL results. I wanted to run some ideas past this
>>> group as to how we might do that.
>>>
>>> To put this in context, what I think we should aim for is a pure
>>> publishing format that is optimised for approachability for normal
>>> developers, *not* an interchange format. RDF/JSON [1] and the SPARQL
>>> results JSON format [2] aren't entirely satisfactory as far as I'm
>>> concerned because of the way the objects of statements are
>>> represented as JSON objects rather than as simple values. I still
>>> think we should produce them (to wean people on to, and for those
>>> using more generic tools), but I'd like to think about producing
>>> something that is a bit more immediately approachable too.
>>>
>>> RDFj [3] is closer to what I think is needed here. However, I don't
>>> think there's a need for setting 'context' given that I'm not aiming
>>> for an interchange format; there are no clear rules about how to
>>> generate it from an arbitrary graph (basically there can't be
>>> without some additional configuration); and it's not clear how to
>>> deal with datatypes or languages.
>>>
>>> I suppose my first question is whether there are any other
>>> JSON-based formats that we should be aware of, that we could use or
>>> borrow ideas from?
>>>
>>> Assuming there aren't, I wanted to discuss what generic rules we
>>> might use, where configuration is necessary, and how the
>>> configuration might be done.
>>>
>>> # RDF Graphs #
>>>
>>> Let's take as an example:
>>>
>>>   <http://www.w3.org/TR/rdf-syntax-grammar>
>>>     dc:title "RDF/XML Syntax Specification (Revised)" ;
>>>     ex:editor [
>>>       ex:fullName "Dave Beckett" ;
>>>       ex:homePage <http://purl.org/net/dajobe/> ;
>>>     ] .
>>>
>>> In JSON, I think we'd like to create something like:
>>>
>>>   {
>>>     "$": "http://www.w3.org/TR/rdf-syntax-grammar",
>>>     "title": "RDF/XML Syntax Specification (Revised)",
>>>     "editor": {
>>>       "name": "Dave Beckett",
>>>       "homepage": "http://purl.org/net/dajobe/"
>>>     }
>>>   }
>>>
>>> Note that the "$" is taken from RDFj. I'm not convinced it's a good
>>> idea to use this symbol, rather than simply a property called
>>> "about" or "this" -- any opinions?
>>>
>>> Also note that I've made no distinction in the above between a URI
>>> and a literal, while RDFj uses <>s around URIs. My feeling is that
>>> normal developers really don't care about the distinction between a
>>> literal that happens to be a URI and a pointer to a resource, and
>>> that they will base their treatment of the value of a property on
>>> (the name of) the property itself.
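>>>
>>> To make the payoff concrete, here's a sketch of the sort of code I'd
>>> hope a developer could write against this (the .json URL is
>>> illustrative, and I'm assuming an async context for the awaits):
>>>
>>>   // Fetch the JSON representation and read values off it directly,
>>>   // with no RDF tooling involved.
>>>   const doc = await (await fetch(
>>>     "http://www.w3.org/TR/rdf-syntax-grammar.json")).json();
>>>   console.log(doc.title);       // "RDF/XML Syntax Specification (Revised)"
>>>   console.log(doc.editor.name); // "Dave Beckett"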
>>>
>>> So, the first piece of configuration that I think we need here is to
>>> map properties on to short names that make good JSON identifiers
>>> (i.e. name tokens without hyphens). Given that properties normally
>>> have lowercaseCamelCase local names, it should be possible to use
>>> that as a default. If you need something more readable, though, it
>>> seems like it should be possible to use a property of the property,
>>> such as:
>>>
>>>   ex:fullName api:jsonName "name" .
>>>   ex:homePage api:jsonName "homepage" .
>>>
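>>> As a sketch of the lowercaseCamelCase default (the function name is
>>> mine, not part of any proposal), deriving the JSON name from the
>>> property URI's local name might look like:
>>>
>>>   // Take everything after the last '#' or '/' as the local name,
>>>   // e.g. "http://example.org/terms#fullName" -> "fullName".
>>>   function defaultJsonName(propertyUri: string): string {
>>>     const i = Math.max(propertyUri.lastIndexOf('#'),
>>>                        propertyUri.lastIndexOf('/'));
>>>     return propertyUri.slice(i + 1);
>>>   }
>>>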
>>> However, in any particular graph, there may be properties that have
>>> been given the same JSON name (or, even more probably, the same
>>> local name). We could provide multiple alternative names to choose
>>> between, but any mapping to JSON is going to need to give consistent
>>> results across a given dataset for people to rely on it as an API,
>>> and that means the mapping can't be based on what's present in the
>>> data. We could do something with prefixes, but I have a strong
>>> aversion to assuming global prefixes.
>>>
>>> So I think this means that we need to provide configuration at an
>>> API level rather than at a global level: something that can be used
>>> consistently across a particular API to determine the token that's
>>> used for a given property. For example:
>>>
>>>   <> a api:API ;
>>>     api:mapping [
>>>       api:property ex:fullName ;
>>>       api:name "name" ;
>>>     ] , [
>>>       api:property ex:homePage ;
>>>       api:name "homepage" ;
>>>     ] .
>>>
>>> There are four more areas where I think there's configuration we
>>> need to think about:
>>>
>>>   * multi-valued properties
>>>   * typed and language-specific values
>>>   * nesting objects
>>>   * suppressing properties
>>>
>>> ## Multi-valued Properties ##
>>>
>>> First one first. It seems obvious that if you have a property with
>>> multiple values, it should turn into a JSON array structure. For
>>> example:
>>>
>>>   [] foaf:name "Anna Wilder" ;
>>>     foaf:nick "wilding", "wilda" ;
>>>     foaf:homepage <http://example.org/about> .
>>>
>>> should become something like:
>>>
>>>   {
>>>     "name": "Anna Wilder",
>>>     "nick": [ "wilding", "wilda" ],
>>>     "homepage": "http://example.org/about"
>>>   }
>>>
>>> The trouble is that if you determine whether something is an array
>>> or not based on the data that is actually available, you'll get
>>> situations where the value of a particular JSON property is
>>> sometimes an array and sometimes a string; that's bad for
>>> predictability for the people using the API. (RDF/JSON solves this
>>> by making every value an array, but that's counter-intuitive for
>>> normal developers.)
>>>
>>> So I think a second API-level configuration that needs to be made is
>>> to indicate which properties should be arrays and which not:
>>>
>>>   <> a api:API ;
>>>     api:mapping [
>>>       api:property foaf:nick ;
>>>       api:name "nick" ;
>>>       api:array true ;
>>>     ] .
>>>
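>>> With that in place, a consumer always sees an array for "nick", even
>>> when only one value happens to be present in the data:
>>>
>>>   {
>>>     "name": "Anna Wilder",
>>>     "nick": [ "wilding" ],
>>>     "homepage": "http://example.org/about"
>>>   }
>>>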
>>> ## Typed Values and Languages ##
>>>
>>> Typed values and values with languages are really the same problem.
>>> If we have something like:
>>>
>>>   <http://statistics.data.gov.uk/id/local-authority-district/00PB>
>>>     skos:prefLabel "The County Borough of Bridgend"@en ;
>>>     skos:prefLabel "Pen-y-bont ar Ogwr"@cy ;
>>>     skos:notation "00PB"^^geo:StandardCode ;
>>>     skos:notation "6405"^^transport:LocalAuthorityCode .
>>>
>>> then we'd really want the JSON to look something like:
>>>
>>>   {
>>>     "$": "http://statistics.data.gov.uk/id/local-authority-district/00PB
>>> ",
>>>     "name": "The County Borough of Bridgend",
>>>     "welshName": "Pen-y-bont ar Ogwr",
>>>     "onsCode": "00PB",
>>>     "dftCode": "6405"
>>>   }
>>>
>>> I think that for this to work, the configuration needs to be able to
>>> filter values based on language or datatype to determine the JSON
>>> property name. Something like:
>>>
>>>   <> a api:API ;
>>>     api:mapping [
>>>       api:property skos:prefLabel ;
>>>       api:lang "en" ;
>>>       api:name "name" ;
>>>     ] , [
>>>       api:property skos:prefLabel ;
>>>       api:lang "cy" ;
>>>       api:name "welshName" ;
>>>     ] , [
>>>       api:property skos:notation ;
>>>       api:datatype geo:StandardCode ;
>>>       api:name "onsCode" ;
>>>     ] , [
>>>       api:property skos:notation ;
>>>       api:datatype transport:LocalAuthorityCode ;
>>>       api:name "dftCode" ;
>>>     ] .
>>>
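>>> As a sketch of how a serialiser might apply those mappings (the
>>> types and names here are mine, purely illustrative):
>>>
>>>   interface Mapping {
>>>     property: string;   // property URI
>>>     name: string;       // JSON name to use
>>>     lang?: string;      // match literals with this language
>>>     datatype?: string;  // match literals with this datatype URI
>>>   }
>>>
>>>   // Pick the JSON name for a value by matching on the property,
>>>   // then on language or datatype where the mapping specifies one.
>>>   function jsonNameFor(property: string, lang: string | undefined,
>>>                        datatype: string | undefined,
>>>                        mappings: Mapping[]): string | undefined {
>>>     const match = mappings.find(m =>
>>>       m.property === property &&
>>>       (m.lang === undefined || m.lang === lang) &&
>>>       (m.datatype === undefined || m.datatype === datatype));
>>>     return match?.name;
>>>   }
>>>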
>>> ## Nesting Objects ##
>>>
>>> Regarding nested objects, I'm again inclined to view this as a
>>> configuration option rather than something that is based on the
>>> available data. For example, if we have:
>>>
>>>   <http://example.org/about>
>>>     dc:title "Anna's Homepage"@en ;
>>>     foaf:maker <http://example.org/anna> .
>>>
>>>   <http://example.org/anna>
>>>     foaf:name "Anna Wilder" ;
>>>     foaf:homepage <http://example.org/about> .
>>>
>>> this could be expressed in JSON as either:
>>>
>>>   {
>>>     "$": "http://example.org/about",
>>>     "title": "Anna's Homepage",
>>>     "maker": {
>>>       "$": "http://example.org/anna",
>>>       "name": "Anna Wilder",
>>>       "homepage": "http://example.org/about"
>>>     }
>>>   }
>>>
>>> or:
>>>
>>>   {
>>>     "$": "http://example.org/anna",
>>>     "name": "Anna Wilder",
>>>     "homepage": {
>>>       "$": "http://example.org/about",
>>>       "title": "Anna's Homepage",
>>>       "maker": "http://example.org/anna"
>>>     }
>>>   }
>>>
>>> The one that's required could be indicated through the
>>> configuration, for example:
>>>
>>>   <> a api:API ;
>>>     api:mapping [
>>>       api:property foaf:maker ;
>>>       api:name "maker" ;
>>>       api:embed true ;
>>>     ] .
>>>
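>>> Without api:embed true on a property, its value would presumably
>>> just be the resource's URI as a plain string:
>>>
>>>   "maker": "http://example.org/anna"
>>>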
>>> The final thought that I had for representing RDF graphs as JSON was
>>> about suppressing properties. Basically I'm thinking that this
>>> configuration should work on any graph, most likely one generated
>>> from a DESCRIBE query. That being the case, it's likely that there
>>> will be properties that repeat information (because, for example,
>>> they are a super-property of another property). It will make a
>>> cleaner JSON API if those repeated properties aren't included. So
>>> something like:
>>>
>>>   <> a api:API ;
>>>     api:mapping [
>>>       api:property admingeo:contains ;
>>>       api:ignore true ;
>>>     ] .
>>>
>>> # SPARQL Results #
>>>
>>> I'm inclined to think that creating JSON representations of SPARQL
>>> results that are acceptable to normal developers is less important
>>> than creating JSON representations of RDF graphs, for two reasons:
>>>
>>>   1. SPARQL naturally gives short, usable names to the properties
>>>      in JSON objects
>>>   2. You have to be using SPARQL to create them anyway, and if
>>>      you're doing that then you can probably grok the extra
>>>      complexity of having values that are objects
>>>
>>> Nevertheless, there are two things that could be done to simplify
>>> the SPARQL results format for normal developers.
>>>
>>> One would be to just return an array of the results, rather than an
>>> object that contains a results property that contains an object
>>> with a bindings property that contains an array of the results.
>>> People who want metadata can always request the standard SPARQL
>>> results JSON format.
>>>
>>> The second would be to always return simple values rather than
>>> objects. For example, rather than:
>>>
>>>   {
>>>     "head": {
>>>       "vars": [ "book", "title" ]
>>>     },
>>>     "results": {
>>>       "bindings": [
>>>         {
>>>           "book": {
>>>             "type": "uri",
>>>             "value": "http://example.org/book/book6"
>>>           },
>>>           "title": {
>>>             "type": "literal",
>>>             "value": "Harry Potter and the Half-Blood Prince"
>>>           }
>>>         },
>>>         {
>>>           "book": {
>>>             "type": "uri",
>>>             "value": "http://example.org/book/book5"
>>>           },
>>>           "title": {
>>>             "type": "literal",
>>>             "value": "Harry Potter and the Order of the Phoenix"
>>>           }
>>>         },
>>>         ...
>>>       ]
>>>     }
>>>   }
>>>
>>> a normal developer would want to just get:
>>>
>>>   [
>>>     {
>>>       "book": "http://example.org/book/book6",
>>>       "title": "Harry Potter and the Half-Blood Prince"
>>>     },
>>>     {
>>>       "book": "http://example.org/book/book5",
>>>       "title": "Harry Potter and the Order of the Phoenix"
>>>     },
>>>     ...
>>>   ]
>>>
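>>> As a sketch, consuming that becomes a one-liner (the endpoint URL
>>> and query string are illustrative, and I'm assuming an async
>>> context):
>>>
>>>   // Each row is a plain object keyed by the SPARQL variable names.
>>>   const rows: { book: string; title: string }[] = await (await fetch(
>>>     "http://example.org/sparql?query=...&format=simple-json")).json();
>>>   rows.forEach(row => console.log(row.book, row.title));
>>>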
>>> I don't think we can do any configuration here. It means that
>>> information about datatypes and languages isn't visible, but (a) I'm
>>> pretty sure that 80% of the time that doesn't matter, (b) there's
>>> always the full JSON version if people need it and (c) they could
>>> write SPARQL queries that used the datatype/language to populate
>>> different variables/properties if they wanted to.
>>>
>>> So there you are. I'd really welcome any thoughts or pointers about
>>> any of this: things I've missed, vocabularies we could reuse, things
>>> that you've already done along these lines, and so on. Reasons why
>>> none of this is necessary are fine too, but I'll warn you in advance
>>> that I'm unlikely to be convinced ;)
>>>
>>> Thanks,
>>> Jeni
>>>
>>> [1]: http://n2.talis.com/wiki/RDF_JSON_Specification
>>> [2]: http://www.w3.org/TR/rdf-sparql-json-res/
>>> [3]: http://code.google.com/p/ubiquity-rdfa/wiki/Rdfj
>
> -- 
> Richard Light
>

-- 
Jeni Tennison
http://www.jenitennison.com

Received on Monday, 14 December 2009 09:38:13 UTC