Re: Creating JSON from RDF

Hi Jeni,

Jeni Tennison wrote:

> On 13 Dec 2009, at 13:34, Dave Reynolds wrote:

>> I agree we want both graphs and SPARQL results, but I think there is 
>> a third case: lists of described objects.
> 
> I absolutely agree with you that lists of described objects is an 
> essential part of an API. In fact, I was going to (and will!) write a 
> separate message about possible approaches for creating such lists.
> 
> It seemed to me that lists could be represented with RDF like:
> 
>   <http://statistics.data.gov.uk/doc/local-authority?page=1>
>     rdfs:label "Local Authorities - Page 1" ;
>     xhv:next <http://statistics.data.gov.uk/doc/local-authority?page=2> ;
>     ...
>     api:contents (
>       <http://statistics.data.gov.uk/id/local-authority/00QA>
>       <http://statistics.data.gov.uk/id/local-authority/00QB>
>       <http://statistics.data.gov.uk/id/local-authority/45UB>
>       ...
>     )

> This is just RDF, and as such any rules that we create about mapping RDF 
> graphs to JSON could apply. (I agree that the list page should include 
> extra information about the items in the list, but that seems to me to 
> be a separable issue.)

Sure, but there are some advantages to treating this ordered list of 
results as an API issue rather than a modelling issue.

I'll respond properly on your other thread.
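To make the list case concrete: the Turtle "( ... )" syntax above is shorthand for a chain of rdf:first/rdf:rest nodes, so a generic RDF-to-JSON mapper would have to walk that chain to produce an ordered JSON array. A minimal Python sketch, assuming triples as plain string tuples, blank-node labels such as "_:l1", and a hypothetical URI for api:contents (the thread does not give one):

```python
import json

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST, RDF_REST, RDF_NIL = RDF + "first", RDF + "rest", RDF + "nil"

# Hypothetical URI standing in for api:contents; not given in the thread.
API_CONTENTS = "http://example.org/api#contents"


def collection_to_list(triples, head):
    """Walk an rdf:first/rdf:rest chain from `head` to rdf:nil and return
    the members in order. `triples` are plain (s, p, o) string tuples."""
    index = {(s, p): o for s, p, o in triples}
    items, node, seen = [], head, set()
    while node != RDF_NIL:
        if node in seen:  # guard against a malformed, cyclic list
            raise ValueError("cyclic rdf:rest chain at " + node)
        seen.add(node)
        items.append(index[(node, RDF_FIRST)])
        node = index[(node, RDF_REST)]
    return items


LA = "http://statistics.data.gov.uk/id/local-authority/"
PAGE = "http://statistics.data.gov.uk/doc/local-authority?page=1"

# The Turtle "( ... )" list expands to blank nodes linked by rdf:rest.
triples = [
    (PAGE, API_CONTENTS, "_:l1"),
    ("_:l1", RDF_FIRST, LA + "00QA"), ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, LA + "00QB"), ("_:l2", RDF_REST, "_:l3"),
    ("_:l3", RDF_FIRST, LA + "45UB"), ("_:l3", RDF_REST, RDF_NIL),
]

head = next(o for s, p, o in triples if s == PAGE and p == API_CONTENTS)
# "_about" is the reserved key proposed later in this thread.
page = {"_about": PAGE, "contents": collection_to_list(triples, head)}
print(json.dumps(page, indent=2))
```

Emitting "contents" as a plain JSON array keeps the order without exposing the blank-node chain to the client.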

> One thing it makes me think is that perhaps JSON Schema [1] could form 
> the basis of the mechanism for expressing any extra stuff that's 
> required about the properties.

Interesting thought; I'll need to learn more about JSON Schema first.

>>> Note that the "$" is taken from RDFj. I'm not convinced it's a good 
>>> idea to use this symbol, rather than simply a property called "about" 
>>> or "this" -- any opinions?
>>
>> I'd prefer "id" (though "about" is OK), "$" is too heavily overused in 
>> javascript libraries.
> 
> I agree. From the brief survey of JSON APIs that I did just now, it 
> seems as though prefixing a reserved property name with a '_' is the 
> usual thing. I'd suggest '_about' because it's similar to RDFa and 
> because '_id', to me at least, implies a local identifier rather than a 
> URI.

No objection to "_about". As per the separate thread, it was Freebase in 
particular that motivated the suggestion of "id".

[On api:mapping usage]
>> Are you thinking of this as something the publisher provides or the 
>> API caller provides?
>>
>> If the former, then OK, but as I say I think a zero-config set of 
>> default conventions is fine, with the API allowing fine tuning.
> 
> I'm thinking of this as something that the publisher of the API creates 
> (to describe/define the API). Note, though, that the publisher of the 
> API might not be the publisher of the data, and that it would be 
> feasible to have a service that allows clients to supply a 
> configuration, point at a datastore, and have the API just work.

OK, agreed. My concern is that developers shouldn't have to wade through 
this mapping to understand what they are getting, unless they are 
already RDF heads and care about that aspect.
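One way to keep the mapping out of developers' way is to make it optional: derive JSON names from property URIs by default and let the publisher override only the awkward ones. A minimal Python sketch, assuming local names (the part after the last '#' or '/') as the zero-config default and a hypothetical {property URI: name} dictionary as the override format; the sample triples are made up:

```python
import json
import re

RESERVED = {"_about"}


def local_name(uri):
    """Zero-config default: the part of the URI after the last '#' or '/'."""
    return re.split(r"[#/]", uri)[-1]


def build_name_map(property_uris, overrides=None):
    """Map property URIs to JSON names: local names by default, with optional
    publisher overrides (a hypothetical {uri: name} config format)."""
    overrides = overrides or {}
    names = {}
    for uri in sorted(property_uris):
        name = overrides.get(uri, local_name(uri))
        if name in RESERVED:
            raise ValueError(f"{name!r} is reserved")
        if name in names.values():
            # Two vocabularies sharing a local name need an explicit override.
            raise ValueError(f"name clash on {name!r}; add an override for {uri}")
        names[uri] = name
    return names


def describe(subject, triples, names):
    """Render one resource as a JSON object keyed by the short names.
    Values are kept as lists; the singleton/array choice is a separate step."""
    obj = {"_about": subject}
    for s, p, o in triples:
        if s == subject:
            obj.setdefault(names[p], []).append(o)
    return obj


DCT = "http://purl.org/dc/terms/"
DOC = "http://example.com/ourpaper"
triples = [(DOC, DCT + "title", "Our paper"), (DOC, DCT + "created", "2009-12-13")]

# Zero config works as-is; override only where the default name is unhelpful.
names = build_name_map({DCT + "title", DCT + "created"}, overrides={DCT + "created": "date"})
print(json.dumps(describe(DOC, triples, names), indent=2))
```

A name clash is exactly where configuration becomes necessary, which is why the sketch raises rather than silently picking one of the properties.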

[On multi-valued properties]
> I guess there are two choices if there were no specification:
> 
>   1. always give one value for the property; if there are several values 
> in the graph, then provide "the first"
>   2. give an array when there are multiple values and a singleton when 
> there's only one
> 
> I did have another vague notion of providing two properties side by 
> side, one singular and one plural, so you would have:
> 
>   {
>     "nick": "JeniT"
>   }
> 
> or
> 
>   {
>     "nicks": ["wilding", "wilda"]
>   }
> 
> side by side in the same list of objects. But of course that would 
> require configuration anyway (to provide pluralised versions of the 
> label), so I'm not particularly taken with it.
> 
> It does concern me that if there are RDF graphs which contain 
> descriptions of several resources of the same type, we might get into a 
> situation where there are two resources for which the default behaviour 
> would be different; we need to have a way of reconciling this (for 
> example, if any of the resources in the graph have multiple values for a 
> property, then it always uses an array).

Yes. With zero configuration there will always be either some 
inconsistency or a need to force the more general convention on 
people. I agree with Mark that developers can write code to adapt to the 
list/no-list case, and with configuration we have the option to make 
this more consistent in places where it is a problem.
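The reconciliation rule Jeni suggests (if any resource in the graph has several values for a property, use an array for every resource) can be applied in one pass over the result set. A minimal Python sketch, assuming descriptions already keyed by short property names; empty value lists are skipped, the resource URIs and names are hypothetical, and `as_list` is the kind of client-side adapter meant for the list/no-list case:

```python
import json


def render(descriptions):
    """descriptions: {resource URI: {short name: [values]}}.

    One rule per property across the whole graph: if ANY resource has several
    values for it, every resource gets an array; otherwise a bare value."""
    multi = {
        name
        for props in descriptions.values()
        for name, values in props.items()
        if len(values) > 1
    }
    result = []
    for uri, props in descriptions.items():
        obj = {"_about": uri}
        for name, values in props.items():
            if not values:
                continue  # nothing to emit for an empty value list
            obj[name] = list(values) if name in multi else values[0]
        result.append(obj)
    return result


def as_list(value):
    """Client-side helper for the list/no-list case: always get a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# Hypothetical resources; the nick values are those from the example above.
people = {
    "http://example.org/person/1": {"nick": ["JeniT"], "name": ["Person One"]},
    "http://example.org/person/2": {"nick": ["wilding", "wilda"], "name": ["Person Two"]},
}
rendered = render(people)
print(json.dumps(rendered, indent=2))
```

Here "nick" comes out as an array for both resources, even the one with a single value, while "name" stays a bare string.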

One possibility is a bootstrapping service where you supply sample data 
and an ontology (if available) and get back a suggested mapping. That 
service can scan the data once to guess at multi-valuedness, so you 
don't pay the cost of doing that in the live API.
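The bootstrapping scan could be as simple as counting values per subject and property in the sample. A minimal Python sketch, assuming triples as plain string tuples and an optional set of property URIs that the ontology declares single-valued (for example via owl:FunctionalProperty); the output format with "multiValued" and "subjects" keys is hypothetical, and the sample data is made up:

```python
import json
from collections import Counter


def suggest_mapping(sample_triples, functional=frozenset()):
    """One-off scan of sample data to guess multi-valuedness.

    A property is flagged multiValued if any subject in the sample has more
    than one value for it. Properties in `functional` (declared single-valued
    by the ontology) are never flagged. The result is only a guess."""
    # set() removes duplicate triples so they are not counted twice.
    per_subject = Counter((s, p) for s, p, _ in set(sample_triples))
    mapping = {}
    for (_, p), n in per_subject.items():
        entry = mapping.setdefault(p, {"multiValued": False, "subjects": 0})
        entry["subjects"] += 1
        if n > 1 and p not in functional:
            entry["multiValued"] = True
    return mapping


FOAF = "http://xmlns.com/foaf/0.1/"
sample = [
    ("http://example.org/person/1", FOAF + "nick", "JeniT"),
    ("http://example.org/person/2", FOAF + "nick", "wilding"),
    ("http://example.org/person/2", FOAF + "nick", "wilda"),
    ("http://example.org/person/2", FOAF + "name", "Person Two"),
]
suggestion = suggest_mapping(sample)
print(json.dumps(suggestion, indent=2, sort_keys=True))
```

Because it only sees the sample, the suggestion can miss properties that are multi-valued elsewhere in the data; it is a starting point for hand-tuning the configuration.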

> [snip]
>> Language codes are effectively open ended. I can't necessarily predict 
>> what lang codes are going to be in my data and provide a property 
>> mapping for every single one.
> 
> I know they're *potentially* open-ended; I think in practice, for a 
> single API, they are probably not. 

Depends on whether this is your own data or you are harvesting/receiving 
from multiple other sources and passing it on (in which case you have a 
lot less control).

> And even in the case of data that 
> does have multiple languages (e.g. DBpedia), it would be possible to create 
> a list based on the IANA language subtag registry [2] if you were 
> concerned.

You could, but from the client's point of view, trying all those property 
names in turn to find a value it can use is going to be awkward.
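To illustrate the awkwardness: with one property per language, a client has to probe candidate property names for each of its preferences, including shorter forms of regional tags. A minimal Python sketch, assuming a hypothetical naming scheme that appends the language tag to the property name (e.g. prefLabel_cy):

```python
def find_by_suffix(obj, base, preferred_langs):
    """Probe base_<tag> for each preferred language in turn, then the bare
    base property. Assumes the hypothetical base_<tag> naming scheme."""
    for lang in preferred_langs:
        # Each preference may need several probes: "cy-GB" -> "cy".
        parts = lang.split("-")
        for i in range(len(parts), 0, -1):
            key = base + "_" + "-".join(parts[:i])
            if key in obj:
                return obj[key]
    return obj.get(base)


obj = {
    "_about": "http://statistics.data.gov.uk/id/local-authority-district/00PB",
    "prefLabel": "The County Borough of Bridgend",
    "prefLabel_en": "The County Borough of Bridgend",
    "prefLabel_cy": "Pen-y-bont ar Ogwr",
}
print(find_by_suffix(obj, "prefLabel", ["cy-GB", "en"]))
```

Language tags are also case-insensitive, so a real client would need to normalise case as well, which this sketch does not attempt.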

>> Plus when working with language-tagged data you often have code to do 
>> a "best match" (not simple lookup) between the user's language 
>> preferences and the available lang tags. That looks hard if each is in 
>> a different property and the lang tags themselves are hidden in the 
>> API configuration.
>>
>> I think we may need the long-winded encoding available:
>>
>> {
>>  "id" : "http://statistics.data.gov.uk/id/local-authority-district/00PB",
>>  "prefLabel" : [
>>    "The County Borough of Bridgend",
>>    { "value" : "The County Borough of Bridgend", "lang" : "en" },
>>    { "value" : "Pen-y-bont ar Ogwr", "lang" : "cy" }
>>  ]
>>  ...
>>
>> Then it would be up to the publisher whether to provide the simpler 
>> properties as well or instead. Those could be regarded as 
>> transformations of the RDF for convenience (much like choosing to 
>> include RDFS closure info).
> 
> As I say, I'm not convinced that this is a big enough issue to sweat 
> over, but another possibility would be to perform some basic string 
> manipulation to create separate properties as required. For example:
> 
>  {
>    "_about" : "http://statistics.data.gov.uk/id/local-authority-district/00PB",
>    "prefLabel": "The County Borough of Bridgend",
>    "prefLabel_en": "The County Borough of Bridgend",
>    "prefLabel_cy": "Pen-y-bont ar Ogwr"
>  }
> 
> Note that the language of the value of the property without the language 
> suffix is probably something that you'd want in the API configuration 
> (and possibly overridable by the client).

Yes, that is better, though I think Mark's literal encoding would be 
easier to work with than encoding the language in the property name.
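With the structured value/lang encoding, the language tags stay visible to the client, so the usual best-match logic can run over them directly. A minimal Python sketch that roughly follows the RFC 4647 "Lookup" scheme (truncate each preferred tag until something matches, then fall back to an untagged value); `best_match` is a hypothetical helper, not part of any proposed API:

```python
def best_match(values, preferred_langs, default=None):
    """Choose one value from the structured encoding, e.g.
    ["plain", {"value": "...", "lang": "en"}, ...].

    Tries each preferred tag, progressively truncating it (cy-GB -> cy),
    and falls back to the first untagged value, then to `default`."""
    tagged, plain = {}, []
    for v in values:
        if isinstance(v, dict) and "lang" in v:
            # Language tags are case-insensitive; keep the first value per tag.
            tagged.setdefault(v["lang"].lower(), v["value"])
        else:
            plain.append(v["value"] if isinstance(v, dict) else v)
    for pref in preferred_langs:
        tag = pref.lower()
        while tag:
            if tag in tagged:
                return tagged[tag]
            tag = tag.rpartition("-")[0]  # drop the last subtag
            if len(tag) >= 2 and tag[-2] == "-":
                tag = tag[:-2]  # also drop a dangling singleton such as "-x"
    return plain[0] if plain else default


pref_label = [
    "The County Borough of Bridgend",
    {"value": "The County Borough of Bridgend", "lang": "en"},
    {"value": "Pen-y-bont ar Ogwr", "lang": "cy"},
]
print(best_match(pref_label, ["cy-GB", "en"]))  # Welsh is preferred here
print(best_match(pref_label, ["fr"]))           # falls back to the untagged value
```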

>> For things like xsd:dateTime there seem to be a couple of options. The 
>> Simile-style option would be to have them as strings but define the 
>> range of the property in some associated context/properties table.
>>
>> The other would be to use a structured representation:
>>
>>  {
>>      "id" : "http://example.com/ourpaper",
>>      "date" : { "type" : "date", "value" : "2009-12-13" }
>>     ...
>>
>> I'm guessing you would just have them as strings and let the consumer 
>> figure out when they want to treat them as dates, is that right?
> 
> That would be my preference, but I think the strings should 
> (unfortunately) use formats understood by the JavaScript Date.parse() 
> method [3]. So the above would be:
> 
>   {
>     "_about": "http://example.com/ourpaper",
>     "date": "13 Dec, 2009"
>   }

Ugh. I guess you are right; I hadn't thought of that.
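For comparison, both date options can be produced from the same xsd:date value. A minimal Python sketch, assuming a plain YYYY-MM-DD lexical form with no timezone; the "13 Dec, 2009" style simply mirrors the example above. Which non-ISO strings Date.parse() accepts has been left to each JavaScript implementation, so that format would need checking in the browsers being targeted:

```python
from datetime import date

# Month abbreviations spelled out to avoid locale-dependent strftime("%b").
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def render_date(lexical, structured=False):
    """Render a plain xsd:date lexical value (YYYY-MM-DD, no timezone).

    structured=True gives the typed-object option; otherwise a display
    string in the style of the "13 Dec, 2009" example."""
    d = date.fromisoformat(lexical)
    if structured:
        return {"type": "date", "value": d.isoformat()}
    return f"{d.day} {MONTHS[d.month - 1]}, {d.year}"


print(render_date("2009-12-13"))
print(render_date("2009-12-13", structured=True))
```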

Cheers,
Dave

Received on Monday, 14 December 2009 16:59:29 UTC