Re: JSON-LD context for schema.org - work in progress from Gregg Kellogg on 2014-04-23 (www-archive@w3.org from April 2014)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Wed, 23 Apr 2014 07:32:36 -0700
To: Markus Lanthaler <markus.lanthaler@gmx.net>
Cc: Niklas Lindström <lindstream@gmail.com>, Dan Brickley <danbri@google.com>, Stéphane Corlosquet <scorlosquet@gmail.com>, Manu Sporny <msporny@digitalbazaar.com>, "<www-archive@w3.org>" <www-archive@w3.org>
Message-Id: <CEE369C6-0AB3-47AF-BC8F-6E275697632A@greggkellogg.net>
On Apr 23, 2014, at 6:47 AM, "Markus Lanthaler" <markus.lanthaler@gmx.net> wrote:
> 
> First of all, thanks a lot Dan for making this happen.
> 
>> On Thursday, April 17, 2014 6:29 PM, Niklas Lindström wrote:
>>> On Thu, Apr 17, 2014 at 5:30 PM, Dan Brickley wrote:
>>>> On 17 April 2014 12:36, Dan Brickley wrote:
>>>> Hi folks
>>>> 
>>>> curl -v -H "Accept: application/ld+json" http://sdo-context-test.appspot.com/
>>>> 
>>>> ... is a start at JSON-LD context file serving. For now it just emits
>>>> a static file that I built with a script from Niklas,
>>>> https://gist.github.com/niklasl/7873635
>>>> https://github.com/json-ld/json-ld.org/pull/297
>>>> 
>>>> My main concern is that this should not impact human users, so the
>>>> content negotiation settings are a bit conservative: if the client
>>>> asks for JSON-LD and does not mention HTML or XHTML in its request, I
>>>> send JSON-LD. Otherwise (and regardless of ;q=0.6 -style HTTP
>>>> subtleties), I send the normal HTML.
>> 
>> That sounds quite reasonable. I wouldn't expect any problems in
>> practise, since implementations looking for a context really only want
>> it in JSON-LD, and so shouldn't ask for any fallback formats.
> 
> I fully agree with Niklas. The only other format that tools might look for is plain old JSON (application/json) as I would expect that a lot of people don't set up the media type correctly. IMO you could go even as far as just checking for the presence of any (X)HTML media type and only return JSON-LD if there's none.

I agree; I think that asking for plain JSON should return the JSON-LD.
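
To make the rule concrete, here is a minimal Python sketch of the negotiation as described in this thread: serve JSON-LD only when the client asks for it (or for plain JSON) and does not mention HTML or XHTML, ignoring q-values. The function name wants_jsonld and the exact sets of media types are my assumptions, not the code running on the test server.

```python
def wants_jsonld(accept_header: str) -> bool:
    """Return True if the request should be answered with the JSON-LD context.

    Hypothetical helper: parameters such as ;q=0.6 are deliberately ignored,
    matching the conservative behaviour described in the thread.
    """
    # Keep only the bare media types, dropping any parameters.
    media_types = {
        part.split(";", 1)[0].strip().lower()
        for part in accept_header.split(",")
        if part.strip()
    }
    html_types = {"text/html", "application/xhtml+xml"}
    json_types = {"application/ld+json", "application/json"}

    # Any mention of (X)HTML means a browser-like client: send normal HTML.
    if media_types & html_types:
        return False
    # Otherwise send JSON-LD only if it (or plain JSON) was asked for.
    return bool(media_types & json_types)
```

Markus's more permissive variant would drop the final check and return True whenever no (X)HTML type is present.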

>>>> Once we're happy with the HTTP mechanism, let's talk about what
>>>> actually goes into the file.
>>> 
>>> Let's start that conversation.
>>> 
>>> Strawman:
>>> 
>>> {
>>>  "@context": {
>>>      "partOfSeries": {"@type": "@id" },
>>>      "servicePostalAddress": {"@type": "@id" },
>>>      "workLocation": {"@type": "@id" },
>>>      "arterialBranch": {"@type": "@id" }
>>> ...
>>> }
>>> }
>>> 
>>> https://gist.github.com/danbri/10991489
>>> 
>>> this lists as @id every property that has at least one non-literal
>>> expected value type, and leaves the rest to defaults.
>>> 
>>> Workable?
> 
> IMO, yes. But obviously Niklas is right when he says that
> 
> 
>> There is a potential problem with this approach. With value coercion,
>> any non-object value for the property has to be a valid URL (or CURIE)
>> string. If it isn't, the result will be an invalid @id value.
>> Actually, implementations of the API algorithm (e.g. the JSON-LD
>> playground) seem to create a JSON-LD object with a malformed URL
>> ("@id" value). Example:
>> 
>>    {
>>      "@context": {
>>        "@vocab": "http://schema.org/",
>>        "creator": {"@type": "@id"}
>>      },
>>      "creator": "Niklas Lindström"
>>    }
>> 
>> Which yields the triple:
>> 
>> _:b0 <http://schema.org/creator> <http://json-ld.org/playground/Niklas Lindström> .
>> 
>> Coercion is just a syntactic shorthand without any error correction or
>> smart fallback. It is great for when values are given as strings but
>> always expected to represent a URL or specific datatype, but that's
>> all.
>> 
>> So the only way to get correct data for properties expecting either
>> strings or things, is to either not coerce them (which I'd recommend
>> in general), and thus to require thing values with only a URL to
>> be written like:
>> 
>>      "creator": {"@id": "http://neverspace.net/id#self"}
>> 
>> Or to do coercion, and require plain strings to be written like: 
>> 
>>      "creator": {"@value": "Niklas Lindström"}
>> 
>> Which looks odd and is probably hard to teach. (And it would be better
>> to use "name" or "alternateName" there anyway.)
> 
> I think teaching developers to use 
> 
>       "creator": { "name": "Niklas Lindström" }
> 
> constructs in cases like this is the way to go. It is quite trivial to explain, and mistakes are much easier to detect, as a mistaken plain string would be transformed into a long URL, whereas without coercion it would just stay the same (so if you aren't an expert, you wouldn't really look carefully for that).

The problem here is that every schema.org property with an object range, not just the "hybrid" ones whose range lists both strings and things, can take a string. IMO, this does not mean that no term should have @type: @id.

The approach Markus suggests is what people should learn. IMO, when a publisher uses a string where an object is expected, this is what they mean, so we should encourage them to say it this way.
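
As a rough illustration of why the explicit object form is safer, here is a simplified Python model of what coercion does to a single property value. It is not the JSON-LD API expansion algorithm: expand_value is a hypothetical helper, it handles only base-relative resolution (no CURIEs or @vocab), and it passes objects through untouched.

```python
from urllib.parse import urljoin


def expand_value(value, coerce_to_id, base):
    """Simplified model of how one property value is interpreted.

    Hypothetical helper for illustration only; real JSON-LD expansion
    does considerably more than this.
    """
    if isinstance(value, dict):
        # Explicit objects ({"@id": ...}, {"@value": ...}, {"name": ...})
        # already say what they mean, so coercion does not apply.
        return value
    if coerce_to_id:
        # "@type": "@id" treats every string as an IRI reference and
        # resolves it against the document base, with no sanity check.
        return {"@id": urljoin(base, value)}
    # Without coercion a string stays a plain literal.
    return {"@value": value}


BASE = "http://json-ld.org/playground/"  # base seen in Niklas's example

coerced_name = expand_value("Niklas Lindström", True, BASE)
coerced_url = expand_value("http://neverspace.net/id#self", True, BASE)
explicit_obj = expand_value({"name": "Niklas Lindström"}, True, BASE)
```

Here coerced_name ends up with the malformed "@id" from Niklas's triple, coerced_url keeps the intended URL, and explicit_obj is left alone, which is the form we should be teaching.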

>> I think the better option at this stage of the game is for consumers
>> to recommend explicit object values for these "hybrid" properties
>> (taking strings or things), but perhaps to look for string values that
>> look like URLs and automagically coerce those if needed.
> 
> IMO the better way is to coerce them to @id as Dan did. This is not HTML where you have to work with what you have in your DOM tree. Thus, I'm convinced that developers will almost intuitively do the right thing and create a "Person object" if the range tells them to do so. On the other hand, requiring the usage of an @id-object to mark up a URL seems counterintuitive given that JSON-LD is able to handle that so elegantly.
> 
> All that being said, the most important thing at this stage is to publish a context.

+1

Gregg

> Cheers,
> Markus
> 
> 
> --
> Markus Lanthaler
> @markuslanthaler
>
Received on Wednesday, 23 April 2014 14:33:09 UTC