Re: FPWD comment - literals, data types and language tags

The serialization may look "the same" to human eyes, but here's the spec for JSON-LD:

http://www.w3.org/TR/json-ld/#string-internationalization

It would be a shame if Web Annotations invented another JSON serialization. What would that one be called?

http://blog.ldodds.com/2010/12/02/rdf-and-json-a-clash-of-model-and-syntax/

On Dec 13, 2014, at 8:21 PM, Robert Sanderson <azaroth42@gmail.com> wrote:


Hi Antoine, all,

To me the only value of literal bodies is to enable a simple string. As soon as you add data types or language tags, the complexity and even the serialization are the same as using a full resource, almost character for character:

Language tagged string:
    {"@value": "hi", "@language": "en"}

Real (blank node) resource:
    {"value": "hi", "language": "en"}
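
To make the "almost character for character" point concrete, here is a minimal Python sketch that compares the two serializations above. The helper name `strip_at_keys` is hypothetical and exists only for this illustration; it is not part of any JSON-LD library.

```python
import json

def strip_at_keys(value_object):
    """Return a copy of the dict with a leading '@' removed from every key."""
    return {key.lstrip("@"): val for key, val in value_object.items()}

# Language-tagged literal, as a JSON-LD value object.
literal = {"@value": "hi", "@language": "en"}

# Blank-node resource carrying the same information as ordinary properties.
resource = {"value": "hi", "language": "en"}

print(json.dumps(literal))
print(json.dumps(resource))

# The two forms differ only in the '@' prefix on the keys.
print(strip_at_keys(literal) == resource)
```

The textual similarity is only skin deep: in JSON-LD the first form is a literal, while the second describes a node with its own properties.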

The argument against always using a resource, and thereby enabling full OWL reasoning and a consistent model that respects the web architecture, is that people want to have only a string without anything else ... so that's all we allowed.

As the model allows for language (and format and other metadata) to be associated with the body using the (preferred) resource method, and we don't want to create multiple ways to do the same thing, I'm against (as you might imagine) allowing language tagged literals.

Rob


On Sat, Dec 13, 2014 at 7:46 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
Dear all,

Congrats on the WD!

I'm looking at it now, and I'm really happy with simple bodies.
I am not so happy, however, with the fact that the spec forbids the use of language tags on simple text bodies.

In the OA community, several months ago, there was a discussion about XML-datatypes vs plain-literals-with-language-tags. As Rob reminded me in private mail:
[
The issue with datatypes and language tags is:

* You can't have both at once in RDF, so we need to allow:
    {"value": "hi", "language": "en", "format": "text/plain"}

* In JSON-LD, the difference between a language tagged literal, and the resource is very confusing:
"hi"@en is:  {"@value": "hi", "@language" : "en"}
Whereas the blank node just drops the @s:   {"value": "hi", "language": "en"}
If we allowed both patterns, we'd be losing any simplicity and understandability gains by having them at all.  So if there's more information than just the value, including format, language, creator, etc, then it has to be a resource following the web architecture.

* If the body data type isn't fixed, you would end up with the same situation as language tags, just with @type and format.  So to prevent it we require it to be an xsd:string, which will serialize to just the string in JSON-LD compaction.
]
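
The rule quoted above (a bare string when there is only a value, a resource as soon as there is any extra information) could be sketched roughly as follows. This is an illustration under assumptions, not the model's normative serialization: the function `serialize_body` and its parameter names are hypothetical, and only the `value`, `language` and `format` keys come from the examples in the quoted mail.

```python
def serialize_body(value, language=None, fmt=None):
    """Choose between a plain string body and a resource body.

    A body with nothing but its text stays a simple string. As soon as
    language or format metadata is present, the body becomes a resource
    (a dict here), so there is only one way to attach that metadata.
    """
    if language is None and fmt is None:
        return value  # simple string body

    body = {"value": value}
    if language is not None:
        body["language"] = language
    if fmt is not None:
        body["format"] = fmt
    return body


print(serialize_body("hi"))
print(serialize_body("hi", language="en", fmt="text/plain"))
```

Note that the resource form can carry language and format together, which is the combination a single RDF literal cannot express.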

At the time I accepted these arguments. But I think what convinced me then is that there were cases mentioned for literal bodies that would have a different (XML) datatype.
Now the FPWD mentions only simple text bodies. So this makes the case for not using plain literals a lot weaker.

If I were a fresh reader, it would puzzle me a lot to find the sentence
"The string body MUST be an xsd:string and MUST NOT have a language associated with it."
And as someone with a key interest in multilingual scenarios, I want to raise again the issue!
In fact the current solution makes annotations-with-literals useless in many of the Europeana cases, so I'd rather see a very good case for not enabling language tags.
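
For readers who want to see what the quoted constraint excludes in practice, here is a rough Python sketch of a check against it. The function name `is_valid_string_body` is hypothetical, and the check assumes the body arrives either as a plain JSON string or as a JSON-LD value object with `@value`, `@type` and `@language` keys; it is not taken from the FPWD or from any library.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

def is_valid_string_body(body):
    """Check the FPWD rule: xsd:string only, and no language tag."""
    if isinstance(body, str):
        return True  # a bare JSON string is a plain string body
    if not isinstance(body, dict):
        return False
    if not isinstance(body.get("@value"), str):
        return False
    if "@language" in body:
        return False  # language-tagged literals are forbidden by the rule
    # Assumption: a missing @type on a string value is treated as xsd:string.
    return body.get("@type", XSD_STRING) == XSD_STRING


print(is_valid_string_body("hi"))
print(is_valid_string_body({"@value": "hi", "@language": "en"}))
```

Under this rule the second call is rejected, which is exactly the multilingual case raised here.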

Best regards,

Antoine



--
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305

Received on Sunday, 14 December 2014 03:06:15 UTC