Re: FPWD comment - literals, data types and language tags

Hey Rob,

> On 15 Dec 2014, at 02:38 , Robert Sanderson <azaroth42@gmail.com> wrote:
> 
> 
> Hi Ivan, Jeff,
> 
> Firstly, I agree wholeheartedly with Jeff's position that we shouldn't be mixing up model and serialization ... but that's exactly what the requirement is for string literal bodies.

Actually, we are in the strange situation where the Turtle serialization is, in this case, much clearer than the JSON-LD. This is indeed an area where JSON-LD, because it follows JSON usage, gets more complicated than Turtle, which closely follows the RDF model. Hence I will (also) use Turtle versions below to make the points clear.

> 
> The model requirement would be:  I want to embed a textual comment within the annotation as the body.  That's precisely what the EmbeddedContent resource does (and ContentAsText used to do).  Indeed, when the requirement was brought up at the CSV WG (for example) the negative reaction was not to the model, but to the serialization as a JSON object instead of a string.

The discussion was indeed around JSON; we did not discuss the model issues.

> 
> The requirement to have language associated with a string is also perfectly reasonable, and already accounted for.   The requirement to have format associated with a string, ditto.  And the clincher is having both format and language is not possible with literals in RDF, and must be done with a resource.
> 

The combination of the two is indeed not possible. And that is indeed a problem.

However, this made me realize that we do have an interesting situation. If we switch to Turtle then, as far as the (RDF) model goes, the following three literal alternatives are fairly clear:

<> oa:body "ABCD" .
<> oa:body "ABCD"@en.
<> oa:body "1234"^^xsd:integer.

But, actually, per RDF 1.1, the first version is just a shorthand for

<> oa:body "ABCD"^^xsd:string.

Ie, formally, only the second and third forms exist as literals in RDF. This is because in RDF 1.1 there is no concept of plain literals any more, only literals with either datatypes or language tags (and yes, there is no way of combining these two, which is a pain). See

http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal

but what this means is that, in the model, these two JSON-LD constructs are identical:

"body" : "ABCD"
"body" : { "@value" : "ABCD", "@type" : "xsd:string" }

In other words, because we allow simple strings in the model, it is already perfectly legal, in JSON-LD, to write the second option. Ie, the precise definition, from an OA modeling point of view (and this may not be the text in the FPWD, so we may have to deal with this), is that the current OA model allows the value of oa:body to be a (typed!) literal, but does not allow a language-tagged literal. There is a severe inconsistency at this point: how do we explain to developers that this format is allowed (because it must be) and the same with a language tag is not?
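
To make the equivalence concrete, here is a rough sketch in Python. It is only an illustration, not a JSON-LD processor: the function names (expand, to_rdf_literal) and the small prefix table are made up for this example, and it assumes that "@value" is always a string. It maps a JSON-LD body value to the (lexical form, datatype, language tag) triple that the RDF 1.1 model assigns to it.

```python
# Illustrative sketch only: maps a JSON-LD body value to the RDF 1.1
# literal it denotes. Names and the prefix table are assumptions.
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PREFIXES = {"xsd": XSD, "rdf": RDF}


def expand(iri):
    """Expand a compact IRI such as 'xsd:string' using PREFIXES."""
    prefix, sep, local = iri.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return iri


def to_rdf_literal(value):
    """Return (lexical form, datatype IRI, language tag or None)."""
    if isinstance(value, str):
        # A bare string is shorthand for an xsd:string literal.
        return (value, XSD + "string", None)
    if isinstance(value, dict) and "@value" in value:
        if "@type" in value and "@language" in value:
            # The combination that RDF literals cannot express.
            raise ValueError("a literal cannot have both @type and @language")
        if "@language" in value:
            return (value["@value"], RDF + "langString", value["@language"])
        return (value["@value"], expand(value.get("@type", "xsd:string")), None)
    raise TypeError("not a literal value")


simple = to_rdf_literal("ABCD")
typed = to_rdf_literal({"@value": "ABCD", "@type": "xsd:string"})
tagged = to_rdf_literal({"@value": "ABCD", "@language": "en"})

print(simple == typed)   # the two constructs above denote the same literal
print(simple == tagged)  # the language-tagged form is a different literal
```

The first comparison holds and the second does not, which is exactly the asymmetry we would have to explain to developers.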


> So, if we want to not talk about serialization, that's fine, but the answer is to get rid of string literals and have a single, coherent model for embedded content.
> 
> If value was mapped to @value and language to @language, that would work until someone wanted to associate a format with the content. And then it would break spectacularly.
> 

Yes, you are right, there is a source of confusion there.

> If we allowed language tagged and (separately) data-typed literals we would have the following possible combinations all being legal that clients would have to take into account:
> 
> "body": "string"
> "body": {"@value": "string"}
> "body": {"value": "string"}
> "body": {"@value": "string", "@language": "en"}
> "body": {"value": "string", "language": "en"}
> "body": {"@value": "<b>string</b>", "@type": "rdf:HTML"}
> "body": {"value": "<b>string</b>", "format": "text/html"}
> // Not @value, @language, @type !
> "body": {"value": "<b>string</b>", "language": "en", "format": "text/html"}
> 
> Are you SURE that is preferable to a single consistent model?  Do you REALLY want us to have to explain that the @value/@language/@type combination is not allowed because RDF (oh oh, warning bells!) doesn't allow it and hence developers need to deal with lots of special cases?  That is the consequence of going down this route.

You are right, there is a source of confusion and inconsistency but, as I noted before, there is also another at hand due to the typed/plain literal conflation.
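
Just to see what a client would face, here is a small sketch (again only illustrative; classify_body is a made-up name, and it assumes that "value", "language" and "format" are ordinary properties of a resource, ie, that they are not aliased to JSON-LD keywords in the @context). It sorts the eight forms from your list into the cases a client would have to distinguish.

```python
# Illustrative sketch: classify the body shapes listed above.
# Assumes "value"/"language"/"format" are resource properties,
# not @context aliases for "@value"/"@language".
def classify_body(body):
    if isinstance(body, str):
        return "plain string"
    if not isinstance(body, dict):
        raise TypeError("body must be a string or a JSON object")
    if "@value" in body:
        if "@language" in body and "@type" in body:
            raise ValueError("@language and @type cannot be combined")
        if "@language" in body:
            return "language-tagged literal"
        if "@type" in body:
            return "datatyped literal"
        return "plain string"
    if "value" in body:
        return "embedded resource"
    raise ValueError("unrecognised body shape")


forms = [
    "string",
    {"@value": "string"},
    {"value": "string"},
    {"@value": "string", "@language": "en"},
    {"value": "string", "language": "en"},
    {"@value": "<b>string</b>", "@type": "rdf:HTML"},
    {"value": "<b>string</b>", "format": "text/html"},
    {"value": "<b>string</b>", "language": "en", "format": "text/html"},
]

kinds = [classify_body(form) for form in forms]
for form, kind in zip(forms, kinds):
    print(kind, "<-", form)
```

Under these assumptions the eight forms fall into four distinct cases, plus the forbidden @type/@language combination that has to be rejected.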

I know this has been discussed before, but I am more and more in favour of separating the "body" into two different predicates: one for simple bodies that are literals (datatyped or not, language tagged or not), and another that takes a resource as a value, allowing more complicated constructs. And the text should make it clear that the latter is really important when, for example, an ID is added to the value, or when non-textual content that requires a media type is used for the body. It seems that conflating the two indeed leads to issues.
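
A rough sketch of how such a split could look from the producer's side, in Python. To be clear, the two predicate names ("bodyText" and "body") are purely hypothetical placeholders and not a proposal for actual terms; choose_body is likewise a made-up helper. The rule it encodes is the one above: an ID or a media type forces the resource form, everything else stays a literal.

```python
# Illustrative sketch of the two-predicate idea. The predicate names
# below are hypothetical placeholders, not proposed vocabulary terms.
LITERAL_PREDICATE = "bodyText"   # hypothetical: literal-valued bodies
RESOURCE_PREDICATE = "body"      # hypothetical: resource-valued bodies


def choose_body(text, language=None, media_type=None, identifier=None):
    """Return (predicate, JSON-LD value) for the given body content."""
    if identifier is not None or media_type is not None:
        # An ID or a media type cannot be carried by a literal.
        resource = {"value": text}
        if language is not None:
            resource["language"] = language
        if media_type is not None:
            resource["format"] = media_type
        if identifier is not None:
            resource["@id"] = identifier
        return (RESOURCE_PREDICATE, resource)
    if language is not None:
        return (LITERAL_PREDICATE, {"@value": text, "@language": language})
    return (LITERAL_PREDICATE, text)


print(choose_body("hi"))
print(choose_body("hi", language="en"))
print(choose_body("<b>hi</b>", language="en", media_type="text/html"))
```

With this split, a client looking at the literal predicate only ever sees a string or a @value object, and a client looking at the resource predicate only ever sees a resource.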

Ivan


> 
> I am strongly against it.
> 
> Rob
> 
> 
> On Sun, Dec 14, 2014 at 1:18 AM, Ivan Herman <ivan@w3.org> wrote:
> Jeff
> 
> I think this can be mitigated, actually. We need again a JSON-LD expert but I would expect the @context to be able to map a "value" to "@value" and "language" to "@language", if we really do not want to impose the usage of "@" signs for special properties. Ie, we do not have a different JSON serialization.
> 
> If this is so, then, in fact, we solve both issues. The JSON-LD idiom
> 
> "body" : {"value": "hi", "language": "en"}
> 
> would, in fact, be equivalent to
> 
> "body" : {"@value": "hi", "@language": "en"}
> 
> which means that the value of "body" is a language string, not a separate resource. Which is exactly what Antoine wants. In fact, this is one of the cases where the Turtle serialization is way more readable, because it would be
> 
> <> oa:body "hi"@en .
> 
> _If_ the @context cannot be used for the mapping above, then I fully agree with Jeff's reaction. We should then use the "@value" and "@language" terminology in JSON-LD (I do not think that would really create major issues of acceptance, frankly), and thereby avoid a duplication of concepts. As a bonus, we also solve Antoine's issue for free:-)
> 
> 
> Ivan
> 
> 
> > On 14 Dec 2014, at 04:05 , Young,Jeff (OR) <jyoung@OCLC.ORG> wrote:
> >
> > The serialization may look "the same" to human eyes, but here's the spec for JSON-LD:
> >
> > http://www.w3.org/TR/json-ld/#string-internationalization
> >
> > It would be a shame if Web Annotations invented another JSON serialization. What would that one be called?
> >
> > http://blog.ldodds.com/2010/12/02/rdf-and-json-a-clash-of-model-and-syntax/
> >
> > On Dec 13, 2014, at 8:21 PM, Robert Sanderson <azaroth42@gmail.com> wrote:
> >
> >>
> >> Hi Antoine, all,
> >>
> >> To me the only value of literal bodies is to enable a simple string.  As soon as you add data types or language tags, the complexity and even serialization is the same as using a full resource, almost character for character:
> >>
> >> Language tagged string:
> >>     {"@value": "hi", "@language": "en"}
> >>
> >> Real (blank node) resource:
> >>     {"value": "hi", "language": "en"}
> >>
> >> The argument against always using a resource, and thereby enabling full OWL reasoning and a consistent model that respects the web architecture, is that people want to have only a string without anything else ... so that's all we allowed.
> >>
> >> As the model allows for language (and format and other metadata) to be associated with the body using the (preferred) resource method, and we don't want to create multiple ways to do the same thing, I'm against (as you might imagine) allowing language tagged literals.
> >>
> >> Rob
> >>
> >>
> >> On Sat, Dec 13, 2014 at 7:46 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
> >> Dear all,
> >>
> >> Congrats for the WD!
> >>
> >> I'm looking at it now, and I'm really happy with simple bodies.
> >> I am not so happy however with the fact that the spec does forbid the use of language tags on simple text bodies.
> >>
> >> In the OA community, several months ago, there was a discussion about XML-datatypes vs plain-literals-with-language-tags. As Rob reminded me in private mail:
> >> [
> >> The issue with datatypes and language tags is:
> >>
> >> * You can't have both at once in RDF, so we need to allow:
> >>     {"value": "hi", "language": "en", "format": "text/plain"}
> >>
> >> * In JSON-LD, the difference between a language tagged literal, and the resource is very confusing:
> >> "hi"@en is:  {"@value": "hi", "@language" : "en"}
> >> Whereas the blank node just drops the @s:   {"value": "hi", "language": "en"}
> >> If we allowed both patterns, we'd be losing any simplicity and understandability gains by having them at all.  So if there's more information than just the value, including format, language, creator, etc, then it has to be a resource following the web architecture.
> >>
> >> * If the body data type isn't fixed, you would end up with the same situation as language tags, just with @type and format.  So to prevent it we require it to be an xsd:string, which will serialize to just the string in JSON-LD compaction.
> >> ]
> >>
> >> At the time I accepted these arguments. But I think what convinced me then is that there were cases mentioned for literal bodies that would have a different (XML) datatype.
> >> Now the FPWD mentions only simple text bodies. So this makes the case for not using plain literals a lot weaker.
> >>
> >> If I was a fresh reader, it would puzzle me a lot to find the sentence
> >> "The string body MUST be an xsd:string and MUST NOT have a language associated with it."
> >> And as someone with a key interest in multilingual scenarios, I want to raise again the issue!
> >> In fact the current solution makes annotations-with-literals useless in many of the Europeana cases, so I'd rather see a very good case for not enabling language tags.
> >>
> >> Best regards,
> >>
> >> Antoine
> >>
> >>
> >>
> >> --
> >> Rob Sanderson
> >> Information Standards Advocate
> >> Digital Library Systems and Services
> >> Stanford, CA 94305
> 
> 
> ----
> Ivan Herman, W3C
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
> 


----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704

Received on Monday, 15 December 2014 09:21:49 UTC