Re: FPWD comment - literals, data types and language tags from Robert Sanderson on 2014-12-15 (public-annotation@w3.org from December 2014)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Mon, 15 Dec 2014 08:27:00 -0800
To: Ivan Herman <ivan@w3.org>
Cc: Jeff Young <jyoung@oclc.org>, Antoine Isaac <aisaac@few.vu.nl>, W3C Public Annotation List <public-annotation@w3.org>
Message-ID: <CABevsUHitAR8N7YJmLaogPMkoEg=B5i0VV_1+Lt0LiiJHp2zSw@mail.gmail.com>

Hi Ivan,

On Mon, Dec 15, 2014 at 1:21 AM, Ivan Herman <ivan@w3.org> wrote:
>
> > Firstly, I agree wholeheartedly with Jeff's position that we shouldn't
> > be mixing up model and serialization ... but that's exactly what the
> > requirement is for string literal bodies.
> Actually, we are in the strange situation where the Turtle serialization
> is, in this case, much clearer than the JSON-LD.


Yes, and JSON-LD is significantly less clear given the context of also
wanting to use value and language as keys of a resource in the same
position in the structure.

> If we switch to Turtle, then the following three alternatives are fairly
> clear as far as the (RDF) model goes. The following are the literal
> alternatives:
> <> oa:body "ABCD" .
> But, actually, per RDF 1.1, the first version is just a shorthand for
> <> oa:body "ABCD"^^xsd:string.
>

Right. This is why we added the requirement in the spec that the body be an
xsd:string, otherwise the two might not be the same.


> but what this means is that, in the model, these two JSON-LD constructs
> are identical:
> "body" : "ABCD"
> "body" : { "@value" : "ABCD", "@type" : "xsd:string" }
> In other words, because we allow simple strings in the model it is already
> perfectly legal, in JSON-LD, to write the second option.


Yes, well understood, and hence that additional text in the spec saying
that the serialization must be a plain string literal, as that was the
requirement to fulfill.  As soon as we allow both "abcd" and
{"@value":"abcd", "@type":"xsd:string"}, then we've lost any advantage of
having string literals.  All systems will need to check both, at which
point we should just have a consistent model without literal bodies.
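
To make that cost concrete, here is a minimal Python sketch of the check every
consumer would need once both forms are legal. The helper name
plain_string_body is hypothetical, and the sketch assumes the JSON-LD has
already been parsed into Python objects with the "xsd" prefix left unexpanded.

```python
XSD_STRING = "xsd:string"  # assumed compact form; a real context may expand it


def plain_string_body(body):
    """Return the text if body is a plain or xsd:string literal, else None."""
    if isinstance(body, str):
        # The short form: "body": "abcd"
        return body
    if isinstance(body, dict) and set(body) <= {"@value", "@type"}:
        # The expanded form: {"@value": "abcd", "@type": "xsd:string"}
        value = body.get("@value")
        if isinstance(value, str) and body.get("@type", XSD_STRING) == XSD_STRING:
            return value
    return None
```

Both spellings yield the same text, which is the point: the second branch is
pure overhead that the plain-string requirement was meant to avoid.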

> I.e., the precise definition, from an OA modeling point of view (and this may
> not be the text in the FPWD, so we may have to deal with this) is that the
> current OA model allows for the value of oa:body to be a (typed!) literal,
> but does not allow for a language literal. There is a severe inconsistency
> at this point: how do we explain to developers that this format is allowed
> (because it must be) and the same with a language tag is not?
>


So, to rephrase: if we MUST NOT forbid the serialization from giving an
explicit data type, then that is a point we need to take into account in
the discussion of whether to allow literal bodies, and if so how.


> > If we allowed language tagged and (separately) data-typed literals we
> > would have the following possible combinations all being legal that clients
> > would have to take into account:
> > "body": "string"
> > "body": {"@value": "string"}
> > "body": {"value": "string"}
> > "body": {"@value": "string", "@language": "en"}
> > "body": {"value": "string", "language": "en"}
> > "body": {"@value": "<b>string</b>", "@type": "rdf:HTML"}
> > "body": {"value": "<b>string</b>", "format": "text/html"}
> > // Not @value, @language, @type !
> > "body": {"value": "<b>string</b>", "language": "en", "format": "text/html"}
> >
> > Are you SURE that is preferable to a single consistent model?
>


> You are right, there is a source of confusion and inconsistency but, as I
> noted before, there is also another at hand due to the typed/plain literal
> conflation.
>

Agreed, confusion abounds :)
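
As a rough illustration of what a client would face with the eight
combinations quoted above, here is a Python sketch. The function name
describe_body and the (text, language, type_or_format) return shape are my own
assumptions for illustration, not anything from the spec.

```python
def describe_body(body):
    """Return (text, language, type_or_format) for any of the quoted shapes."""
    if isinstance(body, str):
        # "body": "string"
        return (body, None, None)
    if "@value" in body:
        # JSON-LD literal: carries a language tag or a datatype, not both
        return (body["@value"], body.get("@language"), body.get("@type"))
    # Resource with value/language/format keys: both may be present
    return (body["value"], body.get("language"), body.get("format"))
```

Even this small function needs three branches, and the third slot of the
result means a datatype in one branch and a media type in the other.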


> I know this has been discussed before, but I am more and more in favour of
> separating the "body" into two different predicates. One for simple bodies
> that are literals (datatyped or not, language tagged or not), and the other
> that takes a resource as a value, allowing more complicated constructs. And
> the text should make it clear that the latter is really important when, for
> example, an ID is added to the value, or when non textual content is used
> for the body that requires a media type. It seems that conflating the two
> indeed leads to issues.
>

Agreed, that would be preferable to allowing data types or language tags
with the same predicate, resulting in the confusing situation above.

I would propose oa:bodyValue for the string (json-ld key "bodyValue") and
require oa:hasBody / "body" to be always a resource.

It still allows for an annotation of the form:

{
  "@type": "oa:Annotation",
  "bodyValue": {"@value": "abcd", "@language":"en"},
  "body": {"value": "efgh", "language": "en"},
  "target": "http://example.org/"
}

but that seems less bad than the alternatives to me.
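
For comparison, a sketch of how a client might read an annotation under the
split, assuming "bodyValue" is always a literal and "body" is always a
resource. The function name read_annotation is hypothetical.

```python
def read_annotation(anno):
    """Return (literal_text, resource_body) for an annotation dict."""
    literal = anno.get("bodyValue")
    if isinstance(literal, dict):
        # Expanded literal such as {"@value": "abcd", "@language": "en"};
        # this sketch keeps only the text and ignores the tag or datatype.
        literal = literal["@value"]
    # "body" is never inspected for literal shapes: it is always a resource.
    return literal, anno.get("body")


example = {
    "@type": "oa:Annotation",
    "bodyValue": {"@value": "abcd", "@language": "en"},
    "body": {"value": "efgh", "language": "en"},
    "target": "http://example.org/",
}
text, resource = read_annotation(example)
```

Each key now has exactly one kind of value to handle, so the literal check
lives in one place only.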


Rob

Received on Monday, 15 December 2014 16:27:35 UTC