Re: FPWD comment - literals, data types and language tags from Robert Sanderson on 2014-12-15 (public-annotation@w3.org from December 2014)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Sun, 14 Dec 2014 19:39:08 -0800
To: "Young,Jeff (OR)" <jyoung@oclc.org>
Cc: Ivan Herman <ivan@w3.org>, Antoine Isaac <aisaac@few.vu.nl>, W3C Public Annotation List <public-annotation@w3.org>
Message-ID: <CABevsUGx1Hda_=nX5cZNEk5tSRJug7u9o1N7ct2TO3TLniSJOQ@mail.gmail.com>

oa:EmbeddedResource is a class that is typically applied to blank nodes.
The class implies that the representation of the resource is present as
the literal object of the rdf:value predicate, so a publisher using it
would already be achieving that purpose with a blank node.

By hash URI, do you mean a data: URI? That could be done, but seems
unnecessarily complex and really makes a mockery of the web architecture
(IMO).

The warning bells point was that one of the reasons we have literal bodies
is that the anti-RDF, anti-modeling crowd wants them.  So to then have to
explain a bunch of special cases due to RDF constraints, because we allowed
language tags/data types, seems even more counter-productive.
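
A rough sketch in Python, not taken from any spec, of what a client would
have to do if every body shape listed further down this thread were legal.
The name normalize_body and the rdf:HTML to text/html table are assumptions
made up for illustration; the only rule taken from JSON-LD itself is that a
value object cannot carry both @language and @type.

```python
# Hypothetical helper: collapse every body shape from the thread into one dict.
# The datatype-to-format table is an assumption for illustration only.
DATATYPE_TO_FORMAT = {"rdf:HTML": "text/html"}


def normalize_body(body):
    """Return {"value", "language", "format"} for any of the body shapes."""
    if isinstance(body, str):
        # "body": "string"
        return {"value": body, "language": None, "format": None}
    if not isinstance(body, dict):
        raise TypeError("body must be a string or a JSON object")
    if "@value" in body:
        # JSON-LD value object, i.e. an RDF literal.
        if "@language" in body and "@type" in body:
            # A literal cannot have both a language tag and a datatype.
            raise ValueError("literal cannot have both @language and @type")
        return {
            "value": body["@value"],
            "language": body.get("@language"),
            "format": DATATYPE_TO_FORMAT.get(body.get("@type")),
        }
    if "value" in body:
        # Resource (blank node) form: language and format can coexist.
        return {
            "value": body["value"],
            "language": body.get("language"),
            "format": body.get("format"),
        }
    raise ValueError("unrecognised body shape")
```

With a single resource model, only the last branch (plus, if simple strings
stay allowed, the first) would be needed.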

Rob




On Sun, Dec 14, 2014 at 6:54 PM, Young,Jeff (OR) <jyoung@oclc.org> wrote:
>
>  I'm also curious about the need for EmbeddedContent as a resource type.
> If a data publisher *wants* to embed a "content resource" alongside another
> resource(s), they could (but shouldn't be required to) achieve that purpose
> with a blank node or a hash URI.
>
>  Are the RDF warning bells a sign that it isn't being assumed for this
> application?
>
> On Dec 14, 2014, at 9:35 PM, Young,Jeff (OR) <jyoung@oclc.org> wrote:
>
>   This seems similar to how SKOS-XL "upgrades" strings to things:
> http://www.w3.org/2008/05/skos-xl#Label
>
>  SKOS-XL labels are heavyweight in comparison to SKOS labels, but some of
> the weight can be "hidden" in the JSON-LD context. That is assuming people
> publish and consume the "hidden" level correctly.
>
>
>
> On Dec 14, 2014, at 8:38 PM, Robert Sanderson <azaroth42@gmail.com> wrote:
>
>
> Hi Ivan, Jeff,
>
>  Firstly, I agree wholeheartedly with Jeff's position that we shouldn't
> be mixing up model and serialization ... but that's exactly what the
> requirement is for string literal bodies.
>
>  The model requirement would be:  I want to embed a textual comment
> within the annotation as the body.  That's precisely what the
> EmbeddedContent resource does (and ContentAsText used to do).  Indeed, when
> the requirement was brought up at the CSV WG (for example) the negative
> reaction was not to the model, but to the serialization as a JSON object
> instead of a string.
>
>  The requirement to have language associated with a string is also
> perfectly reasonable, and already accounted for.   The requirement to have
> format associated with a string, ditto.  And the clincher is that having
> both format and language is not possible with literals in RDF; it must be
> done with a resource.
>
>  So, if we want to not talk about serialization, that's fine, but the
> answer is to get rid of string literals and have a single, coherent model
> for embedded content.
>
>  If value was mapped to @value and language to @language, that would work
> until someone wanted to associate a format with the content. And then it
> would break spectacularly.
>
>  If we allowed language tagged and (separately) data-typed literals we
> would have the following possible combinations all being legal that clients
> would have to take into account:
>
> "body": "string"
> "body": {"@value": "string"}
> "body": {"value": "string"}
> "body": {"@value": "string", "@language": "en"}
> "body": {"value": "string", "language": "en"}
> "body": {"@value": "<b>string</b>", "@type": "rdf:HTML"}
> "body": {"value": "<b>string</b>", "format": "text/html"}
> // Not @value, @language, @type !
> "body": {"value": "<b>string</b>", "language": "en", "format": "text/html"}
>
>  Are you SURE that is preferable to a single consistent model?  Do you
> REALLY want us to have to explain that the @value/@language/@type
> combination is not allowed because RDF (oh oh, warning bells!) doesn't
> allow it and hence developers need to deal with lots of special cases?
> That is the consequence of going down this route.
>
>  I am strongly against it.
>
>  Rob
>
>
> On Sun, Dec 14, 2014 at 1:18 AM, Ivan Herman <ivan@w3.org> wrote:
>>
>> Jeff
>>
>> I think this can be mitigated, actually. We need again a JSON-LD expert
>> but I would expect the @context to be able to map a "value" to "@value" and
>> "language" to "@language", if we really do not want to impose the usage of
>> "@" signs for special properties. Ie, we do not have a different JSON
>> serialization.
>>
>> If this is so, then, in fact, we solve both issues. The JSON-LD idiom
>>
>> "body" : {"value": "hi", "language": "en"}
>>
>> would, in fact, be equivalent to
>>
>> "body" : {"@value": "hi", "@language": "en"}
>>
>> which means that the value of "body" is a language string, not a separate
>> resource. Which is exactly what Antoine wants. In fact, this is one of the
>> cases where the Turtle serialization is way more readable, because it would
>> be
>>
>> <> oa:body "hi"@en .
>>
>> _If_ the @context cannot be used for the mapping above, then I fully
>> agree with Jeff's reaction. We should then use the "@value" and "@language"
>> terminology in JSON-LD (I do not think that would really create major
>> issues of acceptance, frankly), and thereby avoid a duplication of
>> concepts. As a bonus, we also solve Antoine's issue for free:-)
>>
>>
>> Ivan
>>
>>
>> > On 14 Dec 2014, at 04:05 , Young,Jeff (OR) <jyoung@OCLC.ORG> wrote:
>> >
>> > The serialization may look "the same" to human eyes, but here's the
>> spec for JSON-LD:
>> >
>> > http://www.w3.org/TR/json-ld/#string-internationalization
>> >
>> > It would be a shame if Web Annotations invented another JSON
>> serialization. What would that one be called?
>> >
>> >
>> http://blog.ldodds.com/2010/12/02/rdf-and-json-a-clash-of-model-and-syntax/
>> >
>> > On Dec 13, 2014, at 8:21 PM, Robert Sanderson <azaroth42@gmail.com>
>> wrote:
>> >
>> >>
>> >> Hi Antoine, all,
>> >>
>> >> To me the only value of literal bodies is to enable a simple string.
>> As soon as you add data types or language tags, the complexity and even
>> serialization is the same as using a full resource, almost character for
>> character:
>> >>
>> >> Language tagged string:
>> >>     {"@value": "hi", "@language": "en"}
>> >>
>> >> Real (blank node) resource:
>> >>     {"value": "hi", "language": "en"}
>> >>
>> >> The argument against always using a resource, and thereby enabling
>> full OWL reasoning and a consistent model that respects the web
>> architecture, is that people want to have only a string without anything
>> else ... so that's all we allowed.
>> >>
>> >> As the model allows for language (and format and other metadata) to be
>> associated with the body using the (preferred) resource method, and we
>> don't want to create multiple ways to do the same thing, I'm against (as
>> you might imagine) allowing language tagged literals.
>> >>
>> >> Rob
>> >>
>> >>
>> >> On Sat, Dec 13, 2014 at 7:46 AM, Antoine Isaac <aisaac@few.vu.nl>
>> wrote:
>> >> Dear all,
>> >>
>> >> Congrats for the WD!
>> >>
>> >> I'm looking at it now, and I'm really happy with simple bodies.
>> >> I am not so happy, however, with the fact that the spec forbids the
>> use of language tags on simple text bodies.
>> >>
>> >> In the OA community, several months ago, there was a discussion about
>> XML-datatypes vs plain-literals-with-language-tags. As Rob reminded me in
>> private mail:
>> >> [
>> >> The issue with datatypes and language tags is:
>> >>
>> >> * You can't have both at once in RDF, so we need to allow:
>> >>     {"value": "hi", "language": "en", "format": "text/plain"}
>> >>
>> >> * In JSON-LD, the difference between a language tagged literal, and
>> the resource is very confusing:
>> >> "hi"@en is:  {"@value": "hi", "@language" : "en"}
>> >> Whereas the blank node just drops the @s:   {"value": "hi",
>> "language": "en"}
>> >> If we allowed both patterns, we'd be losing any simplicity and
>> understandability gains by having them at all.  So if there's more
>> information than just the value, including format, language, creator, etc,
>> then it has to be a resource following the web architecture.
>> >>
>> >> * If the body data type isn't fixed, you would end up with the same
>> situation as language tags, just with @type and format.  So to prevent it
>> we require it to be an xsd:string, which will serialize to just the string
>> in JSON-LD compaction.
>> >> ]
>> >>
>> >> At the time I accepted these arguments. But I think what convinced me
>> then is that there were cases mentioned for literal bodies that would have
>> a different (XML) datatype.
>> >> Now the FPWD mentions only simple text bodies. So this makes the case
>> for not using plain literals a lot weaker.
>> >>
>> >> If I was a fresh reader, it would puzzle me a lot to find the sentence
>> >> "The string body MUST be an xsd:string and MUST NOT have a language
>> associated with it."
>> >> And as someone with a key interest in multilingual scenarios, I want
>> to raise again the issue!
>> >> In fact the current solution makes annotations-with-literals useless
>> in many of the Europeana cases, so I'd rather see a very good case for not
>> enabling language tags.
>> >>
>> >> Best regards,
>> >>
>> >> Antoine
>> >>
>> >>
>> >>
>> >> --
>> >> Rob Sanderson
>> >> Information Standards Advocate
>> >> Digital Library Systems and Services
>> >> Stanford, CA 94305
>>
>>
>>  ----
>> Ivan Herman, W3C
>> Digital Publishing Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>
>>
>>
>>
>>
>
>  --
>   Rob Sanderson
> Information Standards Advocate
> Digital Library Systems and Services
> Stanford, CA 94305
>
>

-- 
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305
Received on Monday, 15 December 2014 03:39:36 UTC