Re: @value/@type/@language combination from Stian Soiland-Reyes on 2014-09-03 (public-linked-json@w3.org from September 2014)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Wed, 3 Sep 2014 22:39:26 +0100
To: Robert Sanderson <azaroth42@gmail.com>
Cc: Linked JSON <public-linked-json@w3.org>, Gregg Kellogg <gregg@greggkellogg.net>
Message-ID: <CAPRnXtkBZa32B+SKN5P2f0s8Ybj4u+K1pLxzw=jE9VpsROnX6g@mail.gmail.com>
Language is just one of several factors that can be used in HTTP when
negotiating for a resource representation. The RDF model has brought language
into the model, typically for the case of something having different names
in various languages, e.g. "Copenhagen"@en vs "København"@da as the
rdfs:label on a resource :CPH.
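
As a minimal sketch of that label case (not part of the original exchange), the two labels can be written as JSON-LD style value objects and selected by language tag. The helper name pick_label and its fallback parameter are assumptions made for illustration; the tag "da" is the BCP 47 code for Danish.

```python
# Hypothetical value objects for the rdfs:label of :CPH.
labels = [
    {"@value": "Copenhagen", "@language": "en"},
    {"@value": "København", "@language": "da"},
]


def pick_label(values, preferred, fallback="en"):
    """Return the @value tagged with `preferred`, else `fallback`, else None."""
    # Language tags compare case-insensitively, so normalise to lower case.
    by_lang = {v.get("@language", "").lower(): v["@value"] for v in values}
    return by_lang.get(preferred.lower(), by_lang.get(fallback.lower()))
```

Because each language normally appears once, a single dictionary lookup is enough to choose the value to display.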

Ideally if you want to move into negotiation land, then I would just start
using Linked Data for what it is worth and host such resources on HTTP with
full content negotiation - content type, language and all. I would not try
to squeeze such additional measures into the RDF level. Should we then
also allow language-tagged float literals? After all, in my native language we
use "," as the decimal separator, so why should my lovely xsd:float literal
have to be represented with imperial "."? What about money amounts in different
currencies? Or a product having different prices in euro for each country?

Well, basically the literal is not meant to model complex data structures
with rich metadata. RDF Resources can however do this job brilliantly.

And if you are worried about ContentInRDF, which sadly was never properly
released (although we asked them several times from the OA community group),
then these use cases are exactly what rdf:value was meant for - a literal
as a resource.
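
A rough sketch of the "literal as a resource" idea, under stated assumptions: the text sits in rdf:value and the format and language sit beside it as ordinary properties. The class ex:Text, the choice of dc:format and dc:language, and the helper as_text_and_metadata are all illustrative guesses, not something the thread or any specification prescribes.

```python
# Hypothetical node: the text lives in rdf:value, metadata sits beside it.
description = {
    "@type": "ex:Text",  # made-up class, for illustration only
    "rdf:value": "<p>Some <b>description</b></p>",
    "dc:format": "text/html",
    "dc:language": "en",
}


def as_text_and_metadata(node):
    """Split a node into its rdf:value and its remaining descriptive properties."""
    metadata = {
        key: value
        for key, value in node.items()
        if key != "rdf:value" and not key.startswith("@")
    }
    return node["rdf:value"], metadata
```

Modelled this way, further metadata (currency, units, media type) becomes just another property on the node rather than a new kind of literal.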

HTML vs plain text would be another aspect. Say I am populating a <title>
tag from Linked Data: then I can't use the HTML literals without further
processing. I would also need to take more careful "safety" measures when
embedding the HTML in an authenticated environment. If the HTML was at a
different resource, then an iframe would work well.

I don't like the "starts with <" rule at all in general, but of course for any
specific property you are free to add such requirements, in effect making
a little hybrid mini-format.
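
For reference, the "starts with <" rule under discussion amounts to something like the following sketch. The function name looks_like_markup is an assumption; the check itself is the first-character/last-character test proposed in the quoted message, and it is a heuristic rather than a parser.

```python
def looks_like_markup(value: str) -> bool:
    """Heuristic only: first character is '<' and last character is '>'."""
    return len(value) >= 2 and value[0] == "<" and value[-1] == ">"


# Markup is detected, plain text is not.
assert looks_like_markup("<p>Some <b>description</b></p>")
assert not looks_like_markup("Some description")

# The edge case: plain text that happens to be wrapped in angle brackets is
# misclassified unless the producer appends a trailing space.
assert looks_like_markup("<shrug>")
assert not looks_like_markup("<shrug> ")
```

The last two assertions show why the rule turns the property into a hybrid mini-format: producers have to know about the trailing-space workaround.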

This is similar to how the very loosely defined dc:creator can be used with
a single literal value like "Stian Soiland-Reyes, Robert Sanderson" - and
also the most useless literal "Soiland-Reyes, S., Sanderson, R." (some
dinosaur journals still love this format for some reason) - the requirement
is only to represent the creator as it could appear written on a single line
in a library book index card (hopefully carefully put into a rolodex).

Why are you insisting on using the same datatype property to effectively
point to syntactically different kinds of values? Perhaps simply one or two
sub-properties can lock down the distinction, and you can use language tags
and keep the range as rdfs:Literal (or whichever it is for language-tagged
literals, I always forget ;) ).
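
One way to picture the sub-property suggestion, as a sketch only: the property names ex:description, ex:descriptionText and ex:descriptionHtml are invented for illustration, and the mapping below stands in for rdfs:subPropertyOf statements in a vocabulary.

```python
# Hypothetical vocabulary: two sub-properties of ex:description, one per syntax.
SUBPROPERTY_OF = {
    "ex:descriptionText": "ex:description",
    "ex:descriptionHtml": "ex:description",
}


def descriptions(node, parent="ex:description"):
    """Yield (syntax, value, language) for each value of a sub-property of parent."""
    for prop, super_prop in SUBPROPERTY_OF.items():
        if super_prop != parent:
            continue
        # The property name, not the literal's content, says how to read the value.
        syntax = "html" if prop.endswith("Html") else "text"
        for value in node.get(prop, []):
            yield syntax, value["@value"], value.get("@language")
```

Both values stay language-tagged literals, so a client can filter on @language first and only then decide how to render the chosen value.
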
On 14 Aug 2014 18:03, "Robert Sanderson" <azaroth42@gmail.com> wrote:

>
> Stian: You could argue that, and you might technically be correct, but I
> think there's a lot of people who would like to say that a web page is in a
> particular language :)
>
>
> The options we're considering:
>
> 1. Drop @type, keep @language, and require in the specs rather than the
> RDF that if an @value starts with < and ends with > then it MUST be [X]HTML.
>
> 2. Drop @language, keep @type, and put language in the HTML using
> xml:lang, as per Gregg and what we did with annotations in EPUB:
> http://www.idpf.org/epub/oa/#h.fbvcg1ft34rp
>
> 3. Use ContentAsText as per Stian (and Markus with a little tweaking) when
> we need HTML and literals when we don't.
>
>
> Given that the majority of use cases revolve around multiple languages in
> a UI, the current resolution we have is option 1.  The reasoning:
>
> * @language is unlikely to repeat in a list of values, whereas all values
> are likely to be in either HTML or plaintext but less likely to be mixed.
> This makes @language more useful as a discriminator for which value to use.
> Otherwise, you have to parse the language out of the XML for all values
> just to throw all but one away. That's very inefficient.
>
> * Keeping @language is internally consistent with other uses of literals.
> We only type literals in the context, not in the recommended serialization.
> Especially as HTML is only usable in a limited number of fields compared to
> regular literals, there would be a lot of special cases to have to deal
> with.
>
> * Allowing both nodes and literals is messy for the range of the
> properties, and very inconsistent as to what the clients need to process.
> Requiring just ContentAsText is really clunky in this situation (compared
> to the body of an Annotation, for example, where it makes sense).
>
> * A browser can just throw the content into HTML and not care whether it's
> a literal or an HTML snippet. It'll come out as expected.
>
> * The check for [X]HTML could be as simple as value[0] == '<' and value[-1]
> == '>', with the very edgy edge case of a non-HTML literal requiring an
> extra space at the end of the value.
>
> Rob
>
>
>
> On Thu, Aug 14, 2014 at 12:51 AM, Stian Soiland-Reyes <
> soiland-reyes@cs.manchester.ac.uk> wrote:
>
>> One could argue that as soon as you have used a different datatype it is
>> no longer text in that language. The English language does not have <p> as
>> one of its constructs.
>>
>> I would probably have used Content-in-RDF for that use case. XML literals
>> in RDF are fragile and a relic of the RDF/XML days.
>> On 14 Aug 2014 01:07, "Gregg Kellogg" <gregg@greggkellogg.net> wrote:
>>
>>> On Aug 13, 2014, at 4:31 PM, Robert Sanderson <azaroth42@gmail.com>
>>> wrote:
>>>
>>>
>>> Dear all,
>>>
>>> We have a use case that would require all three of @value, @type and
>>> @language for a single resource, which is not allowed according to the
>>> specification (e.g. section 8.3).
>>>
>>> We would like to use either plain literals (and hence @value/@language)
>>> or X/HTML in the same space to allow basic styling and linking within the
>>> text.  We want to do this in a way that doesn't involve introspection of
>>> the value to determine whether it's text/plain or text/xml if at all
>>> possible.
>>>
>>> For example:
>>>
>>>
>>> {
>>>   "description": {
>>>     "@value":"<p>Some <b>description</b></p>",
>>>     "@type": "rdf:XMLLiteral",
>>>     "@language" : "en-latn"
>>>   }
>>> }
>>>
>>>
>>> Is there any existing best practice for how to accommodate this?
>>>
>>>
>>> Note that the RDF data model allows literals to have either a datatype
>>> or a language, but not both. JSON-LD is just being consistent here.
>>>
>>> In most applications (e.g., RDFa markup), the language is included in
>>> the markup:
>>>
>>> {
>>>   "description": {
>>>     "@value":"<p lang=\"en-latn\">Some <b>description</b></p>",
>>>     "@type": "rdf:XMLLiteral"
>>>   }
>>> }
>>>
>>>
>>> Of course, it could be that you'd like to use @container=language, to
>>> index into different markup, but as you see, this isn't supported either in
>>> RDF or JSON-LD.
>>>
>>> Gregg
>>>
>>> Thanks!
>>>
>>> Rob
>>>
>>> --
>>> Rob Sanderson
>>> Technology Collaboration Facilitator
>>> Digital Library Systems and Services
>>> Stanford, CA 94305
>>>
>>>
>>>
>
>
>
Received on Wednesday, 3 September 2014 21:39:55 UTC