Re: @value/@type/@language combination from Gregg Kellogg on 2014-08-14 (public-linked-json@w3.org from August 2014)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Thu, 14 Aug 2014 10:40:57 -0700
To: Robert Sanderson <azaroth42@gmail.com>
Cc: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>, Linked JSON <public-linked-json@w3.org>
Message-Id: <3AC8D8EA-8C61-4C6D-B2AD-E0922CC0AEB9@greggkellogg.net>

On Aug 14, 2014, at 9:03 AM, Robert Sanderson <azaroth42@gmail.com> wrote:

> 
> Stian: You could argue that, and you might technically be correct, but I think there are a lot of people who would like to say that a web page is in a particular language :)
> 
> 
> The options we're considering:
> 
> 1. Drop @type, keep @language, and require in the specs, rather than in the RDF, that if an @value starts with < and ends with > then it MUST be [X]HTML.
> 
> 2. Drop @language, keep @type, and put language in the HTML using xml:lang, as per Gregg and what we did with annotations in EPUB: http://www.idpf.org/epub/oa/#h.fbvcg1ft34rp
> 
> 3. Use ContentAsText as per Stian (and Markus with a little tweaking) when we need HTML and literals when we don't.

For JSON-LD, another option would be to use Data Indexing [1]. This would allow you to do something like the following [2]:

{
  "@context": {
    "description": {"@id": "dc:description", "@type": "rdf:XMLLiteral", "@container": "@index"}
  },
  "description": {
    "en-latn": "<p lang=\"en-latn\">Some <b>description</b></p>"
  }
}

The "en-latn" is simply an index, and doesn't get reflected in an RDF serialization, but it does provide you with a way of easily finding appropriate information within JSON-LD. (You might also consider using rdf:HTML instead of rdf:XMLLiteral).
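
To illustrate the "easily finding appropriate information" part, here is a minimal Python sketch of how a client might pick a value out of such an index map using nothing but the standard json module. The function name pick_description, the fallback parameter, and the extra "de" entry are hypothetical and only there for illustration; the lookup is a plain exact-match on the index key, not language-tag matching.

```python
import json

# Hypothetical document mirroring the index-map example above.
# The "de" entry is invented purely for illustration.
doc = json.loads(r'''
{
  "@context": {
    "description": {"@id": "dc:description", "@type": "rdf:XMLLiteral", "@container": "@index"}
  },
  "description": {
    "en-latn": "<p lang=\"en-latn\">Some <b>description</b></p>",
    "de": "<p lang=\"de\">Eine <b>Beschreibung</b></p>"
  }
}
''')


def pick_description(document, preferred, fallback="en-latn"):
    """Return the markup stored under the preferred index key.

    Falls back to the `fallback` key, and to None if neither key is present.
    The keys are opaque index strings, so this is an exact string match.
    """
    index_map = document.get("description", {})
    if preferred in index_map:
        return index_map[preferred]
    return index_map.get(fallback)


chosen = pick_description(doc, "de")
print(chosen)
```

Because the index key is not carried into the RDF, the lang attribute inside the markup is still what tells an RDF consumer the language; the key only helps JSON-level clients avoid parsing every value.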

Gregg

[1] http://www.w3.org/TR/json-ld/#data-indexing
[2] http://tinyurl.com/pwxxl6m

> Given that the majority of use cases revolve around multiple languages in a UI, the current resolution we have is option 1.  The reasoning:
> 
> * @language is unlikely to repeat in a list of values, whereas all values are likely to be in either HTML or plaintext but less likely to be mixed. This makes @language more useful as a discriminator for which value to use. Otherwise, you have to parse the language out of the XML for all values just to throw all but one away. That's very inefficient. 
> 
> * Keeping @language is internally consistent with other uses of literals. We only type literals in the context, not in the recommended serialization. Especially as HTML is only usable in a limited number of fields compared to regular literals, there would be a lot of special cases to have to deal with.
> 
> * Allowing both nodes and literals is messy for the range of the properties, and very inconsistent as to what the clients need to process. Requiring just ContentAsText is really clunky in this situation (compared to the body of an Annotation, for example, where it makes sense).
> 
> * A browser can just throw the content into HTML and not care whether it's a literal or an HTML snippet. It'll come out as expected.
> 
> * The check for [X]HTML could be as simple as value[0] == '<' and value[-1] == '>', with the very edgy edge case of a non-HTML literal requiring an extra space at the end of the value.
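
For concreteness, the check described in the last bullet might look like the following in Python. This is only a sketch of the heuristic as stated in the message; the function name looks_like_markup is hypothetical, and the check does no validation that the value is well-formed [X]HTML.

```python
def looks_like_markup(value: str) -> bool:
    """Heuristic only: treat the value as [X]HTML if it starts with '<' and ends with '>'.

    An empty string is treated as plain text.
    """
    return value.startswith("<") and value.endswith(">")


# A value the heuristic treats as markup.
print(looks_like_markup("<p>Some <b>description</b></p>"))

# A plain literal that happens to be wrapped in angle brackets opts out
# by carrying a trailing space, as the bullet above suggests.
print(looks_like_markup("<not markup> "))
```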
> 
> Rob
> 
> 
> 
> On Thu, Aug 14, 2014 at 12:51 AM, Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk> wrote:
> One could argue that as soon as you have used a different datatype it is no longer text in that language. The English language does not have <p> as one of its constructs.
> 
> I would probably have used Content-in-RDF for that use case. XML literals in RDF are fragile and a relic of the RDF/XML days.
> 
> On 14 Aug 2014 01:07, "Gregg Kellogg" <gregg@greggkellogg.net> wrote:
> On Aug 13, 2014, at 4:31 PM, Robert Sanderson <azaroth42@gmail.com> wrote:
> 
>> 
>> Dear all,
>> 
>> We have a use case that would require all three of @value, @type and @language for a single resource, which is not allowed according to the specification (e.g., section 8.3).
>> 
>> We would like to use either plain literals (and hence @value/@language) or X/HTML in the same space to allow basic styling and linking within the text.  We want to do this in a way that doesn't involve introspection of the value to determine whether it's text/plain or text/xml if at all possible.
>> 
>> For example:
>> 
>> {
>>   "description": {
>>     "@value":"<p>Some <b>description</b></p>",
>>     "@type": "rdf:XMLLiteral",
>>     "@language" : "en-latn"
>>   }
>> }
>> 
>> Is there any existing best practice for how to accommodate this?
> 
> Note that the RDF data model allows literals to have either a datatype or a language, but not both. JSON-LD is just being consistent here.
> 
> In most applications (e.g., RDFa markup), the language is included in the markup:
> 
> {
>   "description": {
>     "@value":"<p lang=\"en-latn\">Some <b>description</b></p>",
>     "@type": "rdf:XMLLiteral"
>   }
> }
> 
> Of course, it could be that you'd like to use "@container": "@language" to index into different markup but, as you see, this isn't supported in either RDF or JSON-LD.
> 
> Gregg
> 
>> Thanks!
>> 
>> Rob
>> 
>> -- 
>> Rob Sanderson
>> Technology Collaboration Facilitator
>> Digital Library Systems and Services
>> Stanford, CA 94305
> 
> 
> 
> 
> -- 
> Rob Sanderson
> Technology Collaboration Facilitator
> Digital Library Systems and Services
> Stanford, CA 94305
Received on Thursday, 14 August 2014 17:41:29 UTC