On "Solicit input from xliff, its, bpmlod about json/plain text i18n metadata" from Felix Sasaki on 2015-10-21 (public-i18n-core@w3.org from October to December 2015)

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 21 Oct 2015 21:49:10 +0200
To: public-i18n-core@w3.org
Message-Id: <BDB342E0-FD32-4122-826C-B672072C635A@w3.org>

As said last week, I won’t be on the call this week. Below is my input to
http://www.w3.org/International/track/actions/478

I contacted the ITS IG, the XLIFF TC and the BPMLOD group. I did not hear back from BPMLOD. I assume this is partly because people in the linguistic linked data area don’t care much about rendering but focus on querying the data structures. Having no i18n metadata available is not an obstacle to doing that.

People pointed out the practice of putting fragments of XML or HTML into JSON strings. This sounds to me like the JSON version of RDF XML literals:

"dc:title": {
  "@type": "rdf:XMLLiteral",
  "@value": "\n       <span xml:lang=\"en\">\n         The <em>&lt;br /&gt;</em> Element Considered Harmful.\n       </span>\n     "
}

This looks horrible, but it is actually valid JSON-LD, generated from the XML literal example at http://www.w3.org/TR/2004/REC-rdf-primer-20040210/#xmlliterals
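
To illustrate what a consumer of such a value has to do, here is a minimal Python sketch using only the standard library. It wraps the fragment above in braces so that it parses as a JSON document, then parses the embedded XML a second time to get at the language tag. The helper name read_xml_literal is hypothetical and not part of any JSON-LD library.

```python
import json
import xml.etree.ElementTree as ET

# The fragment from the message, wrapped in braces so it is a complete JSON document.
# A raw string keeps the JSON escapes (\n, \") intact for json.loads.
raw = r'''
{
  "dc:title": {
    "@type": "rdf:XMLLiteral",
    "@value": "\n       <span xml:lang=\"en\">\n         The <em>&lt;br /&gt;</em> Element Considered Harmful.\n       </span>\n     "
  }
}
'''

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_xml_literal(value):
    """Hypothetical helper: return (language, plain text) of an XML literal value."""
    root = ET.fromstring(value["@value"])             # second parse: XML inside the JSON string
    text = " ".join("".join(root.itertext()).split())  # drop markup, normalize whitespace
    return root.get(XML_LANG), text


lang, text = read_xml_literal(json.loads(raw)["dc:title"])
print(lang, "|", text)
```

The language information is only reachable after two parsing steps (JSON, then XML), which is part of what makes this practice unattractive.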

The JSON-LD inheritance mechanism for language information
http://www.w3.org/TR/json-ld/#string-internationalization
does not solve the plain text problem.

There was some advice against XML features like the above or type systems (which may eventually lead to the above).

In JSON-LD you can specify information for a whole group of JSON-LD data structures, cf. the JSON-LD context. So providing defaults, as the JSON-LD language information mechanism does, can solve at least a part of the problem.
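
As a rough illustration of such defaults, the following Python sketch resolves the language of each string value from a context-level "@language". It is a deliberately simplified model written for this example, not a JSON-LD processor; the sample document and the function name effective_language are assumptions.

```python
from typing import Any, Optional

# Assumed sample document: the context sets a default language for all plain strings.
doc = {
    "@context": {"@language": "ja"},
    "name": "花澄",
    "occupation": {"@value": "Scientist", "@language": "en"},
}


def effective_language(value: Any, default: Optional[str]) -> Optional[str]:
    """Hypothetical helper: language of one value under a context default."""
    if isinstance(value, str):
        return default                          # plain string inherits the default
    if isinstance(value, dict) and "@value" in value:
        if "@type" in value:
            return None                         # typed values carry no language
        return value.get("@language", default)  # explicit tag (or null) wins
    return None


default_language = doc["@context"].get("@language")
languages = {
    key: effective_language(value, default_language)
    for key, value in doc.items()
    if key != "@context"
}
print(languages)
```

Note that this assigns one language per string value. It says nothing about parts of a string, which is why it covers only a part of the problem.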

It was pointed out that JSON structures are often not edited manually. In these cases it would be safe to add metadata (i18n or others) via pointers to the actual content, e.g. (example taken from Fredrik Esteem from the XLIFF TC):

{
  "headline" : "Today Acme Inc is prod to announce that John Doe has joined the company as Vice President of Infinite Improbability drive development.",
  "annotations" : [
    { "type" : "date", "subtype" : "relative-issue", "ref" : "2015-10-18T14:19:54+00:00", "span" : [0,4] },
    { "type" : "name", "subtype" : "company", "ref" : "http://www.test.org", "span" : [6,13] },
    { "type" : "name", "subtype" : "person", "ref" : "http://www.allpeople.biz/Doe/John/123", "span" : [40,47] },
    { "type" : "title", "subtype" : "business", "ref" : "https://en.wikipedia.org/wiki/Vice_president#In_business", "span" : [75,88] },
    { "type" : "name", "subtype" : "product", "ref" : null, "span" : [93,120] }
  ]
}
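
The pointer mechanism in the example above can be sketched in a few lines of Python. The sketch assumes that "span" holds zero-based character offsets with an inclusive end, which is inferred from the data ([0,4] covers "Today") and is not stated in the example itself. The function name resolve_spans is hypothetical. The headline is copied verbatim, since the offsets depend on the exact text.

```python
# Data copied from the example; null becomes None in Python.
record = {
    "headline": "Today Acme Inc is prod to announce that John Doe has joined the company as Vice President of Infinite Improbability drive development.",
    "annotations": [
        {"type": "date", "subtype": "relative-issue", "ref": "2015-10-18T14:19:54+00:00", "span": [0, 4]},
        {"type": "name", "subtype": "company", "ref": "http://www.test.org", "span": [6, 13]},
        {"type": "name", "subtype": "person", "ref": "http://www.allpeople.biz/Doe/John/123", "span": [40, 47]},
        {"type": "title", "subtype": "business", "ref": "https://en.wikipedia.org/wiki/Vice_president#In_business", "span": [75, 88]},
        {"type": "name", "subtype": "product", "ref": None, "span": [93, 120]},
    ],
}


def resolve_spans(rec):
    """Hypothetical helper: return (type, subtype, covered text) per annotation."""
    text = rec["headline"]
    resolved = []
    for ann in rec["annotations"]:
        start, end = ann["span"]
        # Assumption: the end offset is inclusive, hence end + 1 for the slice.
        resolved.append((ann["type"], ann["subtype"], text[start:end + 1]))
    return resolved


for kind, subtype, covered in resolve_spans(record):
    print(f"{kind}/{subtype}: {covered!r}")
```

The offsets silently go stale as soon as the headline text changes, for example if a typo is corrected. This is why the approach is only safe when the structures are not edited manually. Python counts code points here; an implementation counting UTF-16 code units could disagree on non-ASCII text.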

In summary: I think that, compared to what we have at
https://www.w3.org/International/wiki/ContentMetadataJavaScriptDiscussion
http://r12a.github.io/docs/bidi-plain-text/index.html
the pointer mechanism is something new. It is also a very much simplified version of the annotation model spec. Of course this approach does not differentiate between i18n and other types of metadata.

Best,

Felix

Received on Wednesday, 21 October 2015 19:49:19 UTC