Re: Language maps and undefined language from Robert Sanderson on 2017-04-12 (public-linked-json@w3.org from April 2017)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Tue, 11 Apr 2017 17:37:00 -0700
To: Jakob Voß <jakob.voss@gbv.de>
Cc: Linked JSON <public-linked-json@w3.org>
Message-ID: <CABevsUG5cP3Caue51pqzASEZHtJqyGuz-_Jju-B+4Bg3heGpgg@mail.gmail.com>
Please consider the guidance from the I18n group at the W3C on this topic:

   https://www.w3.org/International/questions/qa-no-language

To excerpt the document:

> Use the subtag zxx when the text is *known to be* not in any language.

> [...] use xml:lang="" <http://www.w3.org/TR/REC-xml/#sec-lang-tag>,
> otherwise use xml:lang="und". These values indicate that we cannot
> determine, for one reason or another, what the appropriate language
> information is, or whether the text is non-linguistic.

As noted, we cannot use "" because PHP does not support the empty string
as the key of a dictionary, and thus we fall back to using "und".
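
A minimal sketch of that fallback, in Python, assuming expanded JSON-LD
value objects with "@value" and an optional "@language". The function name
to_language_map and the sample labels are hypothetical and not taken from
any JSON-LD processor:

```python
import json

# "und" is the registered language subtag for "Undetermined".
UNDETERMINED = "und"


def to_language_map(values):
    """Group value objects by language tag into a language map.

    Values without "@language" are filed under "und" instead of being
    dropped or keyed by the empty string.
    """
    language_map = {}
    for value in values:
        key = value.get("@language") or UNDETERMINED
        text = value["@value"]
        if key not in language_map:
            language_map[key] = text
        elif isinstance(language_map[key], list):
            language_map[key].append(text)
        else:
            # Second string for the same key: promote to a list.
            language_map[key] = [language_map[key], text]
    return language_map


# Hypothetical aggregated data: two tagged labels and one legacy label.
labels = [
    {"@value": "Cat", "@language": "en"},
    {"@value": "Katze", "@language": "de"},
    {"@value": "Felis catus"},  # no language tag in the source
]

print(json.dumps(to_language_map(labels), ensure_ascii=False))
```

The untagged label ends up under the "und" key next to "en" and "de", so
nothing is lost when sources with and without language tags are mixed.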

Rob


On Tue, Apr 11, 2017 at 5:28 PM, Robert Sanderson <azaroth42@gmail.com>
wrote:

>
> The use case is when you have data from multiple sources, some with
> language tags and some without. When you aggregate the triples at the
> moment, you get garbage in the JSON-LD representation. The "perfection or
> nothing" approach proposed seems to be against the spirit of JSON-LD's
> "make it work for the developer" ethos.
>
> I prefer UND to ZXX because there is likely to be linguistic
> content, it's just that we don't know which language (if any) it's in.
>  "Undetermined" seems to include the possibility of no language, whereas
> ZXX seems more explicitly not linguistic, and MIS/MUL are explicitly
> linguistic.  I would say that the vast majority of the time, legacy data
> does not have per-string language associations, and thus the case of "we
> just don't know but we think it's linguistic" also covers the vast
> majority of cases.
>
> Rob
>
>
> On Mon, Apr 10, 2017 at 11:13 PM, Jakob Voß <jakob.voss@gbv.de> wrote:
>
>> Hi,
>>
>> Gregg Kellogg wrote:
>>
>> > In CSVW, we coined “und” as the undefined/absent language.
>>
>> "und" is a perfectly legal language tag, defined in the IANA language
>> tag registry:
>>
>> Type: language
>> Subtag: und
>> Description: Undetermined
>> Added: 2005-10-16
>> Scope: special
>>
>> The other language tags in the "special" Scope are:
>>
>> zxx: No linguistic content/Not applicable
>> mis: Uncoded languages
>> mul: Multiple languages
>>
>> One might argue that "zxx" is actually equivalent to no language tag.
>> Anyway, "und" is actually used for "unknown language" in contrast to "no
>> language". If your data model expects strings to always have languages,
>> "und" makes sense, but in this case there should not be literal strings
>> without a language tag anyway (see the JSKOS JSON-LD profile for SKOS for
>> an example).
>>
>> Robert wrote:
>>
>> > If compaction would result in an attempt to add a string without an
>> > associated language into a LanguageMap, then the processor SHOULD
>> > assign the undefined language code `UND` as the key in the array.
>>
>> I'd prefer this:
>>
>> If compaction would result in an attempt to add a string without an
>> associated language into a LanguageMap, then the processor MUST NOT
>> include this string. Instead it SHOULD emit a warning indicating that the
>> data to compact does not fit the expected data model expressed by the
>> definition of a LanguageMap.
>>
>> In theory, any kind of RDF data should be expressible with any kind of
>> JSON-LD context. In practice each JSON-LD context defines a data model
>> with implicit or explicit assumptions about what RDF data can be expressed
>> in a meaningful way. I prefer meaningful data over hacks to express data
>> that does not conform to expectations anyway.
>>
>> What's the actual use case of having non-language strings in language
>> maps?
>>
>> Jakob
>>
>>
>
>
> --
> Rob Sanderson
> Semantic Architect
> The Getty Trust
> Los Angeles, CA 90049
>



Received on Wednesday, 12 April 2017 00:37:33 UTC