- From: Robert Sanderson <azaroth42@gmail.com>
- Date: Tue, 11 Apr 2017 17:37:00 -0700
- To: Jakob Voß <jakob.voss@gbv.de>
- Cc: Linked JSON <public-linked-json@w3.org>
- Message-ID: <CABevsUG5cP3Caue51pqzASEZHtJqyGuz-_Jju-B+4Bg3heGpgg@mail.gmail.com>
Please consider the I18n group at the W3C on the topic:
https://www.w3.org/International/questions/qa-no-language

To excerpt the document:

> Use the subtag zxx when the text is *known to be* not in any language.

> [...] use xml:lang="" <http://www.w3.org/TR/REC-xml/#sec-lang-tag>,
> otherwise use xml:lang="und". These values indicate that we cannot
> determine, for one reason or another, what the appropriate language
> information is, or whether the text is non-linguistic.

Note that we cannot use "", as noted, because PHP does not support empty
string as the key of a dictionary... and thus we fall back to using "und".

Rob

On Tue, Apr 11, 2017 at 5:28 PM, Robert Sanderson <azaroth42@gmail.com> wrote:

>
> The use case is when you have data from multiple sources, some with
> language tags and some without. When you aggregate the triples at the
> moment, you get garbage in the JSON-LD representation. The "perfection or
> nothing" approach proposed seems to be against the spirit of JSON-LD's
> "make it work for the developer" ethos.
>
> I prefer UND compared to ZXX because there is likely to be linguistic
> content, it's just that we don't know which language (if any) it's in.
> "Undetermined" seems to include the possibility of no language, whereas
> ZXX seems more explicitly not linguistic, and MIS/MUL are explicitly
> linguistic. I would say that the vast majority of the time, legacy data
> does not have per-string language associations ... and thus the case of "we
> just don't know but we think it's linguistic" is also (thus) the vast
> majority of the cases.
>
> Rob
>
>
> On Mon, Apr 10, 2017 at 11:13 PM, Jakob Voß <jakob.voss@gbv.de> wrote:
>
>> Hi,
>>
>> Gregg Kellogg wrote:
>>
>> > In CSVW, we coined “und” as the undefined/absent language.
>>
>> "und" is a perfectly legal language tag, defined in the IANA language
>> tag registry:
>>
>> Type: language
>> Subtag: und
>> Description: Undetermined
>> Added: 2005-10-16
>> Scope: special
>>
>> The other language tags in the "special" Scope are:
>>
>> zxx: No linguistic content/Not applicable
>> mis: Uncoded languages
>> mul: Multiple languages
>>
>> One might argue that "zxx" is actually equivalent to no language tag.
>> Anyway "und" is actually used for "unknown language" in contrast to "no
>> language". If your data
>> model expects strings to always have languages "und" makes sense but in
>> this case there should not be literal strings without language tag
>> anyway (see JSKOS json-ld profile for SKOS for an example).
>>
>> Robert wrote:
>>
>> > If compaction would result in an attempt to add a string without an
>> > associated language into a LanguageMap, then the processor SHOULD
>> > assign the undefined language code `UND` as the key in the array.
>>
>> I'd prefer this:
>>
>> If compaction would result in an attempt to add a string without an
>> associated language into a LanguageMap, then the processor MUST NOT
>> include this string. Instead it SHOULD emit a warning to inform that the
>> data to compact does not fit to the expected data model expressed
>> by definition of a LanguageMap.
>>
>> In theory, any kind of RDF data should be expressible with any kind of
>> JSON-LD context. In practice each JSON-LD context defines a data model
>> with implicit or explicit assumptions what RDF data to be expressible in
>> a meaningful way. I prefer meaningful data over hacks to express data
>> that does not conform to expectations anyway.
>>
>> What's the actual use case of having non-language strings in language
>> maps?
>>
>> Jakob
>>
>>
>
>
>
> --
> Rob Sanderson
> Semantic Architect
> The Getty Trust
> Los Angeles, CA 90049
>

--
Rob Sanderson
Semantic Architect
The Getty Trust
Los Angeles, CA 90049
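
For illustration only, here is a minimal Python sketch of the fallback discussed in this thread: strings without a language tag are filed under "und" instead of being dropped or keyed by "". It is not the JSON-LD compaction algorithm and not any library's API; the function name build_language_map, the constant UNDETERMINED, and the sample labels are all hypothetical.

```python
from collections import defaultdict

# Assumption: "und" is the fallback key proposed in this thread,
# not something mandated by the JSON-LD specification.
UNDETERMINED = "und"


def build_language_map(literals):
    """Group (value, language) pairs into a language map.

    A literal whose language is None or "" is stored under "und",
    so no string is lost and no empty-string key is produced.
    """
    language_map = defaultdict(list)
    for value, language in literals:
        key = language.lower() if language else UNDETERMINED
        language_map[key].append(value)
    return dict(language_map)


# Hypothetical aggregated data: some sources supply tags, some do not.
labels = [
    ("Hello", "en"),
    ("Bonjour", "fr"),
    ("Untitled", None),
    ("Sans titre", ""),
]

print(build_language_map(labels))
```

Under the alternative proposed by Jakob, the two untagged strings would be skipped with a warning rather than stored under "und".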
Received on Wednesday, 12 April 2017 00:37:33 UTC