- From: Robert Sanderson <azaroth42@gmail.com>
- Date: Tue, 11 Apr 2017 17:28:09 -0700
- To: Jakob Voß <jakob.voss@gbv.de>
- Cc: Linked JSON <public-linked-json@w3.org>
- Message-ID: <CABevsUGWvsPh4uCLGBxebPM2eBqBw=_Q-zoDHPs4uzGGcJq_+w@mail.gmail.com>
The use case is when you have data from multiple sources, some with language tags and some without. When you aggregate the triples at the moment, you get garbage in the JSON-LD representation. The "perfection or nothing" approach proposed seems to be against the spirit of JSON-LD's "make it work for the developer" ethos.

I prefer UND compared to ZXX because there is likely to be linguistic content, it's just that we don't know which language (if any) it's in. "Undetermined" seems to include the possibility of no language, whereas ZXX seems more explicitly not linguistic, and MIS/MUL are explicitly linguistic.

I would say that the vast majority of the time, legacy data does not have per-string language associations ... and thus the case of "we just don't know but we think it's linguistic" is also (thus) the vast majority of the cases.

Rob

On Mon, Apr 10, 2017 at 11:13 PM, Jakob Voß <jakob.voss@gbv.de> wrote:

> Hi,
>
> Gregg Kellogg wrote:
>
> > In CSVW, we coined “und” as the undefined/absent language.
>
> "und" is a perfectly legal language tag, defined in the IANA language
> tag registry:
>
> Type: language
> Subtag: und
> Description: Undetermined
> Added: 2005-10-16
> Scope: special
>
> The other language tags in the "special" Scope are:
>
> zxx: No linguistic content/Not applicable
> mis: Uncoded languages
> mul: Multiple languages
>
> One might argue that "zxx" is actually equivalent to no language tag.
> Anyway "und" is actually used for "unknown language" in contrast to "no
> language". If your data
> model expects strings to always have languages "und" makes sense but in
> this case there should not be literal strings without language tag
> anyway (see JSKOS json-ld profile for SKOS for an example).
>
> Robert wrote:
>
> > If compaction would result in an attempt to add a string without an
> > associated language into a LanguageMap, then the processor SHOULD
> > assign the undefined language code `UND` as the key in the array.
>
> I'd prefer this:
>
> If compaction would result in an attempt to add a string without an
> associated language into a LanguageMap, then the processor MUST NOT
> include this string. Instead it SHOULD emit a warning to inform that the
> data to compact does not fit to the expected data model expressed
> by definition of a LanguageMap.
>
> In theory, any kind of RDF data should be expressible with any kind of
> JSON-LD context. In practice each JSON-LD context defines a data model
> with implicit or explicit assumptions what RDF data to be expressible in
> a meaningful way. I prefer meaningful data over hacks to express data
> that does not conform to expectations anyway.
>
> What's the actual use case of having non-language strings in language maps?
>
> Jakob
>
>

--
Rob Sanderson
Semantic Architect
The Getty Trust
Los Angeles, CA 90049
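
The two behaviours proposed in this thread can be sketched roughly as follows. This is only an illustration, not the JSON-LD compaction algorithm: `build_language_map` and its `on_missing` parameter are hypothetical names, and the input is assumed to be a list of expanded value objects carrying `@value` and an optional `@language`.

```python
import warnings


def build_language_map(values, on_missing="und"):
    """Group value objects by language tag (hypothetical helper).

    on_missing="und"  : key untagged strings under "und" (Rob's proposal)
    on_missing="drop" : skip untagged strings and warn (Jakob's proposal)
    """
    grouped = {}
    for value in values:
        lang = value.get("@language")
        if lang is None:
            if on_missing == "und":
                # Treat the string as linguistic content in an unknown language.
                lang = "und"
            else:
                # The string does not fit the data model of a language map.
                warnings.warn(
                    f"dropping string without language tag: {value['@value']!r}"
                )
                continue
        grouped.setdefault(lang, []).append(value["@value"])
    # A single string per language is stored bare, several as a list.
    return {
        lang: strings[0] if len(strings) == 1 else strings
        for lang, strings in grouped.items()
    }


# Aggregated data: two tagged strings and one legacy string with no tag.
values = [
    {"@value": "Hello", "@language": "en"},
    {"@value": "Bonjour", "@language": "fr"},
    {"@value": "Untitled"},
]
print(build_language_map(values))
```

With `on_missing="und"` the legacy string survives under the `und` key; with `on_missing="drop"` it is omitted and a warning is raised instead.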
Received on Wednesday, 12 April 2017 00:28:43 UTC