Re: Language maps and undefined language

The use case is when you have data from multiple sources, some with
language tags and some without. At the moment, when you aggregate the
triples, you get garbage in the JSON-LD representation. The proposed
"perfection or nothing" approach seems to go against the spirit of
JSON-LD's "make it work for the developer" ethos.

I prefer UND to ZXX because there is likely to be linguistic content;
it's just that we don't know which language (if any) it's in.
"Undetermined" seems to include the possibility of no language, whereas
ZXX seems more explicitly not linguistic, and MIS/MUL are explicitly
linguistic. I would say that the vast majority of the time, legacy data
does not have per-string language associations, and thus "we just don't
know, but we think it's linguistic" is also the vast majority of the
cases.
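
To make the use case concrete, here is a rough sketch in plain Python. It
is not the JSON-LD compaction algorithm and uses no JSON-LD library; the
function name `build_language_map` and the sample labels are made up for
illustration. It only shows the behaviour I have in mind: strings with no
language tag are kept under the "und" key instead of being dropped.

```python
from collections import defaultdict

# IANA "Undetermined" subtag, used here as the fallback key (an assumption
# of this sketch, matching the proposal under discussion).
UND = "und"


def build_language_map(literals, fallback=UND):
    """Group (value, language) pairs into a language-map-like dict.

    Literals whose language is None are filed under `fallback` rather
    than being discarded. Illustration only, not the JSON-LD algorithm.
    """
    grouped = defaultdict(list)
    for value, language in literals:
        # Language tags are case-insensitive, so normalise to lower case.
        key = language.lower() if language else fallback
        grouped[key].append(value)
    # A key with a single value maps to that value directly, not a list.
    return {k: v[0] if len(v) == 1 else v for k, v in grouped.items()}


# Hypothetical labels aggregated from sources with and without tags.
labels = [
    ("Self-portrait", "en"),
    ("Autoportrait", "fr"),
    ("Zelfportret", None),  # legacy record: language unknown
]

language_map = build_language_map(labels)
```

With the "MUST NOT include" alternative, the third label would simply
disappear from the language map.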

Rob


On Mon, Apr 10, 2017 at 11:13 PM, Jakob Voß <jakob.voss@gbv.de> wrote:

> Hi,
>
> Gregg Kellogg wrote:
>
> > In CSVW, we coined “und” as the undefined/absent language.
>
> "und" is a perfectly legal language tag, defined in the IANA language
> tag registry:
>
> Type: language
> Subtag: und
> Description: Undetermined
> Added: 2005-10-16
> Scope: special
>
> The other language tags in the "special" Scope are:
>
> zxx: No linguistic content/Not applicable
> mis: Uncoded languages
> mul: Multiple languages
>
> One might argue that "zxx" is actually equivalent to no language tag.
> Anyway "und" is actually used for "unknown language" in contrast to "no
> language". If your data model expects strings to always have languages
> "und" makes sense but in this case there should not be literal strings
> without language tag anyway (see JSKOS json-ld profile for SKOS for an
> example).
>
> Robert wrote:
>
> > If compaction would result in an attempt to add a string without an
> > associated language into a LanguageMap, then the processor SHOULD
> > assign the undefined language code `UND` as the key in the array.
>
> I'd prefer this:
>
> If compaction would result in an attempt to add a string without an
> associated language into a LanguageMap, then the processor MUST NOT
> include this string. Instead it SHOULD emit a warning to inform that the
> data to compact does not fit to the expected data model expressed
> by definition of a LanguageMap.
>
> In theory, any kind of RDF data should be expressible with any kind of
> JSON-LD context. In practice each JSON-LD context defines a data model
> with implicit or explicit assumptions what RDF data to be expressible in
> a meaningful way. I prefer meaningful data over hacks to express data
> that does not conform to expectations anyway.
>
> What's the actual use case of having non-language strings in language maps?
>
> Jakob
>
>


-- 
Rob Sanderson
Semantic Architect
The Getty Trust
Los Angeles, CA 90049

Received on Wednesday, 12 April 2017 00:28:43 UTC