Re: skos:prefLabel without language tag

From: Jon Phipps <jphipps@madcreek.com>
Date: Tue, 28 Jun 2011 06:58:20 -0500
Message-ID: <BANLkTimTLz23Vn0abp0ANbLcrNW-2ZhaJw@mail.gmail.com>
To: Antoine Isaac <aisaac@few.vu.nl>
Cc: public-esw-thes@w3.org
Hi Antoine,

The Metadata Registry requires a language attribute on labels. What we
haven't decided is what we should do when aggregating data from other
'systems' that don't take a preemptive approach to conformance. I recognize
the difference between:
ex:foo skos:prefLabel 'color';
ex:foo skos:prefLabel 'color'@en;
ex:foo skos:prefLabel 'color'@en-US;
...but would strongly prefer that a system recognize that these represent
successive refinements of a single label for the purpose of testing
conformance, although I wonder how often such conformance tests are likely
to be useful when evaluating data in the aggregate.
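
A minimal sketch of what "successive refinements" could mean in practice,
in Python. The Literal type and the refines / is_refinement_chain helpers
are hypothetical names made up for illustration; they are not part of SKOS,
RDF, or the Metadata Registry. The sketch assumes that an untagged label is
refined by any tagged label with the same lexical form, and that a longer
BCP 47 tag refines its prefix. In the RDF model itself the three literals
above remain three distinct values.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """A plain literal: lexical form plus an optional BCP 47 language tag."""
    lexical: str
    lang: Optional[str] = None  # None means "no language tag", not an empty tag


def refines(specific: Literal, general: Literal) -> bool:
    """Hypothetical rule: True if `specific` carries the same lexical form as
    `general` with an equal or more specific language tag."""
    if specific.lexical != general.lexical:
        return False
    if general.lang is None:
        return True  # assumption: any tagging refines an untagged label
    if specific.lang is None:
        return False
    s, g = specific.lang.lower(), general.lang.lower()
    # Match on subtag boundaries, so "en-US" refines "en" but "eng" does not.
    return s == g or s.startswith(g + "-")


def is_refinement_chain(labels: list[Literal]) -> bool:
    """True if the labels, ordered by number of subtags, each refine the
    previous one."""
    ordered = sorted(
        labels,
        key=lambda lit: 0 if lit.lang is None else len(lit.lang.split("-")),
    )
    return all(refines(b, a) for a, b in zip(ordered, ordered[1:]))


labels = [Literal("color"), Literal("color", "en"), Literal("color", "en-US")]
# As RDF literals these are three different values...
distinct_values = len(set(labels))
# ...but under the assumed rule they collapse into one refinement chain.
single_label = is_refinement_chain(labels)
```

Whether a conformance test should apply such a rule is exactly the open
question; the sketch only shows that the rule is cheap to state.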

OTOH, we do have to wonder what we would return if a REST request asked for
a representation of a concept in language 'en' from an aggregated store. But
I think that's another discussion entirely.

I think a dedicated page on the SKOS wiki might be very useful in the light
of this discussion, perhaps once there's some response from rdf-wg?


On Mon, Jun 27, 2011 at 11:12 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:

> Hi,
> Yes, the issue seems to be really tricky, between what happens at the
> syntax level, and what happens at the model level. Personally I'll wait till
> I receive some feedback from the RDF group--my email is still blocked btw.
> Now, the issue of whether we should recommend or require language tags, as
> Jon, Bernard and Armando suggest (in different fashions), remains. Again my
> take is to go for softer, best practice-kind of encouragement: in
> particular, sending messages related to the notion of "conformance" seems
> quite strong [1].
> If instead available SKOS reference tooling (just thinking of PoolParty's
> or Mondeca's tools) included a warning when tag-less labels are found, this
> would help send an appropriate message, I think! Hmm, maybe you've
> already implemented it... And maybe there's something in the Metadata
> Registry that entices the user to assign a language tag as well?
> One may think of adding some sentences to the specification document,
> making soft recommendations. But changing published W3C recs is nearly
> impossible, and errata are usually used for "bugs" in the specs, only, like
> the potential one for S14. Plus, the docs are already making some effort in
> that direction: the SKOS Primer has only one example of tag-less literal,
> and it's in a very specific context (notations)...
> If we really want to centralize best practice, then anyone is free to use
> the SKOS wiki and create a dedicated page, gathering all these softer
> "warnings" a reasoner could issue when ingesting SKOS data. If it gets
> decent consensus (and stability), we could even easily port it to
> http://www.w3.org/2004/02/skos/, under
> a "best practices" heading: that site is not a formal W3C recommendation
> document.
> Cheers,
> Antoine
> [1] http://www.w3.org/TR/2009/REC-skos-reference-20090818/#L434
>> On Mon, Jun 27, 2011 at 8:04 AM, Jon Phipps <jphipps@madcreek.com> wrote:
>>    Hi Antoine,
>>    +1, I think, sortof, maybe. :-)
>>    It depends a bit on what you're saying.
>>    If we take the Open World assumption of the RDF data model into
>> consideration, then it would seem reasonable to state to a _reasoner_ that a
>> skos:prefLabel _must_ have a language tag, particularly given the intent of
>> [S14], even if that language tag is currently unknown. Using Bernard's
>> excellent example, this would imply to me at least that the 'conformance' of
>> the following can't be determined without more information:
>>    ex:foo skos:prefLabel 'A'; prefLabel 'B'@en
>>    And that the following isn't redundant, but rather supplies that
>> information:
>>    ex:foo skos:prefLabel 'A'; prefLabel 'A'@en
>> Unfortunately this isn't the case. There is no syntax for partially
>> specifying these data values. So the model theory has these as two labels.
>>    I think this is a somewhat separate issue from the one that you raised
>> with the RDF folks:
>>    _If_ the specification _requires_ a language tag in order to determine
>> conformance with [S14], does this:
>>    ex:foo skos:prefLabel 'A'
>>    infer this:
>>    ex:foo skos:prefLabel 'A'@""
>> As I pointed out, the latter isn't valid, as the language tag needs to be
>> one specified in BCP47.
>> <foo xml:lang="">bar</foo> does not mean the value of the foo element is
>> 'bar'@"". It means the value of the foo element is 'bar' (without language
>> tag).
>> Realize that the parsing of the syntax does not translate into what you
>> think would be the obvious translation. Perhaps recognizing that the
>> transformation parsetype attribute does a non obvious transformation will
>> help emphasize that care should be taken in understanding the difference
>> between what you see in a particular concrete syntax versus what is read into
>> the model.
>> -Alan
>>    If that is the case, that would transform this:
>>    ex:foo skos:prefLabel 'A'; prefLabel 'B'@en
>>    into this:
>>    ex:foo skos:prefLabel 'A'@""; prefLabel 'B'@en
>>    which is conformant with [S14], and this:
>>    ex:foo skos:prefLabel 'A'@""; prefLabel 'A'@en
>>    which is not conformant, _unless_ you consider that
>>    ex:foo skos:prefLabel 'A'@en
>>    is a higher-value, more refined replacement for
>>    ex:foo skos:prefLabel 'A'@""
>>    Bernard's refinement of the rule would seem to be an
>> application-specific case, even though I think that rule of interpreting an
>> empty language tag to mean 'all' or 'any' language rather than 'no language'
>> is highly useful best practice. His rule has value in determining which
>> labels to display or which concepts to return from a search, but this is
>> slightly different than discussing conformance to [S14].
>>    I hope you get an answer from the rdf-wg, but I agree with you that
>> what constitutes 'acceptable' data, especially when aggregating data from
>> disparate systems should be broadly defined even if that is somewhat
>> different than what defines 'conformance'. Postel's robustness principle: "be
>> liberal in what you accept; be conservative in what you send" provides
>> useful guidance.
>>    Jon
>>    On Fri, Jun 24, 2011 at 7:07 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
>>        Hi Armando, Bernard,
>>        SKOS indeed encourages the use of language-tagged labels. This is
>> why almost all examples in the doc have language tags, and probably the
>> reason for which we now have to make S14 clearer--cf. our other discussion
>> now.
>>        But we also have to remain simple, and compatible with a wide range
>> of data. For many vocabularies, publishing language info is technically
>> difficult, or even impossible. This is especially the case for vocabularies
>> that have been aggregating labels originating from different languages, but
>> with data structures that do not allow tracking language provenance (or
>> make it difficult).
>>        Cheers,
>>        Antoine
>>            Hi all,
>>            agree with Bernard.
>>            Even more, however restrictive it may seem (and possibly
>> causing huge panic over backward compatibility with the huge amount of
>> existing data, but every revolution has its heads chopped off…), I would think of a
>> revision of SKOS as **really** suggesting not to use (forbidding?)
>> prefLabels with no language tag. One of the SKOS objectives was to give a
>> decent coverage of the linguistic descriptions of concept schemes (and
>> ontologies in general, as prefLabel is now an AnnotationProperty [S10] thus
>> admitting any resource in its domain), and thus a prefLabel with no language
>> tag makes no sense to me. One could say that plainLiterals could be used
>> with no langtag to address specific codes related to no natural language,
>> but there are better options for that (i.e. skos:notation).
>>            In my experience, I’ve always had to make do somehow with
>> missing lang tags, because usually those values still are explained in some
>> language, so you have to know it in advance, or guess it… so, lots of patches
>> to any software ever written for natural language querying over ontologies,
>> to account for the language assumed to be used for no-langtagged-literals.
>> Collapsing indexes for no-lang-tags with lang-tags of the same language etc…
>>            This is dirty work that has to be done when dealing with rdfs:label,
>> but a highly specified (and specific) property such as prefLabel could surely
>> better live without “no-lang-tagged” plainLiterals.
>>            Armando
>>            *From:* public-esw-thes-request@w3.org
>> [mailto:public-esw-thes-request@w3.org] *On Behalf Of* Bernard Vatant
>>            *Sent:* Friday, June 24, 2011 11:12 AM
>>            *To:* Antoine Isaac
>>            *Cc:* public-esw-thes@w3.org
>>            *Subject:* Re: skos:prefLabel without language tag
>>            Hello all
>>            Thinking further about it, beyond the formal issue we have the
>> question of the expected behaviour of applications when meeting labels w/o
>> language tags.
>>            In multilingual environments, the language tag is typically
>> used to present the concept to end users in their "user language". The
>> unicity of the prefLabel in the user language avoids clashes in the
>> interface. Note that some systems (e.g., Eurovoc and other OPOCE
>> vocabularies) even require that all concepts have a prefLabel in all
>> supported user languages (e.g., EU official languages), including default
>> value rules (such as take the English label if no label is available in
>> Slovenian or Swedish).
>>            In our (Mondeca ITM) system, a label (aka "name") has also a
>> mandatory and unique language tag, but one possible value is "no language".
>> The behaviour of the system regarding this tag is that such names are
>> displayed whatever the user language choice. Of course if one wants unicity
>> of the displayed name, it implies that if there is a "no language" name,
>> there is no (other) name tagged with a language.
>>            Translated into SKOS, this rule would look like:
>>            *If a Concept has a prefLabel value with no language tag, it
>> cannot have a different prefLabel value with a language tag.*
>>            IOW the following is not conformant
>>            ex:foo skos:prefLabel 'A'; prefLabel 'B'@en
>>            The following is conformant but somehow redundant
>>            ex:foo skos:prefLabel 'A'; prefLabel 'A'@en
>>            Bernard
>>            2011/6/23 Antoine Isaac <aisaac@few.vu.nl>
>>            On 6/23/11 8:40 PM, Alan Ruttenberg wrote:
>>            On Thu, Jun 23, 2011 at 1:52 PM, Houghton,Andrew <houghtoa@oclc.org> wrote:
>>            Given these two situations:
>>            <skos:prefLabel>Dog</skos:prefLabel>
>>            <skos:prefLabel xml:lang="">Dog</skos:prefLabel>
>>            Does the inclusion of *both* prefLabel in a SKOS concept result in breaking
>>            the rule S14 that no two prefLabel should have the same lexical value for
>>            the same language tag?
>>            My read is that S14 is not applicable. In both cases the lexical value
>>            is the same - a plain literal without language tag. The RDFXML doesn't
>>            state that the language tag is "". It is syntax for the absence of a
>>            language tag. These two are different in the value space - without a
>>            language tag it is a string, with a language tag it is a pair of
>>            strings. The set of plain literals without language tags is *not* the
>>            set of pairs (string , "").
>>            Since the rule as stated applies to literals *with* language tags
>>            (they can't be the same unless they are there), S14 would not seem to
>>            be applicable.
>>            That said, this looks like a hole in the spec. It was probably the
>>            intention to also include the case that no two prefLabel without
>>            language tag have the same lexical value.
>>            -Alan
>>            Yes, it certainly was.
>>            I have to admit I don't know if there is a hole. It may seem
>> reasonable that there exist some syntactic matching between literals having
>> an empty tag and literals having no tag, as Simon reports.
>>            I think section 6.12 of the rdf syntax spec does result in the
>> defaulting of language to at least "" in production 7.2.16- there doesn't
>> seem to be another literal production that passes the language feature. I
>> must admit that I am not certain how general this assumption is- there are
>> other specs that seem to distinguish between <s> and <s,l>, but I think only
>> <s> \equiv <s,""> is consistent?
>>            Simon
>>            However, this may be specific to one syntax.
>>            The RDF abstract syntax and other specs do not mention that
>> sort of thing. In particular, the way the identity conditions are spelled out
>> at [1,2] seems to argue against amalgamating absence of tag with presence of
>> any tag (including an empty one).
>>            Anyway, it could be that the simplest thing to do is to publish
>> an erratum to clarify the original intent, rather than go into a discussion
>> that is difficult, and would perhaps just be against a moving target, as RDF
>> is currently being worked on... I'll forward the issue.
>>            Cheers,
>>            Antoine
>>            [1] http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality
>>            [2] http://www.w3.org/TR/rdf-plain-literal/#The_Comparison_of_rdf:PlainLiteral_Data_Values
>>            --
>>            Bernard Vatant
>>            Senior Consultant
>>            Vocabulary & Data Integration
>>            Tel: +33 (0) 971 488 459
>>            Mail: bernard.vatant@mondeca.com
>>            ----------------------------------------------------
>>            Mondeca
>>            3, cité Nollez 75018 Paris France
>>            Web: http://www.mondeca.com
>>            Blog: http://mondeca.wordpress.com
>>            ----------------------------------------------------
>>    --
>>    Jon
>>    I check email just a couple of times daily; to reach me sooner, click
>> here: http://awayfind.com/jonphipps


Received on Tuesday, 28 June 2011 12:04:32 GMT