Re: bad language tags

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Tue, 07 May 2013 10:09:36 +0200
Message-ID: <5188B6C0.3050305@emse.fr>
To: andy.seaborne@epimorphics.com
CC: public-rdf-wg@w3.org
On 07/05/2013 09:26, Andy Seaborne wrote:
>
>
> On 07/05/13 04:27, Pat Hayes wrote:
>>
>> On May 6, 2013, at 2:15 PM, Antoine Zimmermann wrote:
>>
>>> On 06/05/2013 19:47, Pat Hayes wrote:
>>>> (I think we may have decided this already, but can't find the
>>>> decision.)
>>>>
>>>> If some RDF has a language-tagged literal with a bad language tag
>>>> (not conforming to section 2.2.9 of BCP 47), is that
>>>>
>>>> 1. an RDF syntax error
>
> What's an "RDF syntax error"?  Concrete or Abstract Syntax?

I assume that what Pat is interested in at the moment is abstract 
syntax.  For his editing of RDF Semantics, it's the only thing that matters.


> Real data may well have legal, correctly canonicalised language tags
> ... which are then not legal RDF due to case.
>
> @en-US

This is not a problem. Language tags do not have @ either. The concrete 
syntax can allow many things as long as it's clear how it maps to a 
valid RDF Graph. Reasoning from a particular concrete syntax like this 
would lead you to conclude that an integer does not need a datatype 
IRI, because Turtle lets you write it without one.


>>>> 2. syntactically legal but inconsistent (because the literal has
>>>> no legal value)
>>>> 3. legal and consistent (because even a bad
>>>> language tag is still an RDF language tag) ?
>>>
>>> RDF concepts says that a language-tagged string has a lexical form
>>> (a Unicode string), a datatype IRI (rdf:langString) and a language
>>> tag (a non-empty language tag as defined by [BCP47]. The language
>>> tag must be well-formed according to section 2.2.9 of [BCP47], and
>>> must be normalized to lowercase).
>
> The lower case requirement is in the abstract syntax.

Right.

> Many processors implement RFC3066 as does the Turtle grammar.

That's fine. The lowercase requirement simply defines the identity of 
a language tag. If you write @en-US or @en-us in Turtle, you are using 
the same language tag. It does not matter how the parser deals with 
this, as long as the two compare equal.
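
To make the point concrete, here is a minimal sketch in Python of how a 
parser might map a tag written in Turtle to its abstract form. The 
function names are hypothetical and not taken from any spec or library; 
the only assumptions are that the leading @ belongs to the concrete 
syntax and that identity is decided on the lowercased tag.

```python
def abstract_language_tag(token: str) -> str:
    """Map a language tag as written in a concrete syntax to its abstract form."""
    # Assumption: a leading '@' is Turtle's marker and not part of the tag itself.
    tag = token[1:] if token.startswith("@") else token
    # Well-formed language tags are ASCII, so str.lower() normalizes their case.
    return tag.lower()


def same_language_tag(a: str, b: str) -> bool:
    """Two written tags denote the same tag if their abstract forms are equal."""
    return abstract_language_tag(a) == abstract_language_tag(b)


print(same_language_tag("@en-US", "@en-us"))  # True
print(abstract_language_tag("@en-US"))        # en-us
```

A parser is free to keep the original casing internally; what matters 
is that the comparison behaves like same_language_tag above.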


>>>
>>> Anything else is not a language-tagged string.
>>> So, it's answer 1.
>
> By that argument "@en-US" is a syntax error yet it is the canonical form.

In the abstract syntax "@en-US" would be plainly wrong because of the @ 
character. It need not be a syntax error in Turtle, but it's an 
error in RDF/XML or JSON-LD. One could imagine a syntax where en-US is a 
syntax error.


>
>> Well, that is how I would interpret that MUST as well, but I think
>> that it would be better if it were to say this explicitly, because
>> this being a syntax error requires all conformant RDF parsers to know
>> all about wellformedness of language tags. I actually think this is a
>> very bad decision, if it really is what the WG intended to do. Which
>> is why I wanted to make sure that the text was very clear on exactly
>> what is intended here.
>
> well-formedness of language tags isn't too bad - it's the grammar in
> 2.2.9 although in Turtle, RFC 3066 is used.

If any RFC3066-valid tag can be mapped in a non-ambiguous way to a 
BCP47-valid tag, then it's not a contradiction (but maybe a remark on 
this should be put somewhere in the Turtle spec).
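
As a rough illustration of what such a mapping could look like, the 
Python sketch below checks a tag against the RFC 3066 ABNF (a primary 
subtag of 1 to 8 letters, followed by subtags of 1 to 8 letters or 
digits) and then lowercases it. The function names are hypothetical. 
The check covers only the RFC 3066 shape; it does not establish that 
the result is well-formed under BCP 47, which is exactly the open 
question.

```python
import re
from typing import Optional

# RFC 3066 ABNF:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_rfc3066_shaped(tag: str) -> bool:
    """True if the whole string matches the RFC 3066 Language-Tag production."""
    return RFC3066_TAG.fullmatch(tag) is not None


def to_abstract_tag(tag: str) -> Optional[str]:
    """Lowercase an RFC 3066-shaped tag; return None for anything else.

    Assumption: lowercasing is the whole mapping. Whether the result is
    always well-formed under BCP 47 is not checked here.
    """
    return tag.lower() if is_rfc3066_shaped(tag) else None


print(to_abstract_tag("en-US"))  # en-us
print(to_abstract_tag("en_US"))  # None
```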

>
> Concepts says
>
> """
> 5. Otherwise, the literal is ill-typed, and no literal value can be
> associated with the literal. Such a case, while in error, is not
> syntactically ill-formed.
> """

Language-tagged strings cannot be ill-typed since they do not have a 
lexical space, and they are interpreted in their own special way.


AZ.


>
> +1 to 3.
>
>      Andy
>
>>
>> Pat
>>
>>> There has been discussion about it, and I think this was what we
>>> came to agree on, but I don't remember if it has been reflected in
>>> a WG resolution.
>>>
>>>
>>> AZ
>>>
>>>
>>>>
>>>> Pat
>>>>
>>>> PS.  If we have to decide this, I vote for 3 as being less work
>>>> to implement, and on the grounds that RDF's job isn't to check on
>>>> bad data.
>>>>
>>>> ------------------------------------------------------------ IHMC
>>>> (850)434 8903 or (650)494 3973 40 South Alcaniz St.
>>>> (850)202 4416   office Pensacola
>>>> (850)202 4440   fax FL 32502
>>>> (850)291 0667   mobile phayesAT-SIGNihmc.us
>>>> http://www.ihmc.us/users/phayes
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> -- Antoine Zimmermann ISCOD / LSTI - Institut Henri Fayol École
>>> Nationale Supérieure des Mines de Saint-Étienne 158 cours Fauriel
>>> 42023 Saint-Étienne Cedex 2 France Tél:+33(0)4 77 42 66 03
>>> Fax:+33(0)4 77 42 66 66 http://zimmer.aprilfoolsreview.com/
>>>
>>>
>>
>
>

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Tuesday, 7 May 2013 08:10:11 UTC
