Re: bad language tags

On May 7, 2013, at 3:09 AM, Antoine Zimmermann wrote:

> On 07/05/2013 09:26, Andy Seaborne wrote:
>> 
>> 
>> On 07/05/13 04:27, Pat Hayes wrote:
>>> 
>>> On May 6, 2013, at 2:15 PM, Antoine Zimmermann wrote:
>>> 
>>>> On 06/05/2013 19:47, Pat Hayes wrote:
>>>>> (I think we may have decided this already, but can't find the
>>>>> decision.)
>>>>> 
>>>>> If some RDF has a language-tagged literal with a bad language tag
>>>>> (not conforming to section 2.2.9 of BCP 47), is that
>>>>> 
>>>>> 1. an RDF syntax error
>> 
>> What's an "RDF syntax error"?  Concrete or Abstract Syntax?
> 
> I assume that what Pat is interested in at the moment is abstract syntax.  For his editing of RDF Semantics, it's the only thing that matters.

Correct. But it might be good to have a clear story to tell about concrete syntaxes, as well. 

> 
> 
>> Real data may well have legal, correctly canonicalised language tags
>> ... which are then not legal RDF due to case.
>> 
>> @en-US
> 
> This is not a problem. Language tags do not have @ either. The concrete syntax can allow many things as long as it's clear how it maps to a valid RDF Graph. Reasoning on a particular concrete syntax like this would lead you to the conclusion that an integer does not need to have a datatype IRI because you can write it like that in Turtle.
> 
> 
>>>>> 2. syntactically legal but inconsistent (because the literal has
>>>>> no legal value)
>>>>> 3. legal and consistent (because even a bad
>>>>> language tag is still an RDF language tag) ?
>>>> 
>>>> RDF concepts says that a language-tagged string has a lexical form
>>>> (a Unicode string), a datatype IRI (rdf:langString) and a language
>>>> tag (a non-empty language tag as defined by [BCP47]. The language
>>>> tag must be well-formed according to section 2.2.9 of [BCP47], and
>>>> must be normalized to lowercase).
>> 
>> The lower case requirement is in the abstract syntax.
> 
> Right.
> 
>> Many processors implement RFC3066 as does the Turtle grammar.
> 
> That's all right. The lower case requirement is simply there to define the identity of a language tag. If you write @en-US or @en-us in Turtle, you are using the same language tag. It does not matter how the parser deals with this, as long as the two compare equal.

How about, say, @qwertyuiop or @23.7 ? 
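
To make the question concrete, here is a rough Python sketch that checks these tags against two grammars. It assumes a simplified reading of the BCP 47 (RFC 5646) `langtag` and private-use productions, with the grandfathered tags deliberately left out, and the Turtle LANGTAG production without its leading "@". The names `bcp47_well_formed` and `turtle_accepts` are made up for illustration and come from no spec or library.

```python
import re

# Simplified BCP 47 (RFC 5646) well-formedness pattern.
# Assumption: grandfathered tags from the RFC are not covered here.
BCP47_LANGTAG = re.compile(r"""
    (?:
        (?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})  # language, optional extlang
        (?:-[a-z]{4})?                                        # script
        (?:-(?:[a-z]{2}|[0-9]{3}))?                           # region
        (?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*              # variants
        (?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*                  # extensions
        (?:-x(?:-[a-z0-9]{1,8})+)?                            # trailing private use
      |
        x(?:-[a-z0-9]{1,8})+                                  # private-use-only tag
    )
""", re.VERBOSE | re.IGNORECASE)

# Turtle LANGTAG production, minus the leading '@'.
TURTLE_LANGTAG = re.compile(r"[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")


def bcp47_well_formed(tag: str) -> bool:
    """True if the whole tag matches the simplified BCP 47 pattern."""
    return BCP47_LANGTAG.fullmatch(tag) is not None


def turtle_accepts(tag: str) -> bool:
    """True if the whole tag matches the Turtle LANGTAG shape."""
    return TURTLE_LANGTAG.fullmatch(tag) is not None


for tag in ("en-US", "qwertyuiop", "23.7"):
    print(tag, bcp47_well_formed(tag), turtle_accepts(tag))
```

Under these assumptions, en-US passes both checks and 23.7 fails both, but qwertyuiop gets through the Turtle pattern while failing the BCP 47 one (its only subtag is ten letters long). So the two grammars do not draw the line in the same place.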

> 
> 
>>>> 
>>>> Anything else is not a language-tagged string.
>>>> So, it's answer 1.
>> 
>> By that argument "@en-US" is a syntax error yet it is the canonical form.
> 
> In the abstract syntax "@en-US" would be plainly wrong because of the @ character. It need not be a syntax error in Turtle, but it's an error in RDF/XML or JSON-LD. One could imagine a syntax where en-US is a syntax error.
> 
> 
>> 
>>> Well, that is how I would interpret that MUST as well, but I think
>>> that it would be better if it were to say this explicitly, because
>>> this being a syntax error requires all conformant RDF parsers to know
>>> all about wellformedness of language tags. I actually think this is a
>>> very bad decision, if it really is what the WG intended to do. Which
>>> is why I wanted to make sure that the text was very clear on exactly
>>> what is intended here.
>> 
>> Well-formedness of language tags isn't too bad - it's the grammar in
>> 2.2.9, although in Turtle, RFC 3066 is used.
> 
> If any RFC3066-valid tag can be mapped unambiguously to a BCP47-valid tag, then there is no contradiction (but maybe a remark on this should be put somewhere in the Turtle spec).
> 
>> 
>> Concepts says
>> 
>> """
>> 5. Otherwise, the literal is ill-typed, and no literal value can be
>> associated with the literal. Such a case, while in error, is not
>> syntactically ill-formed.
>> """
> 
> Language-tagged strings cannot be ill-typed since they do not have a lexical space, and they are interpreted in their own special way.

Then that special way needs to spell out how to treat cases like @23.7, which it currently does not. Hence my question.
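
For what it's worth, the three options from my original question could be sketched as three policies in a toy literal constructor. This is only an illustration: `BadTagPolicy`, `LangString`, `make_lang_string` and the `has_value` flag are hypothetical names, and the shape check is an RFC 3066-style stand-in rather than the full BCP 47 grammar.

```python
import re
from dataclasses import dataclass
from enum import Enum


class BadTagPolicy(Enum):
    SYNTAX_ERROR = 1  # option 1: not a language-tagged string at all
    ILL_TYPED = 2     # option 2: legal syntax, but no value
    ACCEPT = 3        # option 3: legal and consistent


@dataclass(frozen=True)
class LangString:
    lexical_form: str
    language_tag: str       # always stored in lowercase
    has_value: bool = True  # False models "no legal value"


# Assumption: RFC 3066-style shape check as a stand-in for well-formedness.
_TAG_SHAPE = re.compile(r"[a-z]{1,8}(?:-[a-z0-9]{1,8})*")


def make_lang_string(lexical_form: str, tag: str, policy: BadTagPolicy) -> LangString:
    tag = tag.lower()  # identity of a tag ignores case
    well_formed = _TAG_SHAPE.fullmatch(tag) is not None
    if not well_formed and policy is BadTagPolicy.SYNTAX_ERROR:
        raise ValueError(f"bad language tag: {tag!r}")
    has_value = well_formed or policy is BadTagPolicy.ACCEPT
    return LangString(lexical_form, tag, has_value)
```

With this sketch, "en-US" and "en-us" produce equal literals under any policy, while "23.7" raises an error under option 1, yields a literal with no value under option 2, and is accepted as-is under option 3. Only option 1 forces every parser to run the well-formedness check.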

Right now I do not seem to have a clear consensus on an answer.

Pat

> 
> 
> AZ.
> 
> 
>> 
>> +1 to 3.
>> 
>>     Andy
>> 
>>> 
>>> Pat
>>> 
>>>> There has been discussion about it, and I think this was what we
>>>> came to agree on, but I don't remember if it has been reflected in
>>>> a WG resolution.
>>>> 
>>>> 
>>>> AZ
>>>> 
>>>> 
>>>>> 
>>>>> Pat
>>>>> 
>>>>> PS.  If we have to decide this, I vote for 3 as being less work
>>>>> to implement, and on the grounds that RDF's job isn't to check on
>>>>> bad data.
>>>>> 
>>>>> ------------------------------------------------------------ IHMC
>>>>> (850)434 8903 or (650)494 3973 40 South Alcaniz St.
>>>>> (850)202 4416   office Pensacola
>>>>> (850)202 4440   fax FL 32502
>>>>> (850)291 0667   mobile phayesAT-SIGNihmc.us
>>>>> http://www.ihmc.us/users/phayes
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> -- Antoine Zimmermann ISCOD / LSTI - Institut Henri Fayol École
>>>> Nationale Supérieure des Mines de Saint-Étienne 158 cours Fauriel
>>>> 42023 Saint-Étienne Cedex 2 France Tél:+33(0)4 77 42 66 03
>>>> Fax:+33(0)4 77 42 66 66 http://zimmer.aprilfoolsreview.com/
>>>> 
>>>> 
>>> 
>> 
>> 
> 
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Wednesday, 8 May 2013 14:23:05 UTC