Re: bad language tags

We may be able to have our cake and eat it here.

We could define a language-tagged literal as a string plus a
case-insensitive ASCII string that matches the language tag regex.
Implementations can choose to case-fold or not, as long as they do the
"right thing", i.e. treat tags that differ only in case as the same
tag.  Good RDF implementations (are there any?) will return only one
form of the literal even if they internally keep several (for
round-tripping, etc.).  Poor RDF implementations may return multiple
"copies" of the same triple.

This would require changes to Concepts, but *no* changes to Semantics,
as Semantics only talks about language tags.  Semantics should probably
include a note that language tags are case-insensitive.
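
A minimal Python sketch of the first option might look like this: the tag keeps its spelling, but comparison is case-insensitive. `LangString` and `is_well_formed` are hypothetical names. The regex is an unofficial hand transcription of the RFC 5646 (BCP 47) section 2.1 ABNF, which section 2.2.9 uses to define well-formedness. It checks syntax only and does not consult the subtag registry.

```python
import re
from dataclasses import dataclass, field

# Hand transcription of the RFC 5646 (BCP 47) section 2.1 ABNF, which
# section 2.2.9 uses to define "well-formed". Syntax only: subtags are
# not looked up in the IANA registry (that would be "valid").
_LANGTAG = (
    r"(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})"  # language [-extlang]
    r"(?:-[a-z]{4})?"                                        # script
    r"(?:-(?:[a-z]{2}|[0-9]{3}))?"                           # region
    r"(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*"              # variants
    r"(?:-[0-9a-wyz](?:-[a-z0-9]{2,8})+)*"                   # extensions (not x)
    r"(?:-x(?:-[a-z0-9]{1,8})+)?"                            # private use
)
_PRIVATE_USE = r"x(?:-[a-z0-9]{1,8})+"
_GRANDFATHERED = (
    r"en-gb-oed|i-ami|i-bnn|i-default|i-enochian|i-hak|i-klingon|i-lux|"
    r"i-mingo|i-navajo|i-pwn|i-tao|i-tay|i-tsu|sgn-be-fr|sgn-be-nl|"
    r"sgn-ch-de|art-lojban|cel-gaulish|no-bok|no-nyn|zh-guoyu|zh-hakka|"
    r"zh-min|zh-min-nan|zh-xiang"
)
# re.ASCII stops IGNORECASE from letting [a-z] match non-ASCII letters.
_WELL_FORMED = re.compile(
    f"(?:{_LANGTAG}|{_PRIVATE_USE}|{_GRANDFATHERED})", re.IGNORECASE | re.ASCII
)


def is_well_formed(tag: str) -> bool:
    """True if tag matches the BCP 47 well-formedness grammar, in any case."""
    return _WELL_FORMED.fullmatch(tag) is not None


@dataclass(frozen=True)
class LangString:
    """Hypothetical language-tagged literal: the tag keeps its spelling as
    written, but equality and hashing use an ASCII-lowercased copy."""
    lexical_form: str
    tag: str = field(compare=False)               # as written, for round-tripping
    tag_key: str = field(init=False, repr=False)  # lowercased, used by == and hash

    def __post_init__(self) -> None:
        if not is_well_formed(self.tag):
            raise ValueError(f"not a well-formed BCP 47 tag: {self.tag!r}")
        # A well-formed tag is pure ASCII here, so lower() only maps A-Z.
        object.__setattr__(self, "tag_key", self.tag.lower())


# Two spellings of one literal: equal, and a set keeps a single "copy".
a = LangString("colour", "en-GB")
b = LangString("colour", "en-gb")
print(a == b, len({a, b}), a.tag)  # prints: True 1 en-GB
```

Comparing on a folded key lets a "good" implementation hand back one form of the literal while still keeping the original spelling around for round-tripping. A tag like `23.7` is rejected at construction time.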

That said, I've argued myself back to preferring the requirement that
implementations treat language tags as lowercase ASCII.  To me, case
here is like whitespace: we don't expect whitespace to be preserved by
RDF implementations, so why should we expect the irrelevant case of
language tags to be preserved?
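
The lowercase-ASCII requirement can be sketched by folding the tag once, when the literal is built. After that, plain equality is enough and no original spelling needs to be tracked. This is a hypothetical Python 3.7+ sketch: `make_literal`, `LangLiteral` and the `ex:` names are illustrative only. Well-formedness checking is deliberately left out so the sketch shows only the case rule.

```python
from typing import NamedTuple


class LangLiteral(NamedTuple):
    """Hypothetical language-tagged literal whose tag is stored in lowercase."""
    lexical_form: str
    tag: str


def make_literal(lexical_form: str, tag: str) -> LangLiteral:
    """Build a literal, folding its tag to lowercase ASCII.

    Only the case rule is shown; BCP 47 well-formedness is not checked here.
    """
    if not tag or not tag.isascii():
        raise ValueError(f"language tag must be non-empty ASCII: {tag!r}")
    # On ASCII input, str.lower() maps A-Z to a-z and leaves all else unchanged.
    return LangLiteral(lexical_form, tag.lower())


# The original spelling is simply not kept, so ordinary equality suffices:
# both spellings yield the same triple and a set stores it once.
triples = {
    ("ex:s", "ex:p", make_literal("colour", "en-GB")),
    ("ex:s", "ex:p", make_literal("colour", "EN-gb")),
}
print(len(triples), next(iter(triples))[2].tag)  # prints: 1 en-gb
```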

peter


On Wed, May 8, 2013 at 7:22 AM, Pat Hayes <phayes@ihmc.us> wrote:
>
> On May 7, 2013, at 3:09 AM, Antoine Zimmermann wrote:
>
>> Le 07/05/2013 09:26, Andy Seaborne a écrit :
>>>
>>>
>>> On 07/05/13 04:27, Pat Hayes wrote:
>>>>
>>>> On May 6, 2013, at 2:15 PM, Antoine Zimmermann wrote:
>>>>
>>>>> Le 06/05/2013 19:47, Pat Hayes a écrit :
>>>>>> (I think we may have decided this already, but can't find the
>>>>>> decision.)
>>>>>>
>>>>>> If some RDF has a language-tagged literal with a bad language tag
>>>>>> (not conforming to section 2.2.9 of BCP 47), is that
>>>>>>
>>>>>> 1. an RDF syntax error
>>>
>>> What's an "RDF syntax error"?  Concrete or Abstract Syntax?
>>
>> I assume that what Pat is interested in at the moment is abstract syntax.  For his editing of RDF Semantics, it's the only thing that matters.
>
> Correct. But it might be good to have a clear story to tell about concrete syntaxes, as well.
>
>>
>>
>>> Real data may well have legal, correctly canonicalised language tags
>>> ... which are then not legal RDF due to case.
>>>
>>> @en-US
>>
>> This is not a problem. Language tags do not have @ either. The concrete syntax can allow many things as long as it's clear how it maps to a valid RDF Graph. Reasoning on a particular concrete syntax like this would lead you to the conclusion that an integer does not need to have a datatype IRI because you can write it without one in Turtle (e.g. 42).
>>
>>
>>>>>> 2. syntactically legal but inconsistent (because the literal has
>>>>>> no legal value)
>>>>>> 3. legal and consistent (because even a bad
>>>>>> language tag is still an RDF language tag) ?
>>>>>
>>>>> RDF Concepts says that a language-tagged string has a lexical form
>>>>> (a Unicode string), a datatype IRI (rdf:langString) and a language
>>>>> tag (a non-empty language tag as defined by [BCP47]; the language
>>>>> tag must be well-formed according to section 2.2.9 of [BCP47], and
>>>>> must be normalized to lowercase).
>>>
>>> The lower case requirement is in the abstract syntax.
>>
>> Right.
>>
>>> Many processors implement RFC3066 as does the Turtle grammar.
>>
>> It's alright. The lowercase requirement is simply there to define the identity of a language tag. If you write @en-US or @en-us in Turtle, you are using the same language tag. It does not matter how the parser deals with this, as long as the two compare equal.
>
> How about, say, @qwertyuiop or @23.7 ?
>
>>
>>
>>>>>
>>>>> Anything else is not a language-tagged string.
>>>>> So, it's answer 1.
>>>
>>> By that argument "@en-US" is a syntax error yet it is the canonical form.
>>
>> In the abstract syntax, "@en-US" would clearly be wrong because of the @ character. It need not be a syntax error in Turtle, but it is an error in RDF/XML or JSON-LD. One could imagine a syntax where en-US is a syntax error.
>>
>>
>>>
>>>> Well, that is how I would interpret that MUST as well, but I think
>>>> that it would be better if it were to say this explicitly, because
>>>> this being a syntax error requires all conformant RDF parsers to know
>>>> all about wellformedness of language tags. I actually think this is a
>>>> very bad decision, if it really is what the WG intended to do. Which
>>>> is why I wanted to make sure that the text was very clear on exactly
>>>> what is intended here.
>>>
>>> well-formedness of language tags isn't too bad: it's the grammar in
>>> 2.2.9, although Turtle uses RFC 3066.
>>
>> If any RFC3066-valid tag can be mapped in a non-ambiguous way to a BCP47-valid tag, then it's not a contradiction (but maybe a remark on this should be put somewhere in the Turtle spec).
>>
>>>
>>> Concepts says
>>>
>>> """
>>> 5. Otherwise, the literal is ill-typed, and no literal value can be
>>> associated with the literal. Such a case, while in error, is not
>>> syntactically ill-formed.
>>> """
>>
>> Language-tagged strings cannot be ill-typed since they do not have a lexical space, and they are interpreted in their own special way.
>
> Then that special way needs to spell out how to treat cases like @23.7 , which it currently does not. Hence my question.
>
> Right now I do not seem to have a clear consensus on an answer.
>
> Pat
>
>>
>>
>> AZ.
>>
>>
>>>
>>> +1 to 3.
>>>
>>>     Andy
>>>
>>>>
>>>> Pat
>>>>
>>>>> There has been discussion about it, and I think this was what we
>>>>> came to agree on, but I don't remember if it has been reflected in
>>>>> a WG resolution.
>>>>>
>>>>>
>>>>> AZ
>>>>>
>>>>>
>>>>>>
>>>>>> Pat
>>>>>>
>>>>>> PS.  If we have to decide this, I vote for 3 as being less work
>>>>>> to implement, and on the grounds that RDF's job isn't to check on
>>>>>> bad data.
>>>>>>
>>>>>> ------------------------------------------------------------ IHMC
>>>>>> (850)434 8903 or (650)494 3973 40 South Alcaniz St.
>>>>>> (850)202 4416   office Pensacola
>>>>>> (850)202 4440   fax FL 32502
>>>>>> (850)291 0667   mobile phayesAT-SIGNihmc.us
>>>>>> http://www.ihmc.us/users/phayes
>>>>>
>>>>> -- Antoine Zimmermann ISCOD / LSTI - Institut Henri Fayol École
>>>>> Nationale Supérieure des Mines de Saint-Étienne 158 cours Fauriel
>>>>> 42023 Saint-Étienne Cedex 2 France Tél:+33(0)4 77 42 66 03
>>>>> Fax:+33(0)4 77 42 66 66 http://zimmer.aprilfoolsreview.com/

Received on Wednesday, 8 May 2013 16:13:00 UTC