Re: bad language tags

On 07/05/13 04:27, Pat Hayes wrote:
>
> On May 6, 2013, at 2:15 PM, Antoine Zimmermann wrote:
>
>> On 06/05/2013 19:47, Pat Hayes wrote:
>>> (I think we may have decided this already, but can't find the
>>> decision.)
>>>
>>> If some RDF has a language-tagged literal with a bad language tag
>>> (not conforming to section 2.2.9 of BCP 47), is that
>>>
>>> 1. an RDF syntax error

What's an "RDF syntax error"?  Concrete or Abstract Syntax?

Real data may well have legal, correctly canonicalised language tags
... which are then not legal RDF due to case.

@en-US

>>> 2. syntactically legal but inconsistent (because the literal has
>>> no legal value)
>>> 3. legal and consistent (because even a bad language tag is
>>> still an RDF language tag)?
>>
>> RDF concepts says that a language-tagged string has a lexical form
>> (a Unicode string), a datatype IRI (rdf:langString) and a language
>> tag (a non-empty language tag as defined by [BCP47]. The language
>> tag must be well-formed according to section 2.2.9 of [BCP47], and
>> must be normalized to lowercase).

The lower case requirement is in the abstract syntax.

Many processors implement RFC 3066, as does the Turtle grammar.
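
To make the case issue concrete, here is a minimal sketch of the
lowercase requirement on its own. The function names are made up for
this example and are not taken from any spec or RDF library; it only
shows that a tag in its usual casing fails a strict lowercase check
until it is normalised.

```python
def to_abstract_syntax(tag: str) -> str:
    """Lowercase a language tag (hypothetical helper for illustration)."""
    return tag.lower()


def is_lowercase_normalized(tag: str) -> bool:
    """True if the tag is already in the all-lowercase form."""
    return tag == tag.lower()


print(is_lowercase_normalized("en-US"))  # False
print(to_abstract_syntax("en-US"))       # en-us
```

A parser that lowercases on input accepts "en-US" without complaint;
one that only tests the abstract-syntax condition would have to reject
it.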

>>
>> Anything else is not a language-tagged string.
>> So, it's answer 1.

By that argument "@en-US" is a syntax error yet it is the canonical form.

> Well, that is how I would interpret that MUST as well, but I think
> that it would be better if it were to say this explicitly, because
> this being a syntax error requires all conformant RDF parsers to know
> all about wellformedness of language tags. I actually think this is a
> very bad decision, if it really is what the WG intended to do. Which
> is why I wanted to make sure that the text was very clear on exactly
> what is intended here.

Well-formedness of language tags isn't too bad - it's the grammar in
2.2.9, although in Turtle, RFC 3066 is used.
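
As a rough idea of the cost, here is a sketch of a shape check using
the RFC 3066 production (Language-Tag = Primary-subtag *( "-" Subtag ),
with subtags of 1 to 8 characters). It is an assumption-laden
illustration only: the names are invented, and it does not implement
the fuller BCP 47 grammar, which is stricter.

```python
import re

# RFC 3066 shape: an alphabetic primary subtag of 1-8 letters, followed
# by zero or more "-"-separated alphanumeric subtags of 1-8 characters.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def is_rfc3066_shaped(tag: str) -> bool:
    """True if the whole string matches the RFC 3066 tag shape."""
    return RFC3066_TAG.fullmatch(tag) is not None


print(is_rfc3066_shaped("en-US"))  # True
print(is_rfc3066_shaped("en_US"))  # False
```

Note that this check is case-insensitive, so "en-US" and "en-us" both
pass; the case question is separate from well-formedness.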

Concepts says

"""
5. Otherwise, the literal is ill-typed, and no literal value can be
associated with the literal. Such a case, while in error, is not
syntactically ill-formed.
"""

+1 to 3.

	Andy

>
> Pat
>
>> There has been discussion about it, and I think this was what we
>> came to agree on, but I don't remember if it has been reflected in
>> a WG resolution.
>>
>>
>> AZ
>>
>>
>>>
>>> Pat
>>>
>>> PS.  If we have to decide this, I vote for 3 as being less work
>>> to implement, and on the grounds that RDF's job isn't to check on
>>> bad data.
>>>
>>> ------------------------------------------------------------ IHMC
>>> (850)434 8903 or (650)494 3973 40 South Alcaniz St.
>>> (850)202 4416   office Pensacola
>>> (850)202 4440   fax FL 32502
>>> (850)291 0667   mobile phayesAT-SIGNihmc.us
>>> http://www.ihmc.us/users/phayes
>>>
>>
>>
>> -- Antoine Zimmermann ISCOD / LSTI - Institut Henri Fayol École
>> Nationale Supérieure des Mines de Saint-Étienne 158 cours Fauriel
>> 42023 Saint-Étienne Cedex 2 France Tél:+33(0)4 77 42 66 03
>> Fax:+33(0)4 77 42 66 66 http://zimmer.aprilfoolsreview.com/
>>
>>
>

Received on Tuesday, 7 May 2013 07:27:05 UTC