Re: Not sure... (Re: Invalid namespace URI)

From: Ivan Herman <ivan@w3.org>
Date: Thu, 13 Nov 2008 13:44:35 +0100
Message-ID: <491C2133.4030403@w3.org>
To: Peter Mika <pmika@yahoo-inc.com>
CC: public-rdf-in-xhtml-tf@w3.org
What I do now in the distiller (not yet uploaded on the system, but will
be in the next release...) is

- strip leading and trailing white space from the URI
- always go through the quoting of URIs, i.e., turn the space
characters into %20, before using them as URI prefixes
- if the (original) URI contains white space, generate a warning
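
The three steps above could be sketched roughly as follows. This is only an
illustration in Java (the language of the test quoted below), not the
distiller's actual code: the class PrefixUriCleaner, its Result type and the
method clean() are hypothetical names. It also makes two assumptions: tabs
and line breaks are quoted like spaces, and the warning is raised only for
white space that remains after stripping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the three steps; not the distiller's code.
final class PrefixUriCleaner {

    // The quoted URI text plus any warnings raised while cleaning it.
    static final class Result {
        final String uri;
        final List<String> warnings;

        Result(String uri, List<String> warnings) {
            this.uri = uri;
            this.warnings = warnings;
        }
    }

    static Result clean(String raw) {
        List<String> warnings = new ArrayList<>();

        // Step 1: strip leading and trailing white space.
        String trimmed = raw.trim();

        // Step 2: quote every remaining white space character as %20.
        // Assumption: tabs and line breaks are treated like spaces, as
        // attribute-value normalization would turn them into spaces anyway.
        StringBuilder quoted = new StringBuilder();
        boolean sawWhiteSpace = false;
        for (int i = 0; i < trimmed.length(); i++) {
            char c = trimmed.charAt(i);
            if (Character.isWhitespace(c)) {
                quoted.append("%20");
                sawWhiteSpace = true;
            } else {
                quoted.append(c);
            }
        }

        // Step 3: warn when the URI contained white space.
        // Assumption: only embedded white space counts, not the stripped ends.
        if (sawWhiteSpace) {
            warnings.add("White space in prefix URI: '" + trimmed + "'");
        }

        return new Result(quoted.toString(), warnings);
    }

    public static void main(String[] args) {
        // The attribute value from the page reported in this thread.
        Result r = clean("http://creativecommons.org/ns\n#\n");
        System.out.println(r.uri);
        r.warnings.forEach(System.out::println);
    }
}
```

Only white space is quoted here; other characters that are not allowed in a
URI are left untouched, so the result is not guaranteed to be valid in
general.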

I am not sure anything else could be expected from a user agent...

Thanks!

Ivan

Peter Mika wrote:
> I'm not sure either... As I'm too lazy to read the whole spec, I did
> some testing in Java, where...
> 
> URI uri1 = new URI("http://creativecommons.org/ns #");
> 
> throws a URI syntax exception
> 
> but interestingly
> 
> URI uri2 = new URI("http://creativecommons.org/ns%20#");
> 
> doesn't.
> 
> In any case, there is an appendix of the URI specification which seems
> to put the burden of removing whitespaces on the processing agent:
> 
> http://labs.apache.org/webarch/uri/rfc/rfc3986.html#delimiting
> 
> Quoting:
> 
> For robustness, software that accepts user-typed URI should attempt to
> recognize and strip both delimiters and embedded whitespace.
> 
> For example, the text
> 
>   Yes, Jim, I found it under "http://www.w3.org/Addressing/",
>   but you can probably pick it up from <ftp://foo.example.
>   com/rfc/>.  Note the warning in <http://www.ics.uci.edu/pub/
>   ietf/uri/historical.html#WARNING>.
> 
> contains the URI references
> 
>   http://www.w3.org/Addressing/
>   ftp://foo.example.com/rfc/
>   http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING
> 
> End quote.
> 
> Cheers,
> Peter
> 
> Ivan Herman wrote:
>> I actually wonder...
>>
>> RDFa uses the xmlns syntax for URI prefixing only. Ie, the only thing
>> that counts is whether it is a valid URI. If the result of the
>> processing is to generate
>>
>> http://creativecommons.org/ns%20#
>>
>> that _is_ a valid URI, isn't it? Ie, I guess the bug in the current
>> distiller code is that URI-s should be properly quoted.
>>
>> Having said that, such a setting is probably an error, so if there is a
>> space in the string then a warning is probably in order. But, who knows,
>> some crazy users may want to use such a URI...
>>
>> Ivan
>>
>> Ivan Herman wrote:
>>  
>>> Hi Peter,
>>>
>>> thanks for the note. I will have a look into it but yes, the tool should
>>> probably warn...
>>>
>>> Ivan
>>>
>>> Peter Mika wrote:
>>>    
>>>> Hi All,
>>>>
>>>> We have found another corner case while looking at all the wonderful
>>>> RDFa on the Web:
>>>>
>>>> The page at [1] contains:
>>>>
>>>>
>>>> This
>>>> work by <a
>>>> xmlns:cc="http://creativecommons.org/ns
>>>> #
>>>> "
>>>>
>>>> which is probably not intended (the page is broken in some sense). When
>>>> run through either the XSLT or the Distiller this
>>>> becomes:
>>>>
>>>>      <cc:attributionName xmlns:cc="http://creativecommons.org/ns #">New
>>>> Jersey State Auto
>>>> Auction</cc:attributionName>
>>>>
>>>> which is normalized [1] as
>>>> xmlns:cc="http://creativecommons.org/ns&#x20;#">
>>>>
>>>> It seems to me that what you get is XML well-formed but not
>>>> namespace-well-formed [2] because the attribute value is not a valid
>>>> URI.
>>>>
>>>> Not sure really what to do about this but the output is not very
>>>> useful... should the tools raise some warning?
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>> [1] http://www.w3.org/TR/REC-xml/#AVNormalize
>>>> [2] http://www.w3.org/TR/REC-xml-names/#Conformance
>>>>
>>>> [1] http://www.njstateauto.com/preowned/index.cfm?make=Mercedes-Benz
>>>>
>>>>       
>>
>>   
> 

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf


Received on Thursday, 13 November 2008 12:45:16 GMT
