Re: Are RDFa parsers responsible for URL-encoding values when generating URLs?

From: Shane McCarron <shane@aptest.com>
Date: Tue, 07 Apr 2009 08:14:31 -0500
Message-ID: <49DB51B7.60204@aptest.com>
To: Manu Sporny <msporny@digitalbazaar.com>
CC: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>


Manu Sporny wrote:
> Ben Adida wrote:
>   
>> Manu Sporny wrote:
>>     
>>> If there are no objections, I can create a test case to ensure that
>>> percent-encoding is performed before the RDFa parser generates a triple.
>>>       
>> I'm not opposed to this, but... isn't our SPARQL approach to testing
>> already doing this effectively? Or are you saying that there are no
>> tests with spaces and you want to add one?
>>     
>
> I'm saying the second one... and if so, would this XHTML:
>
> <base href="http://www.example.org/"></base>
> ...
> <div about="Milan Marriott" typeof="foaf:Person">...</div>
>
> cause the RDFa parser to generate this as the subject (Subject A):
>
> http://www.example.org/Milan+Marriott
>
> or this (Subject B):
>
> http://www.example.org/Milan%20Marriott
>
> ... and if (Subject B) is what should be generated, is (Subject A) still
> valid output for the RDFa parser?
>   
I was originally confused about the issue you were raising... sorry.  For 
attributes that create subjects or objects by URI, such as @about, 
@resource, @src, or @href, the datatype of the attribute is the 
authority.  Looking at RDFa, I see that the relevant portion of the 
datatype for those attributes is URI.  The URI datatype permits, but 
discourages, whitespace unless it is encoded as %20 [1].

So basically I think that it is beyond the scope of our specifications 
to say anything about how absolute URIs are created from relative URIs.  
We already defer to the normative specifications and we should continue 
to do so.  RFC 3986 [2] doesn't appear to permit encoding a space as 
"+", as in (Subject A), so in terms of your implementation... I 
personally would never generate that.
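
To make the difference between the two subjects concrete, here is a 
minimal sketch in Python.  It assumes a parser that percent-encodes the 
attribute value before resolving it against the base; that ordering, and 
the helper name resolve_subject, are illustrative assumptions rather 
than anything RDFa requires.  quote_plus() stands in for form-style 
encoders such as the URLEncode() function mentioned in the quoted text 
below.

```python
from urllib.parse import quote, quote_plus, urljoin

BASE = "http://www.example.org/"

# Characters left untouched: RFC 3986 reserved and unreserved characters,
# plus "%" so that escapes already present are not encoded twice.
SAFE = "/:?#[]@!$&'()*+,;=%~-._"


def resolve_subject(base: str, ref: str) -> str:
    """Hypothetical helper: percent-encode ref, then resolve it against base."""
    return urljoin(base, quote(ref, safe=SAFE))


# Percent-encoding the space yields (Subject B).
subject_b = resolve_subject(BASE, "Milan Marriott")
print(subject_b)  # → http://www.example.org/Milan%20Marriott

# Form-style encoding turns the space into "+", which yields (Subject A).
form_style = urljoin(BASE, quote_plus("Milan Marriott"))
print(form_style)  # → http://www.example.org/Milan+Marriott
```

In a URI path a "+" is just a literal plus sign, so the two results 
would identify different resources.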


[1] http://www.w3.org/TR/xmlschema-2/#anyURI
[2] http://www.ietf.org/rfc/rfc3986.txt

> We should follow the encoding rules in RFC 3986[1], but this leads to a
> number of URI canonicalization issues, doesn't it? What you do and don't
> encode depends on the URI scheme... but we don't want to over-complicate
> RDFa parser implementation.
>
> Also, in practice - ASP Server's URLEncode() function would encode it as:
>
> http://www.example.org/Milan+Marriott
>
> while Javascript's encodeURI() would do this:
>
> http://www.example.org/Milan%20Marriott
>
> Do we state that easing URL normalization/canonicalization is a complex
> problem not covered by RDFa and which should be handled at a higher
> level, or do we specify some guidance when encoding values (such as
> "SHOULD percent encode ...", or "SHOULD NOT percent encode ...")?
>
> -- manu
>
> [1] http://tools.ietf.org/html/rfc3986
>
>   

-- 
Shane P. McCarron                          Phone: +1 763 786-8160 x120
Managing Director                            Fax: +1 763 786-8180
ApTest Minnesota                            Inet: shane@aptest.com
Received on Tuesday, 7 April 2009 13:15:13 GMT