RE: RDFa test suite addition from Hausenblas, Michael on 2008-07-30 (public-rdf-in-xhtml-tf@w3.org from July 2008)

From: Hausenblas, Michael <michael.hausenblas@joanneum.at>
Date: Wed, 30 Jul 2008 13:04:57 +0200
To: "Mark Birbeck" <mark.birbeck@webbackplane.com>, "Manu Sporny" <msporny@digitalbazaar.com>
Cc: "RDFa mailing list" <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <768DACDC356ED04EA1F1130F97D2985201823EC5@RZJC2EX.jr1.local>

I by and large agree with Mark (see his excellent description below),
but at the same time I'd like to point out the 'Opacity Axiom' [1], [2].
Please note as well that in RDF we talk about URIrefs [3], and
*relative URIs are not used in an RDF graph*. FWIW, I'm happy to take
an action to evaluate how other RDF serialisations (e.g. RDF/XML, or
upcoming ones such as Turtle [4]) deal with this situation.

Cheers,
	Michael

[1] http://www.w3.org/DesignIssues/Axioms.html#opaque
[2] http://www.w3.org/TR/webarch/#uri-opacity
[3] http://www.w3.org/TR/rdf-concepts/#section-Graph-URIref
[4] http://www.w3.org/TeamSubmission/turtle/

----------------------------------------------------------
 Michael Hausenblas, MSc.
 Institute of Information Systems & Information Management
 JOANNEUM RESEARCH Forschungsgesellschaft mbH
  
 http://www.joanneum.at/iis/
----------------------------------------------------------
 

>-----Original Message-----
>From: public-rdf-in-xhtml-tf-request@w3.org 
>[mailto:public-rdf-in-xhtml-tf-request@w3.org] On Behalf Of 
>Mark Birbeck
>Sent: Tuesday, July 29, 2008 11:13 PM
>To: Manu Sporny
>Cc: RDFa mailing list
>Subject: Re: RDFa test suite addition
>
>
>Hi Manu,
>
>> Right, I didn't mean to imply that 'appending' will work in all cases
>> (even though I'm not convinced that the statement is not true).
>
>You began this thread by saying that the bug in librdfa was that in
>some circumstances the relative part was incorrectly being appended to
>the document, rather than the host name. So haven't you proved it
>yourself, that appending doesn't work in all cases? :)
>
>What's happened is that you've now found two possible ways to append
>(to the document part or to the hostname), but I'm afraid that the
>algorithm for converting a relative URI to an absolute one involves
>yet further possibilities.
>
>For example, the relative path:
>
>  sneaking_sally.mp3
>
>should be appended to the end of the *path* part, replacing the
>document. And so on.
>
>So the point is to use the 'proper' algorithm for turning a relative
>path into an absolute one, and you will always be ok, no matter what
>the URI is that you are dealing with (relative or not).
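
To illustrate the point above, here is a minimal sketch using Python's
standard urllib.parse.urljoin, which resolves references along the
lines of RFC 3986 section 5. The base URI (index.html under
/fuzzbot/demo/) is an assumption made for the example; the thread does
not name the actual document.

```python
from urllib.parse import urljoin

# Hypothetical base URI: the thread never names the document itself.
BASE = "http://rdfa.digitalbazaar.com/fuzzbot/demo/index.html"


def resolve(reference, base=BASE):
    """Resolve a (possibly relative) reference against the base URI."""
    return urljoin(base, reference)


# Three shapes of relative reference, each needing different handling.
examples = [
    "sneaking_sally.mp3",             # replaces the document part
    "/live/sneaking_sally.mp3",       # replaces the whole path
    "../../live/sneaking_sally.mp3",  # climbs out of /fuzzbot/demo/
]

for ref in examples:
    print(ref, "->", resolve(ref))
```

Neither "append to the document" nor "append to the hostname" covers
all three cases, whereas the single resolution step does.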
>
>The big question then, is whether the spec actually says to do this.
>
>
>> What you have said has got me wondering about what is correct,
>> acceptable and incorrect, however.
>
>You also had me wondering, too. I recalled investigating this quite a
>long time ago, and was starting to panic that I hadn't actually
>incorporated what I learned from my analysis into the spec.
>
>But thankfully I did:
>
>  5.4. CURIE and URI Processing
>
>  Since RDFa is ultimately a means for transporting RDF, then a key
>  concept is the resource and its manifestation as a URI. Since RDF
>  deals with complete URIs (not relative paths), then when converting
>  RDFa to triples, any relative URIs will need to be resolved relative
>  to the base URI, using the algorithm defined in section 5 of RFC
>  3986 [URI], Reference Resolution.
>
>It certainly sounds like this point could do with being made more
>prominent, but hopefully you'll agree that such changes would merely
>be editorial, and that the spec itself is correct.
>
>(See below for a further mention in the spec of this issue, but in the
>context of CURIEs.)
>
>
>>> <http://rdfa.digitalbazaar.com/fuzzbot/demo/../../live/sneaking_sally.mp3>
>>
>> I realize that the URL above is not optimal, but is it "wrong"?
>> RFC-1738 says that the URL is valid (if I'm reading the RFC
>> correctly):
>>
>> ftp://ftp.isi.edu/in-notes/rfc1738.txt
>
>First, note that [1] updates RFC 1738.
>
>Second, you're right that the URI is not 'wrong'. But the only way to
>obtain such a URI would be to enter it exactly as you have shown it.
>I.e., you can't create such a URI by beginning with a relative path
>and making it absolute, since the only way to do that is according
>to section 5 of [1], and that algorithm clearly shows how the dot
>segments would be removed.
>
>But also, if you query your triple store for everything the store
>knows about this:
>
>  <http://rdfa.digitalbazaar.com/live/sneaking_sally.mp3>
>
>will you also get back information about this?
>
>  <http://rdfa.digitalbazaar.com/fuzzbot/demo/../../live/sneaking_sally.mp3>
>
>If you do, then that's great...but I'd also be really surprised; I
>would imagine that once the URI is in the store, it's treated pretty
>much like a string.
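
A toy sketch of that concern, assuming a store that keys its data on
the URI string exactly as received. The dictionary "store" and the
duration predicate are hypothetical and stand in for no particular
triple store.

```python
CLEAN = "http://rdfa.digitalbazaar.com/live/sneaking_sally.mp3"
DOTTED = ("http://rdfa.digitalbazaar.com/fuzzbot/demo/"
          "../../live/sneaking_sally.mp3")

# Toy "store": subject URI string -> list of (predicate, object) pairs.
# The predicate URI is made up for this illustration.
store = {
    CLEAN: [("http://example.org/vocab#duration", "PT5M")],
}


def describe(subject):
    """Return everything the toy store knows about a subject URI."""
    return store.get(subject, [])


print(describe(CLEAN))   # one statement
print(describe(DOTTED))  # nothing: the strings differ
```

Under that assumption the two spellings name two unrelated subjects,
so the dot segments have to be removed before the URI reaches the
store.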
>
>
>> Is it the RDFa parser's job to normalize URLs? I can certainly see
>> the argument for why it should tidy up URLs, but I don't think this
>> is a MUST.
>
>I think it should, for two reasons, one concerning RDFa in general,
>and the other relating to its particular manifestation as XHTML+RDFa.
>
>The first reason is that RDF deals with absolute URIs. So any relative
>paths have to be made absolute somehow, when creating triples. RFC
>3986 [1] has a simple algorithm for doing this, which also has the
>effect of removing dot segments.
>
>So if we were not to use that algorithm to make relative paths
>absolute, which algorithm would we use? As you've discovered, simple
>concatenation doesn't work, since you keep finding another relative
>path that messes you up.
>
>The second reason is that XHTML+RDFa is a layer on top of XHTML. So
>what we're doing is giving a semantic *interpretation* of the
>underlying XHTML. To make this useful, we should really be generating
>the same triples for the same semantics. And if I say that the
>resource:
>
>  <http://rdfa.digitalbazaar.com/live/sneaking_sally.mp3>
>
>is 5 minutes long, then the manner I use to express that at the XHTML
>level shouldn't affect the semantics that are generated.
>
>(As an aside, when parsing in HTML browsers, if you request the value
>of @href using getAttribute(), some browsers will give you the full,
>absolutised path, relative to the 'base' of the document and others
>will give you the original value put in there by the author, which
>could contain dot segments. So in those parsers you have to normalise,
>otherwise you won't achieve browser consistency.)
>
>
>> If it's not a MUST, then we find ourselves in a position where the
>> application/inference engine MUST normalize the URLs coming in from
>> the RDFa parser.
>
>It's not really 'normalising', it's using the proper algorithm to turn
>a relative path into an absolute one. That algorithm takes care of
>'.', '..', and all sorts of other things.
>
>Anyway, we have it in the spec, but you are right that we should
>perhaps consider making the wording both clearer and stronger.
>
>
>> Take this CURIE as an example:
>>
>> <span xmlns:ex="http://example.org/2008-10-24/docs/api/"
>>      about="[ex:../ref/a.html]">...</span>
>>
>> a bit contrived, but would you say that the parser should output
>> this URI:
>>
>> http://example.org/2008-10-24/docs/api/../ref/a.html
>>
>> or this one:
>>
>> http://example.org/2008-10-24/docs/ref/a.html
>
>The latter.
>
>Section 5.4.2, "Converting a CURIE to a URI" describes the following
>algorithm:
>
>  Since a CURIE is merely a means for abbreviating a URI, its value is
>  a URI, rather than the abbreviated form. Obtaining a URI from a CURIE
>  involves the following steps:
>
>  1. Split the CURIE at the colon to obtain the prefix and the
>  resource.
>  2. Using the prefix and the current in-scope mappings, obtain the URI
>  that the prefix maps to.
>  3. Concatenate the mapped URI with the resource value, to obtain an
>  absolute URI.
>
>After that description you'll see that there is a blue box that refers
>back to the earlier point about what it means to create absolute URIs
>from relative ones:
>
>  Note that it is generally considered a good idea not to use relative
>  paths in namespace declarations, but since it is possible that an
>  author may ignore this guidance, it is further possible that the URI
>  obtained from a CURIE is relative. However, since all URIs must be
>  resolved relative to [base] before being used to create triples, the
>  use of relative paths should not have any effect on processing.
>
>Now this doesn't quite deal with the example you gave; I was more
>dealing with this:
>
>  <span xmlns:ex="/2008-10-24/docs/api/"
>   about="[ex:../ref/a.html]">...</span>
>
>which when concatenated still only gives a relative path:
>
>  /2008-10-24/docs/api/../ref/a.html
>
>The point that I was trying to stress when I wrote this was that this
>would still be ok, provided that you always use the algorithm in [1],
>and that algorithm would also take care of your example.
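
The CURIE steps quoted above, followed by resolution against the base,
might be sketched as below. This is an illustration under stated
assumptions, not the spec's reference implementation: curie_to_uri and
remove_dot_segments are hypothetical helper names, the base URI is
made up, and the dot-segment removal is a simplified take on RFC 3986
section 5.2.4. It is applied explicitly because Python's urljoin
returns a reference that already carries a scheme and host as-is.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def remove_dot_segments(path):
    """Simplified removal of '.' and '..' segments from a URI path."""
    segments = path.split("/")
    out = []
    for seg in segments:
        if seg == "..":
            if len(out) > 1:  # never pop the leading empty segment
                out.pop()
        elif seg != ".":
            out.append(seg)
    if segments[-1] in (".", ".."):
        out.append("")  # keep the trailing slash
    return "/".join(out)


def curie_to_uri(curie, mappings, base):
    """Split, look up, concatenate, then resolve against base."""
    prefix, _, reference = curie.strip("[]").partition(":")
    concatenated = mappings[prefix] + reference
    absolute = urljoin(base, concatenated)
    parts = urlsplit(absolute)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))


BASE = "http://example.org/index.html"  # hypothetical document URI

# Absolute namespace, as in Manu's example.
print(curie_to_uri("[ex:../ref/a.html]",
                   {"ex": "http://example.org/2008-10-24/docs/api/"}, BASE))
# Relative namespace, as in Mark's variant.
print(curie_to_uri("[ex:../ref/a.html]",
                   {"ex": "/2008-10-24/docs/api/"}, BASE))
```

With this base, both calls yield the same URI without dot segments.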
>
>However, I agree again that it wouldn't hurt to make this point more
>forcefully, but again, I think this is just about stress in the prose,
>rather than a fundamental issue.
>
>
>> If our argument is that CURIEs are simple concatenations, at what
>> point in the process is the "strange URL" converted into the
>> "normalized URL"?
>
>I do my normalisation in the parser, before passing the results to
>the store.
>
>
>> If we do think it should be the parser that normalizes URLs, we don't
>> have such a statement in the RDFa Syntax document, do we?
>
>I think we do, as described above; see the note in 5.4.2.
>
>Regards,
>
>Mark
>
>[1] <http://gbiv.com/protocols/uri/rfc/rfc3986.html>
>
>-- 
>Mark Birbeck, webBackplane
>
>mark.birbeck@webBackplane.com
>
>http://webBackplane.com/mark-birbeck
>
>webBackplane is a trading name of Backplane Ltd. (company number
>05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
>London, EC2A 4RR)
>
>
Received on Wednesday, 30 July 2008 11:09:48 UTC