Re: Multiple result-documents, client-side transformations, and URIs from Axel Hecht on 2003-05-19 (public-qt-comments@w3.org from May 2003)

From: Axel Hecht <axel@pike.org>
Date: Mon, 19 May 2003 22:48:55 +0200
To: "Kay, Michael" <Michael.Kay@softwareag.com>
CC: public-qt-comments@w3.org, bsmedberg@covad.net
Message-ID: <3EC94337.6000707@pike.org>
Hi Michael et al.

Kay, Michael wrote:
>>Hi Michael et al,
>>
>>I'm a peer of XSLT in Mozilla and would like to give my own two cents on
>>the ideas around xsl:result-document in a web browser environment.
> 
> 
> Thanks for your note, this is very useful input.
> 
> 
>>Upfront, I really like the idea of having multiple output documents in a
>>publishing and maybe even a server environment.
>>
>>In a web browser, on the other hand, xsl:result-document raises quite a
>>few sandbox issues. One issue is already dealt with: absolute URIs in
>>xsl:result-document are subject to implementation-defined constraints.
>>Reading WD-xslt20-20030502, I don't see any room for
>>implementation-defined constraints on relative URIs, though.
> 
> 
> The thinking here is that there are no security/sandbox issues in creating a
> result tree, only in serializing it to persistent storage. We tried to
> separate the two things. Perhaps we have stretched things too far with the
> notion that a URI can be used to refer to a resource (a result tree) that is
> not persistent. The thinking is that you can create as many result trees as
> you like, using any URI that you like, and the transformation API then gives
> you a way to access these (in-memory) result trees using these URIs as a
> handle. Security concerns only kick in if you try to serialize the result
> tree to persistent disk storage.

I'm fine with separating serialisation from XSLT, but as the XSLT 2.0 spec
is more concrete on errors, it should give hints on where the
serialisation module might bail. I couldn't find any mention of
failure-to-write in the serialisation spec, btw.

> 
>>From the use cases we have encountered so far, linking from the result to
>>non-XSLT resources like CSS or JS is a very common practice. In fact,
>>relative links in the generated content may point to either generated
>>content or non-generated pages from the server. This kinda demands that
>>the base output URI be on the same site and protocol as the original
>>source document (at least for <?xml-stylesheet?>).
> 
> 
> Very good point, I see the difficulty: and off-hand, I don't see an easy
> solution. Perhaps the handles that are used to refer to in-memory result
> trees in the API should not be URIs at all; but the problem of defining
> cross-links between multiple result trees remains. Perhaps we need a new URI
> scheme?

I could imagine an XPointer scheme working well for this. Something like
foo.xml#xslt-result-doc(part2/chapter5.html)id('target'), or maybe with
xslt:result-doc as the scheme name?
This has the downside that links between result documents look different
when you view the result in a browser (or any other implementation that
doesn't serialise) as opposed to when you serialise them.
Having an XPointer scheme makes it possible to specify the result
document while still being able to identify fragments within it.
Not sure who'd do the work to actually propose such a scheme, but if we 
had it, something like this might give a good hint to implementations:

Implementations that don't support serialisation MAY use the
XPointer scheme xslt-result-doc to expose result documents in their API.

This would make result documents reside "inside" the source document and 
resolve all security issues in a non-serialized environment. Of course, 
serialising implementations may throw errors for any effective output 
URI, so the spec should mention both the error and that the 
implementation should define when it throws the error.
(Didn't realise in my first answer that this problem is at least as 
prominent in implementations that do serialize as in those that don't.)
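
To make the suggested syntax a bit more concrete, here is a minimal
sketch in Python of how an implementation might take such a reference
apart. The scheme name xslt-result-doc is only the one floated above, not
a registered XPointer scheme, and the function name is made up for
illustration. It assumes the result document URI contains no parentheses.

```python
import re
from urllib.parse import urldefrag

# Hypothetical scheme name from the proposal above; not a registered scheme.
SCHEME = "xslt-result-doc"

# scheme(result-doc-uri) followed by an optional pointer into that document
_POINTER = re.compile(r"^" + re.escape(SCHEME) + r"\(([^()]*)\)(.*)$")


def split_result_doc_pointer(uri):
    """Return (source_uri, result_doc_uri, inner_pointer), or None if the
    fragment does not use the hypothetical xslt-result-doc scheme."""
    source, fragment = urldefrag(uri)
    match = _POINTER.match(fragment)
    if match is None:
        return None
    result_doc, rest = match.groups()
    # An empty remainder means the reference targets the whole result document.
    return source, result_doc, rest or None
```

Called on foo.xml#xslt-result-doc(part2/chapter5.html)id('target'), this
separates the source document, the result document handle, and the
id('target') pointer, which is the split an API exposing in-memory result
trees would need.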

>>Now take this to a scenario of a site with split responsibilities. Most
>>major sites probably are like that, and it goes down to simple things such
>>as user areas like http://www.junk.net/~whoever/. Having no restrictions on
>>relative URIs in xsl:result-document enables a web author of one part of
>>the site to shadow pages done by other entities. One can really go on
>>and find stranger scenarios. But I don't wanna be a bore. Not more than
>>necessary.
> 
> 
> I'm not sure I understand this scenario in detail, but I get the general
> thrust.
> 
>>All I'm trying to point out is that implementations should be able to
>>place constraints on relative URIs as well. These constraints should be
>>permitted to be as tough as "you don't go anywhere but where you are",
>>essentially disabling multiple output documents in that implementation.
>>An example of less strict rules would be that you must not access parent
>>directories ('..' must not be part of the relative URI) and there must
>>not be a page on the server with the same URI.
>>
>>19.1 could read something like
>>"There MAY be implementation-defined restrictions on the form of
>>absolute and relative URI that may be used, but the implementation is
>>not required to enforce any restrictions. Implementations MAY restrict
>>the URI to the *base output URI*, effectively disabling multiple output
>>documents."
>>instead of
>>"There may be implementation-defined restrictions on the form of
>>absolute URI that may be used, but the implementation is not required to
>>enforce any restrictions. Any legal relative URI must be accepted."
> 
> 
> I don't think the WG will have any difficulty allowing the implementation to
> impose further restrictions, and I will put this on the agenda as issue 183.
> 
> Many thanks for the feedback,
> 
> Michael Kay
> 
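
For illustration, the "less strict" rule quoted above (no '..' in the
relative URI, and no existing page with the same URI) could be sketched
roughly as follows in Python. The function name and the existing_uris
parameter are hypothetical, and rejecting a leading '/' is an extra
assumption of this sketch, added because such a reference would also
leave the directory of the base output URI. Percent-encoded dot segments
are not handled.

```python
from urllib.parse import urljoin, urlsplit


def is_allowed_result_uri(href, base_output_uri, existing_uris=frozenset()):
    """Hypothetical policy check for an xsl:result-document href."""
    parts = urlsplit(href)
    # Only relative references are considered: no scheme, no authority.
    if parts.scheme or parts.netloc:
        return False
    # Assumption of this sketch: a leading '/' escapes the base directory.
    if parts.path.startswith("/"):
        return False
    # '..' must not be part of the relative URI.
    if ".." in parts.path.split("/"):
        return False
    # There must not already be a page with the same URI; the caller
    # supplies the set of known URIs.
    return urljoin(base_output_uri, href) not in existing_uris
```

How an implementation would learn which pages already exist on the server
is left open here; the sketch simply takes that set as input.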

My pleasure

Axel
Received on Monday, 19 May 2003 16:48:34 UTC