- From: Sean B. Palmer <sean@mysterylights.com>
- Date: Wed, 10 Apr 2002 13:21:15 +0100
- To: "Jim Ley" <jim@jibbering.com>, "Steven Pemberton" <steven.pemberton@cwi.nl>
- Cc: <www-annotation@w3.org>, "HTML WG" <w3c-html-wg@w3.org>
> > If the document has id's then you can just use a URI,
> > problem solved. But I thought a more general solution
> > was being sought.

Please be careful with terminology here: a URI plus fragment ID is a
URI-reference, according to the BNF and terminology used in RFC 2396 [1].

> Can you have an xhtml1.1 document that is served as
> text/html ?

Of course. A better question is whether XHTML 1.1 served as either
text/html, text/xml, or application/xhtml+xml makes any sense; but that's
out of scope for this discussion, I feel.

> [...] by your argument
> http://jibbering.example/example#xpointer(id('Moomin'))
> points to a different fragment depending on whether the
> resource has an xhtml or html mime-type.

Perhaps. If it's sent as text/xml (or any of the MIME types covered by
XPointer), then if there's a Moomin ID declared, then it points to that
fragment of content. If on the other hand it's sent as a MIME type not
covered by XPointer, then it's basically going to point to something
"undefined".

> > > The question is not how does XPointer into HTML
> > > compare to XPointer into XHTML, but can we point
> > > to something in an HTML document?

[[[
The URI specification [URI] notes that the semantics of a fragment
identifier (part of a URI after a "#") is a property of the data
resulting from a retrieval action, and that the format and
interpretation of fragment identifiers is dependent on the media type
of the retrieval result.

For documents labeled as text/html, the fragment identifier designates
the correspondingly named element; any element may be named with the
"id" attribute, and A, APPLET, FRAME, IFRAME, IMG and MAP elements may
be named with a "name" attribute. This is described in detail in
[HTML40] section 12.
]]] - http://www.ietf.org/rfc/rfc2854.txt

If a document is served as text/html and the fragment that you wish to
identify is not named in the aforementioned manner, then you can't point
to it; it's as simple as that.
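
To make the media-type dependence concrete, here is a minimal Python sketch of how a client might decide whether a fragment is defined. Everything named here is an assumption for illustration: `fragment_status`, `AnchorCollector`, and the `XPOINTER_TYPES` set are hypothetical, only the `xpointer(id('...'))` form is handled, and an attribute literally called "id" stands in for a DTD-declared ID.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that RFC 2854 says may be named with a "name" attribute.
NAMEABLE = {"a", "applet", "frame", "iframe", "img", "map"}

# Assumption: media types for which this sketch treats xpointer() as defined.
XPOINTER_TYPES = {"text/xml", "application/xml"}


class AnchorCollector(HTMLParser):
    """Collects every name a text/html fragment identifier could designate."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        for key, value in attrs:
            if value is None:
                continue
            if key == "id" or (key == "name" and tag in NAMEABLE):
                self.names.add(value)


def fragment_status(media_type, fragment, body):
    """Return "defined" or "undefined" for a fragment under a media type."""
    match = re.fullmatch(r"xpointer\(id\('([^']+)'\)\)", fragment)
    if media_type == "text/html":
        if match:
            # text/html gives no meaning to the xpointer() scheme.
            return "undefined"
        collector = AnchorCollector()
        collector.feed(body)
        return "defined" if fragment in collector.names else "undefined"
    if media_type in XPOINTER_TYPES and match:
        # Simplification: look for an attribute named "id" instead of
        # consulting a DTD for attributes of type ID.
        root = ET.fromstring(body)
        wanted = match.group(1)
        found = any(el.get("id") == wanted for el in root.iter())
        return "defined" if found else "undefined"
    return "undefined"
```

Calling `fragment_status` twice with the same body and the fragment `xpointer(id('Moomin'))`, once as `text/html` and once as `text/xml`, gives different answers, which is the behaviour under discussion.
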
But the argument isn't as clear cut as that, as we all know. text/html
and text/xml representations can be served together as variants of a
single resource. In this case, the URI-ref:-

   http://jibbering.example/example#xpointer(id('Moomin'))

is utterly broken. The semantics of the URI-ref seem to depend greatly
upon a retrieval action, and in this case, the semantics of the XPointer
change depending upon one's accept headers. Accepting text/html means
that the pointer is undefined, and accepting text/xml means that the
pointer is defined.

This is a problem with XPointer - it doesn't work very well with
conventional HTTP machinery that has been around for years. Or rather, it
only works when you necessarily limit yourself to sending a single (and
perhaps fixed) variant. Many people believe that fragments must be
persistent; in the case of XPointer, that means that your XML document
had better not change one iota. So, if you want to use XPointer, you have
to do so on a resource that has a single fixed XML representation. That's
absurd.

So, now you want to create a similar scheme for HTML. Now, since I've
been working with you on the EARL project, I certainly understand what
the rationale behind this is :-) We need to be able to refer to pieces of
HTML documents in order to say things about them - perhaps in the context
of an EARL evaluation, or perhaps outside of it. Using some kind of
pointer mechanism would be fine if (as noted above) people were
constrained to sending their HTML document without any variants, and
without changing the document in such a way that would break the HTML
pointer.

And really I'm just clarifying for the sake of people external to this
discussion who may lack the context. Jim, I remember you saying to me
that the requirement for HTML pointer is simply of being able to point
into a *single* representation of HTML, and that the requirement for
XPointer is of being able to point into a single representation of XML.
But many people don't seem to get it, and that's a bit of a shame, so it
needs to be underlined at every opportunity.

So the answer to your question is that, yes, the generic idea of an HTML
pointer is good, useful, and architecturally sound, but that one
shouldn't abuse it in the manner that XPointer have abused theirs. In
fact, you can't abuse HTML pointer since there is no way that you can
make an amendment to the HTML media type RFC. Win-win.

As for the canonicalization issues, that's a tough one. As Steven has
pointed out, some HTML documents are just so broken that even HTML Tidy
throws up all over them. OTOH, some documents are valid HTML, and for
those a regular HTML pointer is certainly possible, given work. So, I
suggest that you might want to look into the following algorithm:-

1) If the representation validates according to one of the standard
   HTML DTDs, then use a standard canonicalization, and an HTML pointer.
2) If the representation doesn't validate, then you have two choices:-
   2a) Attempt to canonicalize it anyway, and use an HTML pointer
   2b) Point to the piece of information using a column and line
       number, a regular expression, or something else based on the
       reduced hash experiments that Nick started working on.

I hope that helps.

[1] http://www.ietf.org/rfc/rfc2396.txt

--
Kindest Regards,
Sean B. Palmer
@prefix : <http://purl.org/net/swn#> .
:Sean :homepage <http://purl.org/net/sbp/> .
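
The numbered algorithm in the message above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `choose_strategy`, `Strategy`, and `line_and_column` are hypothetical names, the validation result and the canonicalizer are supplied by the caller and not tied to any real validator or to HTML Tidy, and trying 2a before falling back to 2b is one possible ordering of the two choices, not something the message prescribes.

```python
from enum import Enum


class Strategy(Enum):
    HTML_POINTER = "html-pointer"  # steps 1 and 2a
    POSITIONAL = "positional"      # step 2b (line/column, regex, hash)


def choose_strategy(validates, canonicalize, markup):
    """Pick a pointing strategy for one HTML representation.

    validates    -- bool, result of checking against a standard HTML DTD
                    (assumed to be computed elsewhere)
    canonicalize -- callable returning canonical markup, raising
                    ValueError when the input is too broken (assumption)
    """
    if validates:
        # Step 1: valid HTML gets a standard canonicalization.
        return Strategy.HTML_POINTER, canonicalize(markup)
    try:
        # Step 2a: attempt to canonicalize anyway.
        return Strategy.HTML_POINTER, canonicalize(markup)
    except ValueError:
        # Step 2b: fall back to pointing at the raw text.
        return Strategy.POSITIONAL, markup


def line_and_column(text, offset):
    """Convert a character offset into a 1-based (line, column) pair."""
    line = text.count("\n", 0, offset) + 1
    last_newline = text.rfind("\n", 0, offset)  # -1 when on the first line
    return line, offset - last_newline
```

A positional pointer produced this way is only meaningful against the exact byte stream it was computed from, which matches the single-representation constraint discussed earlier in the message.
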
Received on Wednesday, 10 April 2002 08:22:14 UTC