Re: X(HT)ML Fragment Identifiers

Hi Erik,

On Jun 9, 2008, at 11:59 PM, Erik Wilde wrote:

>> First, you mention XPointer and XLink, but I'm not clear how your  
>> suggestion is different from XPointer. BTW, I think it is a bit  
>> premature to declare things such as XPointer a failure, though I  
>> think it's fair to say it has not (yet) taken hold.
>
> my suggestion would be specific for the html media type, whereas  
> xpointer applies to xml in general. xpointer is a bit more complex  
> and needs to be so, because it has to take namespaces into account,  
> and it also was trying to allow arbitrary ranges within xml  
> documents. i think it would be better to have something simpler  
> (xpointer now is even a modularized spec with several schemes). i  
> think it is fair to say that xpointer is a failure, but that  
> certainly is a matter of taste; it definitely is not a success...

I'll take your word for it on it being "not a success". I definitely  
think the broad goals of XPointer (which, if I understand correctly,  
are similar to yours) are worth pursuing. However, what lessons should  
we draw from XPointer's shortcomings and apply to this new path-based  
HTML5 fragment identifier? It seems to me that even if it's specified  
for HTML5, it would/should easily apply to any XML or SGML document,  
since they share the same tree structure. Or am I missing something in  
your proposal that relates specifically to HTML? Given the  
similarities, UA implementors would want to reuse the same codebase  
wherever it made sense (for XML as well, for example). In my view, the  
fragment identifier should be indifferent to the document format anyway.

>> Second, what specifically would you want HTML5 to do regarding such  
>> pointers. For example, would it be sufficient for HTML5 to  
>> encourage/recommend/require HTML5 conforming UAs to support  
>> XPointer (by way of reference) or are you suggesting something very  
>> different than that.
>
> i would like to create something new and simpler than xpointer, and  
> to declare it as the extended fragment identification semantics for  
> html5. user agents would be encouraged to support it, and the spec  
> could even contain suggested algorithms how to turn points and  
> ranges into html5 fragment identifiers. with no support in user  
> agents, fragments in html5 would remain as rarely used as with html4.
>
>> Overall, I think adding such capabilities is usually a big win for  
>> authors and users. Especially with so many of the W3C's  
>> recommendations, their abstract and modular nature enables  
>> capabilities and use cases many of us have not yet imagined. I also  
>> think this topic is closely related to two other issues that  
>> facilitate explicit document markup for document fragments:
>> • The issue of id and xml:id and allowing an IDENT data type that  
>> facilitates identifying document fragments by element-node paths [1]
>
> @id and @xml:id to me look like entirely different things than what  
> [1] describes. @id is covered by html4 fragment identifiers, and the  
> minimum extension of html5 fragment identifiers probably should be  
> to also cover @xml:id.
>
> [1] probably should have been [2], and thanks for pointing to it.  
> the IDENT semantics look kind of weird to me, and would be at odds  
> with other xml technologies (and suggesting @id should get the IDENT  
> semantics looks very wrong to me; html4 defines it as being an ID).  
> but anyway, @xml:id should definitely be covered by html5.

[sorry for reversing the two footnotes.] It is a departure from  
current specifications, but it does provide better support for  
aggregating documents and maintaining id values. The reason I bring it  
up is that such an approach also desperately needs a different  
fragment identifier method, since fragment identifiers for a document  
may be unique only for a specific IDENT path. In that sense I think  
your suggestion complements this proposal quite nicely. What I  
imagine is having three ways to specify a fragment relative to an  
idref: 1) by a path of sibling indices; 2) by a path of named IDENTs;  
or 3) by a hybrid of the two, where either method could be used for  
each successive descendant from the rooted idref (#id).
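
To make the three addressing styles a bit more concrete, here is a rough
Python sketch. Everything about the syntax is an assumption made up for
illustration: the slash-separated form (#id/2/note), the `resolve`
function, 1-based sibling indices, and the rule that a named step
matches a child's id attribute. Neither proposal defines any of this.

```python
import xml.etree.ElementTree as ET


def resolve(root, fragment):
    """Resolve a hypothetical fragment such as '#intro/2/note'.

    The first segment is the rooted id. Each later segment is either a
    1-based sibling index (digits) or a name matched against the id
    attribute of the current node's children.
    """
    anchor, *steps = fragment.lstrip("#").split("/")
    # Root the path at the first element (document order) with this id.
    node = next((e for e in root.iter() if e.get("id") == anchor), None)
    for step in steps:
        if node is None:
            return None
        children = list(node)
        if step.isdigit():
            # Sibling-index step: pick the n-th child element.
            index = int(step) - 1
            node = children[index] if 0 <= index < len(children) else None
        else:
            # Named step: the name only has to be unique among siblings.
            node = next((c for c in children if c.get("id") == step), None)
    return node


# Illustrative document: 'note' is reused under two different parents.
DOC = ET.fromstring(
    "<body>"
    "<div id='intro'><p>one</p><p id='note'>two<em>x</em></p></div>"
    "<div id='other'><p id='note'>three</p></div>"
    "</body>"
)

by_index = resolve(DOC, "#intro/2")      # 1) path of sibling indices
by_name = resolve(DOC, "#other/note")    # 2) path of named steps
hybrid = resolve(DOC, "#intro/note/1")   # 3) name, then index
```

Here "#intro/2" and "#intro/note" reach the same paragraph, while
"#other/note" reaches a different one even though the name is reused,
which is the situation a scoped IDENT would create.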

>> • The issue of including explicit bookmarks (void non-displayed  
>> elements) and arbitrary non-hierarchical clippings [2]
>
> aha, [1] and [2] were in the wrong order. well, the clippings talk  
> about changing the document, whereas fragment identifiers do not add  
> any document markup, they simply define fragment identification  
> semantics.

Agreed. That's definitely an important distinction between the two  
approaches; however, again I think they're complementary. Adding  
bookmarks to a document provides more IDREFs in which to root your  
proposed fragment identifiers. Also, they both address similar use  
cases, differing only in that one is meant only for those with write  
access, while the other is for anyone with read access but is more  
fragile to edits of the document.

>
>
>> Both of these issues facilitate similar goals, but require write  
>> access to the document and often needlessly littering the document  
>> with extra markup when an XPointer-like pointer would suffice. On  
>> the other hand, these two proposals also make it easier and more  
>> likely to have nearby ids to root a pointer.
>
> the whole idea of fragment identifiers is based on using documents to  
> which there is no write access, and on the web, that's the norm. so  
> adding more nuanced fragment identifiers does not add any complexity to  
> html5 markup, it simply adds a more hypermedia-like way for user  
> agents to support access to html documents.

Understood. Though hopefully there aren't any documents on the web  
that no one has write access to :-).

>> It also dove-tails nicely with the existing cite attribute which  
>> would often reference persistent URLs with static and stable pages  
>> (likewise, the proposal to enhance HTML referencing capabilities  
>> [3]).
>
> i like the idea to have better quotations in html5. html4 is bad at  
> that, and better fragment identification would be a good complement  
> to better markup design for quotations.

Definitely a use case that needs more flexible fragment identifiers.

>> To track discussion on this, you might want to alternatively make  
>> use of the W3C wiki. It provides a place to edit and refine the  
>> issue including adding use cases, possible solutions and collected  
>> references. Likewise it provides a persistent URL to refer to the  
>> issue. The nice thing about the wiki is it nicely integrates two  
>> complementary functions: 1) the revisable wiki page that can  
>> reflect the current state of the issue/proposal and 2) it easily  
>> links to persistent logs such as a) the official discussion here on  
>> the WG mail server; b) the change log for the wiki page; c) links to  
>> IRC or other logged threads on the topic. Bugzilla does not really  
>> add much to that (the other problem with Bugzilla is that it  
>> bifurcates, or trifurcates, the current deliberations of the WG).
>
> is the wiki official? and how does it relate to bugzilla, which also  
> seems to be an attempt to turn such an idea into something that has a  
> persistent URI? i am willing to create such a page in either the  
> wiki or in bugzilla, but i would like to avoid having to do both and  
> having to keep both up to date...

The wiki is official and has been used by the WG since its inception.  
There has been some groping as the WG tries to find the best way to  
address issues in HTML5 and the HTML5 draft. Bugzilla is the latest  
suggestion for a new way to track issues. No one from the HTML WG has  
yet started using it (you would be the first) and I'm not sure it  
provides any benefit over the wiki. BTW, I was wrong about needing to  
do your own TOC / quicklinks. Adding the syntax [[TableOfContents]] to  
a document automatically generates a table of contents on the wiki.

Take care,
Rob

Received on Monday, 9 June 2008 22:37:57 UTC