Re: X(HT)ML Fragment Identifiers

Hi Erik,

This issue certainly looks to me like a valuable one to pursue. However,
there are a few points I'm not clear about that I'd like to ask about.

First, you mention XPointer and XLink, but I'm not clear how your
suggestion differs from XPointer. BTW, I think it is a bit
premature to declare things such as XPointer a failure, though I think
it's fair to say it has not (yet) taken hold.

Second, what specifically would you want HTML5 to do regarding such
pointers? For example, would it be sufficient for HTML5 to
encourage/recommend/require HTML5-conforming UAs to support XPointer
(by way of reference), or are you suggesting something very different
from that?

Overall, I think adding such capabilities is usually a big win for
authors and users. With so many of the W3C's recommendations
especially, their abstract and modular nature enables
capabilities and use cases many of us have not yet imagined. I also
think this topic is closely related to two other issues that
facilitate explicit document markup for document fragments:

  • The issue of id and xml:id and allowing an IDENT data type that  
facilitates identifying document fragments by element-node paths [1]
  • The issue of including explicit bookmarks (void non-displayed  
elements) and arbitrary non-hierarchical clippings [2]

Both of these issues serve similar goals, but they require write
access to the document and often needlessly litter the document
with extra markup when an XPointer-like pointer would suffice. On the
other hand, these two proposals also make it easier and more likely to
have nearby ids at which to root a pointer.
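
To make the idea of rooting a pointer at a nearby id concrete, here is
a minimal sketch. It assumes the "#warning/2/3" child-sequence syntax
from your post (quoted below), which is not defined by any current
spec. The function names resolve and make_pointer are hypothetical,
Python's xml.etree.ElementTree merely stands in for a browser DOM, and
counting only element children (ignoring text nodes) is my assumption.

```python
import xml.etree.ElementTree as ET


def resolve(root, fragment):
    """Resolve a hypothetical '#id/n/m' child-sequence fragment to an element."""
    head, *steps = fragment.lstrip("#").split("/")
    # An empty head means "start at the root element" (an assumption).
    if head:
        node = next((e for e in root.iter() if e.get("id") == head), None)
    else:
        node = root
    for step in steps:
        if node is None or not step.isdigit():
            return None
        children = list(node)      # element children only, text is ignored
        index = int(step) - 1      # child sequences count from 1
        node = children[index] if 0 <= index < len(children) else None
    return node


def make_pointer(root, target):
    """Build a fragment for target, rooted at its nearest ancestor-or-self id."""
    # ElementTree has no parent links, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node.get("id") is None:
        parent = parents.get(node)
        if parent is None:         # reached the root without finding an id
            break
        steps.append(str(list(parent).index(node) + 1))
        node = parent
    head = node.get("id") or ""
    return "#" + "/".join([head] + steps[::-1])


doc = ET.fromstring(
    '<body><div id="warning"><p>one</p>'
    "<ul><li>a</li><li>b</li><li>c</li></ul></div></body>"
)
target = resolve(doc, "#warning/2/3")
print(target.text)                 # c
print(make_pointer(doc, target))   # #warning/2/3
```

The closer the id is to the target, the shorter the child sequence and
the less likely an edit elsewhere in the page breaks the pointer, which
is why having more ids in documents helps.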

It also dovetails nicely with the existing cite attribute, which would
often reference persistent URLs with static and stable pages
(likewise with the proposal to enhance HTML referencing capabilities [3]).

To track discussion on this, you might alternatively want to make use
of the W3C wiki. It provides a place to edit and refine the issue,
including adding use cases, possible solutions and collected
references. Likewise, it provides a persistent URL to refer to the
issue. The nice thing about the wiki is that it integrates two
complementary functions: 1) a revisable wiki page that can reflect
the current state of the issue/proposal and 2) easy links to
persistent logs such as a) the official discussion here on the WG mail
server, b) the change log for the wiki page, and c) IRC or other
logged threads on the topic. Bugzilla does not really add much to that
(the other problem with Bugzilla is that it bifurcates, or trifurcates,
the current deliberations of the WG).

You could for example start a page at:

http://esw.w3.org/topic/HTML/DocumentFragmentPointers

or whatever name you'd like to call it.

If you'd like, you can use one of my existing pages as a template [4].
In particular, the EMail section of the page facilitates posting
relevant discussion to this WG mail server and easily searching for the
topic in the W3C's mail archive search engine.

Take care,
Rob

[1]: <http://esw.w3.org/topic/HTML/IdAndTypeID>
[2]: <http://esw.w3.org/topic/HTML/ClipBookmark>
[3]: <http://esw.w3.org/topic/HTML/AttrtibuCitaQuotationReferencing>
[4]: <http://esw.w3.org/topic/HTML/DefiningTermsEtc> (keep in mind  
that the QuickLinks need to be created manually by examining the  
MoinMoin automatically generated id values in the page’s source).


On May 22, 2008, at 7:30 PM, Erik Wilde wrote:

>
> hello everybody.
>
> the following post is something i have recently published on two  
> blogs (my personal one and xml.com), and i have been encouraged to  
> post it to the public-html list, so here it is, comments of course  
> are very welcome!
>
> http://dret.typepad.com/dretblog/2008/05/xhtml-fragment.html
> http://www.oreillynet.com/xml/blog/2008/05/xhtml_fragment_identifiers.html
>
> ----------------------------------------------------------------------
>
> X(HT)ML Fragment Identifiers
>
> The recently published HTML 5 draft does not change anything  
> regarding HTML fragment identifiers. They are still limited to IDs  
> only (with <a name=""> as alternative for backwards-compatibility).  
> This means that any reference into an HTML page depends on how the  
> page is using IDs.
>
> But wouldn't HTML 5 be a wonderful opportunity to bring a little bit  
> more hypermedia back to the Web? XML had XLink and XPointer. Both  
> were failures for a number of reasons, but I am still a big fan of  
> trying to make the Web more hypermedia-like. So why not learn from  
> XPointer and try to give HTML 5 a more practical and useful set of  
> fragment identification methods than just IDs?
>
> The whole fragment identification idea is a classic chicken and egg  
> problem. Why use them when they're not supported? Why support them  
> when they're not used? We had a lot of remarks like that when we worked
> on fragment identifiers for plain text files, but I still believe it  
> is good to have mechanisms like that. Assume Firefox had a feature  
> where you just moused over a paragraph, right-clicked, and then you  
> could send an email with a pointer to that paragraph. If the  
> receiver had Firefox, the browser would scroll to and highlight that  
> paragraph. I am still convinced a lot of people would find such a  
> feature pretty useful. And things would not break in another  
> browser, users would simply not get the scroll/highlight behavior.
>
> While I am convinced that HTML 5 would be the right point in time to  
> introduce such an improved fragment identification method and try to  
> fix the fact that few people use HTML fragment identification, I am  
> not really sure how to best do it. My guess is there should be three  
> basic ways of identifying fragments:
>
>    * IDs: For backwards compatibility, IDs (and <a name="">) should  
> be supported. It would be what XPointer called barenames or  
> shorthands.
>    * Child Sequences: Similar to XPointer's child sequence, there  
> should be one in HTML 5, which could either start at the page body,  
> or at an ID. The fragment identifier #warning/2/3 would identify the  
> third child of the second child of the id=warning element.
>    * Character Pointers: Should there also be a way of how to point  
> to a position? Maybe defined by counting characters in the page's  
> string value? Hard to tell, but this is where XPointer definitely  
> went over the top and was never finished, because it even tried to  
> define arbitrary ranges, which is really hard to do.
>
> Maybe just IDs and child sequences could do the trick? There also  
> should be a well-defined behavior for browsers, so that a user  
> instructing a browser to create a fragment identifier could be sure  
> that it will always be rooted at the nearest ID, to make it less  
> likely to break. I am sure there are many more details to figure  
> out, but I am curious whether anybody else thinks this could become  
> a pretty useful addition to how HTML can be used.
>
> And please don't even ask about how to handle situations where CSS  
> is hiding parts of the document, maybe dynamically, or even worse,  
> where scripting code is changing the document's DOM. It would be  
> necessary to have well-defined behavior for all possible situations,  
> but my guess is that for the majority of static Web pages, fragment  
> identification in a rather simple form would already be pretty  
> useful as a way to better communicate about Web content.
>
> I would be really interested whether this is just another of those  
> ideas that kind of feel right, but where a lot of people think it is  
> not going to work or not worth the effort, or whether this could  
> actually work. I would certainly love to see the Web becoming a  
> better hypermedia system.

Received on Saturday, 7 June 2008 09:45:49 UTC