X(HT)ML Fragment Identifiers

Hello everybody.

The following post is something I recently published on two blogs 
(my personal one and xml.com), and I have been encouraged to post it to 
the public-html list, so here it is. Comments are, of course, very welcome!

The recently published HTML 5 draft does not change anything regarding 
HTML fragment identifiers. They are still limited to IDs (with <a 
name=""> as an alternative for backwards compatibility). This means that 
any reference into an HTML page depends on how the page uses IDs.

But wouldn't HTML 5 be a wonderful opportunity to bring a little bit 
more hypermedia back to the Web? XML had XLink and XPointer. Both were 
failures for a number of reasons, but I am still a big fan of trying to 
make the Web more hypermedia-like. So why not learn from XPointer and 
try to give HTML 5 a more practical and useful set of fragment 
identification methods than just IDs?

The whole fragment identification idea is a classic chicken-and-egg 
problem. Why use fragment identifiers when they're not supported? Why 
support them when they're not used? We heard a lot of remarks like that 
when we worked on fragment identifiers for plain text files, but I still 
believe it is good to have mechanisms like that. Imagine Firefox had a 
feature where you could mouse over a paragraph, right-click, and send an 
email with a pointer to that paragraph. If the receiver also used 
Firefox, the browser would scroll to and highlight that paragraph. I am 
still convinced that a lot of people would find such a feature pretty 
useful. And nothing would break in other browsers; users would simply 
not get the scroll/highlight behavior.

While I am convinced that HTML 5 would be the right time to introduce 
such an improved fragment identification method and to address the fact 
that few people use HTML fragment identification, I am not really sure 
how best to do it. My guess is there should be three basic ways of 
identifying fragments:

     * IDs: For backwards compatibility, IDs (and <a name="">) should be 
supported. These correspond to what XPointer called bare names or 
shorthand pointers.
     * Child Sequences: Similar to XPointer's child sequences, HTML 5 
should have a child sequence mechanism, which could start either at the 
page body or at an ID. The fragment identifier #warning/2/3 would 
identify the third child of the second child of the id=warning element.
     * Character Pointers: Should there also be a way to point to a 
position, maybe defined by counting characters in the page's string 
value? Hard to tell, but this is where XPointer definitely went over the 
top and was never finished, because it even tried to define arbitrary 
ranges, which is really hard to do.
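To make the child sequence idea a bit more concrete, here is a minimal sketch in Python of how a browser might resolve such fragment identifiers. It assumes a well-formed XHTML page parsed with xml.etree.ElementTree (namespaces left out for brevity) and counts only element children, 1-based, as XPointer's element() scheme does. The leading "/" for sequences rooted at the page body, and the function names find_by_name and resolve_fragment, are illustrative assumptions, not part of any draft. Character pointers are deliberately left out.

```python
import xml.etree.ElementTree as ET


def find_by_name(root, name):
    """Bare name lookup: an element with id=name, else <a name=name>."""
    for el in root.iter():
        if el.get("id") == name:
            return el
    for el in root.iter("a"):
        if el.get("name") == name:
            return el
    return None


def resolve_fragment(root, fragment):
    """Resolve a (hypothetical) fragment like "warning", "warning/2/3", "/2/1".

    The first segment is a bare name (ID or <a name="">); an empty first
    segment, i.e. a leading "/", roots the child sequence at <body>.
    Each further segment is a 1-based index into the element children of
    the current node. Returns the element, or None if it does not resolve.
    """
    if not fragment:
        return None
    first, *steps = fragment.split("/")
    if first:
        node = find_by_name(root, first)
    else:
        node = next(root.iter("body"), None)
    for step in steps:
        # Only decimal integers are valid steps; "" or "x" fail here.
        if node is None or not step.isdecimal():
            return None
        index = int(step)
        children = list(node)  # element children only; text is not counted
        if not 1 <= index <= len(children):
            return None
        node = children[index - 1]
    return node


page = ET.fromstring(
    "<html><body><p>Intro</p>"
    "<div id='warning'><h2>Warning</h2>"
    "<ul><li>one</li><li>two</li><li>three</li></ul></div>"
    "</body></html>"
)
print(resolve_fragment(page, "warning/2/3").text)  # three
print(resolve_fragment(page, "/2/1").tag)          # h2
```

An identifier that does not resolve simply yields None, which fits the graceful degradation described above: the browser would just not scroll or highlight.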

Maybe just IDs and child sequences could do the trick? There should also 
be well-defined behavior for browsers, so that a user instructing a 
browser to create a fragment identifier could be sure that it will 
always be rooted at the nearest enclosing ID, making it less likely to 
break. I am sure there are many more details to figure out, but I am 
curious whether anybody else thinks this could become a pretty useful 
addition to how HTML can be used.
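The "rooted at the nearest ID" behavior could work roughly like the hypothetical make_fragment function below. It is again a Python sketch over xml.etree.ElementTree, assuming well-formed XHTML input and a leading "/" for sequences rooted at the page body. It walks up from the element the user picked until it reaches an element with an id (possibly the element itself) or <body>, recording 1-based child positions along the way.

```python
import xml.etree.ElementTree as ET


def make_fragment(root, target):
    """Build a fragment identifier for target, rooted at the nearest ID.

    Returns e.g. "warning/2/3", "warning", or "/1" (rooted at <body>),
    or None if target is not inside <body>.
    """
    # ElementTree elements have no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while not node.get("id") and node.tag != "body":
        parent = parent_of.get(node)
        if parent is None:
            return None  # reached the document root without finding <body>
        # 1-based position of node among its parent's element children
        steps.append(str(list(parent).index(node) + 1))
        node = parent
    anchor = node.get("id") or ""  # empty anchor means "rooted at <body>"
    return "/".join([anchor] + steps[::-1])


page = ET.fromstring(
    "<html><body><p>Intro</p>"
    "<div id='warning'><h2>Warning</h2>"
    "<ul><li>one</li><li>two</li><li>three</li></ul></div>"
    "</body></html>"
)
print(make_fragment(page, page.find(".//li[3]")))  # warning/2/3
print(make_fragment(page, page.find(".//p")))      # /1
```

Rooting at the nearest ID means that edits elsewhere in the page, such as inserting a paragraph before the id=warning element, do not change the generated identifier; only changes inside the warning element itself can break it.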

And please don't even ask about how to handle situations where CSS is 
hiding parts of the document, maybe dynamically, or even worse, where 
scripting code is changing the document's DOM. It would be necessary to 
have well-defined behavior for all possible situations, but my guess is 
that for the majority of static Web pages, fragment identification in a 
rather simple form would already be pretty useful as a way to better 
communicate about Web content.

I would be really interested to hear whether this is just another of 
those ideas that feel right but that a lot of people think will not work 
or is not worth the effort, or whether this could actually work. I 
would certainly love to see the Web becoming a better hypermedia system.

Received on Thursday, 22 May 2008 17:30:51 UTC