Re: X(HT)ML Fragment Identifiers from Erik Wilde on 2008-06-12 (public-html@w3.org from June 2008)

From: Erik Wilde <dret@berkeley.edu>
Date: Thu, 12 Jun 2008 16:33:27 -0700
To: public-html@w3.org
CC: Robert J Burns <rob@robburns.com>
Message-ID: <4851B247.2040104@berkeley.edu>
hello robert.

>> my suggestion would be specific for the html media type, whereas 
>> xpointer applies to xml in general. xpointer is a bit more complex and 
>> needs to be so, because it has to take namespaces into account, and it 
>> also was trying to allow arbitrary ranges within xml documents. i 
>> think it would be better to have something simpler (xpointer now is 
>> even a modularized spec with several schemes). i think it is fair to 
>> say that xpointer is a failure, but that certainly is a matter of 
>> taste; it definitely is not a success...
> I'll take your word for it on it being "not a success".

that's my opinion, but i think it certainly has not been a wild success 
story.

> I definitely think the 
> broad goals of XPointer (which if I understand correctly are similar to 
> yours), are worth pursuing.

XPointer's goal was to be applicable to any XML document, whereas my 
proposal should just work for (X)HTML. also, i think the very ambitious 
idea to be able to represent arbitrary ranges may be a bit too much, and 
could be omitted. furthermore, XPointer ended up being modularized (i 
think just to get the simpler parts through the standardization 
process), but that usually means you end up with implementations 
implementing different feature sets.

> However, what are the lessons we should draw 
> from XPointer’s shortcomings to apply to this new path-based HTML5 
> fragment identifier? It seems to me that even if it's specified for 
> HTML5, it would/should easily apply to any XML / SGML which share the 
> same tree structure. Or am I missing something in your proposal that 
> relates specifically to HTML?

the most general XPointer scheme used XPaths (or even extended XPaths to 
cover ranges), and to be able to do so, it also needed a mechanism to 
establish namespace bindings. the result turned out to be really 
complicated and was never finished. for HTML, we don't have to worry 
about namespaces. and i think we should not even have XPaths, just IDs 
and child sequences should be sufficient.
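
as a rough sketch of what "IDs and child sequences" could look like in 
practice, here is a small python resolver. the fragment syntax it accepts 
("some-id/2/1" or "/1/2") is only an assumption for illustration, loosely 
modeled on the child sequences of XPointer's element() scheme; it is not 
a worked-out proposal. the function name resolve_fragment and the sample 
document are made up as well.

```python
import xml.etree.ElementTree as ET


def resolve_fragment(root, fragment):
    """Resolve a hypothetical fragment such as 'intro/2/1' or '/1/2/2'.

    The first step is either an ID or empty (fragment starts with '/').
    Every following step is a 1-based index among child elements.
    Returns the matching element, or None if nothing matches.
    """
    steps = fragment.split("/")
    start, child_steps = steps[0], steps[1:]

    if start:
        # ID start: pick the first element carrying a matching id attribute.
        node = next((el for el in root.iter() if el.get("id") == start), None)
    else:
        # Root start: the first index addresses the root element, so it must be 1.
        if not child_steps or child_steps[0] != "1":
            return None
        node, child_steps = root, child_steps[1:]

    for step in child_steps:
        if node is None:
            return None
        try:
            index = int(step)
        except ValueError:
            return None
        children = list(node)  # child elements only, text is not counted
        if index < 1 or index > len(children):
            return None
        node = children[index - 1]
    return node


# Hypothetical sample document, without namespaces to keep the sketch short.
DOC = """<html><head><title>t</title></head>
<body>
  <div id="intro"><p>first</p><p>second <em>emphasis</em></p></div>
  <p>outside</p>
</body></html>"""

root = ET.fromstring(DOC)
target = resolve_fragment(root, "intro/2/1")   # the em inside the second p
fallback = resolve_fragment(root, "/1/2/2")    # the p that has no id at all
```

the point of the second call is that it reaches an element without any @id, 
which is what a read-only document needs. no namespace bindings and no XPath 
evaluation are required for either form.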

> Given the similarities, UA implementors  
> would want to reuse the same codebase wherever it made sense (like to 
> XML as well). In my view, the fragment identifier should be indifferent 
> to the document format anyway.

i don't think there is a huge codebase to be concerned about, and if the 
mechanism comes out as simple as i think, it would be trivial to 
implement it on any platform supporting XML.

> [sorry for reversing the two footnotes.] It is a departure from current 
> specifications, but it does provide better support for aggregating 
> documents and maintaining id values. The reason I bring it up is that 
> such an approach also desperately needs a different fragment identifier 
> method as well, since fragment identifiers for a document may be unique 
> only for a specific IDENT path.

fragment identifiers have to be unique by design, otherwise they should 
not be called identifiers, right? ;-)

> In that sense I think your suggestion 
> complements this proposal quite nicely. What I imagine is having three 
> ways to specify a fragment relative to an idref: 1) by a path of 
> sibling indices; 2) by a path of named IDENTs or 3) by a hybrid 
> combination of the two where one or the other method could follow for 
> each successive descendant from the rooted idref (#id).

that's similar to what i suggested, but it did not include the IDENT 
approach. personally, i don't like the idea of IDENT, but we'll see how 
they are doing anyway...

>> aha, [1] and [2] were in the wrong order. well, the clippings talk 
>> about changing the document, whereas fragment identifiers do not add 
>> any document markup, they simply define fragment identification 
>> semantics.
> Agreed. That's definitely an important distinction between the two 
> approaches, however again I think they're complementary. Adding 
> bookmarks to a document provides more IDREFs to which to root your 
> proposed fragment identifiers. Also, they both address similar use cases, 
> differing only in that the one is meant only for those with write access 
> and the other is for anyone with read access, but is more fragile to 
> edits of the document.

it's just a question of use cases. i think in terms of document accesses 
happening worldwide, the number of accesses on read-only documents 
absolutely dwarfs the number of accesses to writable documents. and for 
writable documents, HTML4 fragment identifiers already are sufficient, 
because you can add an @id and you're set (as long as you are happy 
with the fact that fragments can only be elements).

> The wiki is official and has been used by the WG since its inception. 
> There has been some groping as the WG tries to find the best way to 
> address issues in HTML5 and the HTML5 draft. Bugzilla is the latest 
> suggestion for a new way to track issues. No one from the HTML WG has 
> yet started using it (you would be the first) and I'm not sure it 
> provides any benefit over the wiki. BTW, I was wrong about needing to do 
> your own TOC / quicklinks. Adding the syntax [[TableOfContents]] to a 
> document automatically generates a table of contents on the wiki.

so i'll go for bugzilla, if both are official. i don't believe in wikis 
and they often turn out to be write-only spaces. i have had good 
experiences with bugzilla in other w3c activities, and i like the fact 
that issues have an id and can be tracked and assigned and followed.

cheers,

dret.
Received on Thursday, 12 June 2008 23:34:29 UTC