Re: ISSUE-47, Markup support for bookmark and clipping support of documents from Andrew Sidwell on 2008-06-02 (public-html@w3.org from June 2008)

From: Andrew Sidwell <w3c@andrewsidwell.co.uk>
Date: Mon, 02 Jun 2008 01:18:04 +0100
To: Robert J Burns <rob@robburns.com>
CC: HTML Issue Tracking WG <public-html@w3.org>
Message-ID: <48433C3C.5010808@andrewsidwell.co.uk>
Robert J Burns wrote:
> 
> Hi Andrew,
> 
> 
> On May 28, 2008, at 8:16 PM, Andrew Sidwell wrote:
> 
>>
>> I thought I'd start the discussion on this issue by quoting the 
>> use-cases presented in the wiki article and discussing them a bit.
>>
>>> * Authors and users may often want to specify a single point in a
>>> document or express a clipping of a document that cannot be easily
>>> expressed as a well-formed document fragment (i.e., its beginning and
>>> ending cannot be marked by properly nested start and end tags). In
>>> other document formats, these marks are often referred to as ‘bookmarks’
>>> though in a different sense than the URL bookmarks often associated
>>> with HTML. Other document formats also support arbitrary clipmarks
>>> with a start and end point. Authors and users both make use of these
>>> bookmarks and clipmarks in other formats.
>>
>> This use-case seems to be two use cases (please correct me if I'm wrong):
>> - Authors want to specify a single point in a document
>> - Authors want to mark a section of a document which falls outside the 
>> document hierarchy for linking or navigating to
> 
> That's a fair differentiation. We can call that two use cases. In some 
> ways the entire issue could be separated into two separate issues, but I 
> thought they were closely related enough to combine them.
> 
>> I assume the first is met with the id="" attribute.
> 
> The first could be met with the id attribute, but only on a void element 
> or an otherwise empty element. I've made a change to the wiki in 
> response to your question, but the default presentation for this 
> proposed bookmark element is 'none' so it would need to be an id 
> attribute on an empty element whose display CSS property was set to 
> 'none'. That's why I propose adding this new void element with that 
> default presentation.

<p>Lorem ipsum <a id="page-34"></a> dorem isum it</p>

<a id=""> seems like it would do just as well, so I don't see the 
need to introduce a new element.  Actually, <a name=""></a> is the 
precedent here; that's been used for ages to indicate a position in the 
document to link to, so I don't see that adding a new element gets us 
anywhere.

>> The second, well, I think it's slightly mad to want to use a 
>> hierarchical markup language to express something non-hierarchical; 
>> the markup will always be a mess and I have no idea how you'd present 
>> it given that the styling language of the Web relies on hierarchy.
> 
> The data that is being handled is thoroughly hierarchical. However, even 
> thoroughly hierarchical data has non-hierarchical aspects or even 
> incompatible hierarchical aspects for the same data. This clip 
> capability allows authors and users to express that.
> 
>>> * A user wants to mark up their own copy of a document with important
>>> bookmarks or passages.
>>
>> Seems like <m> and id="" would be the already-proposed devices to use 
>> here-- <m> to mark up passages, id for bookmarks.
> 
> Only if the clip in the document obeyed precise hierarchical rules. For 
> example imagine the mark of
> 
> <p>some text ... some more interesting text </p>
> <p>interesting text continued... and some other text</p>
> 
> Using M (mark) with only id would not work. However, using M (mark) with 
> clip would:
> 
> <p>some text ... <m clip='interesting' >some more interesting text</m> </p>
> <p><m clip='interesting' >interesting text continued</m>... and some 
> other text</p>

It would work fine -- you would mark up with <m></m> inside the 
paragraphs and you'd have the interesting bits marked up.
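
To make the markup under discussion concrete, here is a minimal sketch of how a script might gather the per-paragraph <m> fragments that share a clip value. It assumes the proposed clip attribute on <m>, which is only a proposal in this thread and not part of any HTML specification. The names ClipCollector and collect_clips are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from html.parser import HTMLParser


class ClipCollector(HTMLParser):
    """Gather the text inside <m clip="..."> elements, keyed by clip name.

    The clip attribute is the proposal from this thread, not standard HTML.
    """

    def __init__(self):
        super().__init__()
        self.clips = defaultdict(list)
        self._open = []  # clip names of the <m> elements currently open

    def handle_starttag(self, tag, attrs):
        if tag == "m":
            # An <m> without a clip attribute is tracked as None.
            self._open.append(dict(attrs).get("clip"))

    def handle_endtag(self, tag):
        if tag == "m" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Record text only while inside an <m> that carries a clip name.
        if self._open and self._open[-1] is not None:
            self.clips[self._open[-1]].append(data)


def collect_clips(markup):
    """Return a dict mapping each clip name to its joined text fragments."""
    parser = ClipCollector()
    parser.feed(markup)
    parser.close()
    return {name: " ".join(parts) for name, parts in parser.clips.items()}


if __name__ == "__main__":
    example = (
        "<p>some text ... <m clip='interesting' >some more interesting text</m> </p>"
        "<p><m clip='interesting' >interesting text continued</m>... and some "
        "other text</p>"
    )
    print(collect_clips(example))
```

The sketch only shows that fragments marked up inside separate paragraphs can be reassembled by a script; it says nothing about how such a clip could be styled.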

I guess I'm just wondering what anyone would do with these clips having 
marked them up.  It's all very well doing that, but there's no way to 
style it or manipulate such text usefully, so allowing people to do this 
seems a bit pointless.


>>> * An author of an archival document wants to insert explicit page breaks
>>> where pages may break at arbitrary points in a text.
>>
>> This seems like a very different kind of use-case than the others:
>> - if you are archiving paper documents (like e.g. JSTOR.org), then you 
>> want to use a format designed for archiving paper documents, not a 
>> hypertextual format with an explicit lack of support for paper documents
>> - if you are archiving non-paper documents, then page breaks are not 
>> something you're worrying about, since there are none to archive.
> 
> A book is largely hierarchical data: parts, chapters, sections, 
> subsections, paragraphs, sentences, phrases. However, for archival 
> purposes it may often be important to mark the pages since they may take 
> on significance (as in citing books by page number or line number). 
> Everything about the book is suitable for publication as HTML; however, 
> the users and the authors (archivists) want to include information about 
> the pages. Also, HTML is very well suited for media independence. So a 
> document may be viewed on screen without any page presentation while 
> printed on paper it can be printed precisely to the word as the 
> author/archivist intends.

OK.

>> If the use case is more generally that people want to prepare 
>> documents for print using HTML, I can imagine that being considerably 
>> more common, but it isn't related to the other use-cases here.  It 
>> should be considered separately, perhaps with reference to what, e.g. 
>> Prince XML does to allow this functionality in HTML.
> 
> To me it seems much easier to introduce a bookmark element with a clip 
> attribute than to encourage users and authors of HTML to turn to an 
> entirely new XML vocabulary. For the XML serialization, implementors do 
> not even have to do anything for this proposal: it just works. Even for 
> legacy HTML it already works in IE (WebKit, Presto and Mozilla would 
> need to update their parsing dictionaries; how hard could that be). So 
> asking authors who are already familiar with HTML to learn an entirely 
> new vocabulary to accomplish this seems to again forget about the 
> priority of constituencies.

That isn't what I said; I said "with reference to what Prince XML does 
to allow this functionality in HTML".  Prince is an XML/HTML to PDF 
processor, and has functionality to allow this, but it's just its 
support of CSS.  (I couldn't remember if that was the case when I posted 
before.)

>>> * An author wants to otherwise wrap markup content in a
>>> non-hierarchical manner.
>>
>> I don't think this is a use-case; the use-cases would be the reason 
>> why they want to wrap content in a non-hierarchical 
>> manner.
> 
> This use-case follows from the previous use case. In other words there 
> are aspects of a hierarchical document other than pages that are not 
> hierarchical (phrases, passages, etc.).
> 
> I hope that clarifies things. I appreciate your feedback and it will 
> help us to improve the wiki page and eventually the HTML5 draft.
> 
> Take care,
> Rob


Cheers,
A
Received on Monday, 2 June 2008 00:18:40 UTC