Re: ISSUE-47, Markup support for bookmark and clipping support of documents from Robert J Burns on 2008-05-28 (public-html@w3.org from May 2008)

From: Robert J Burns <rob@robburns.com>
Date: Wed, 28 May 2008 22:32:42 +0000
To: Andrew Sidwell <w3c@andrewsidwell.co.uk>
Cc: HTML Issue Tracking WG <public-html@w3.org>
Message-Id: <B56ABB3A-698D-4DA0-92F9-4A676F7F7AB4@robburns.com>
Hi Andrew,


On May 28, 2008, at 8:16 PM, Andrew Sidwell wrote:

>
> I thought I'd start the discussion on this issue by quoting the  
> use-cases presented in the wiki article and discussing them a bit.
>
>> * Authors and users may often want to specify a single point in a
>> document or express a clipping of a document that cannot be easily
>> expressed as a well-formed document fragment (i.e., its beginning and
>> ending cannot be marked by properly nested start and end tags). In
>> other document formats, these marks are often referred to as
>> ‘bookmarks’, though in a different sense than the URL bookmarks often
>> associated with HTML. Other document formats also support arbitrary
>> clipmarks with a start and end point. Authors and users both make use
>> of these bookmarks and clipmarks in other formats.
>
> This use-case seems to be two use cases (please correct me if I'm  
> wrong):
> - Authors want to specify a single point in a document
> - Authors want to mark a section of a document which falls outside  
> the document hierarchy for linking or navigating to

That's a fair differentiation. We can call that two use cases. In some  
ways the entire issue could be separated into two separate issues, but  
I thought they were closely related enough to combine them.

> I assume the first is met with the id="" attribute.

The first could be met with the id attribute, but only on a void  
element or an otherwise empty element. I've made a change to the wiki  
in response to your question, but the default presentation for this  
proposed bookmark element is 'none' so it would need to be an id  
attribute on an empty element whose display CSS property was set to  
'none'. That's why I propose adding this new void element with that  
default presentation.

> The second, well, I think it's slightly mad to want to use a  
> hierarchical markup language to express something non-hierarchical;  
> the markup will always be a mess and I have no idea how you'd  
> present it given that the styling language of the Web relies on  
> hierarchy.

The data being handled is thoroughly hierarchical. However, even  
thoroughly hierarchical data has non-hierarchical aspects, or even  
incompatible hierarchical aspects, for the same data. This clip  
capability allows authors and users to express that.

>> * A user wants to markup their own copy of a document with important
>> bookmarks or passages.
>
> Seems like <m> and id="" would be the already-proposed devices to  
> use here-- <m> to markup passages, id for bookmarks.

Only if the clip in the document obeyed precise hierarchical rules.  
For example, imagine marking the interesting passage in:

<p>some text ... some more interesting text </p>
<p>interesting text continued... and some other text</p>

Using M (mark) with only id would not work, since a single m element  
cannot span the paragraph boundary. However, using M (mark) with clip  
would:

<p>some text ... <m clip='interesting'>some more interesting text</m></p>
<p><m clip='interesting'>interesting text continued</m>... and some other text</p>
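
As a rough illustration of how a tool might reassemble such a clip, here is a minimal Python sketch. It assumes only what the example above shows: m elements that share a clip value belong to the same clip. The names ClipCollector, clip_text and EXAMPLE are hypothetical and are not part of the wiki proposal.

```python
from html.parser import HTMLParser


class ClipCollector(HTMLParser):
    """Collect the text inside <m clip="..."> elements, grouped by clip name."""

    def __init__(self):
        super().__init__()
        self.clips = {}   # clip name -> list of text pieces, in document order
        self._open = []   # clip names of the <m> elements currently open

    def handle_starttag(self, tag, attrs):
        if tag == "m":
            # An <m> without a clip attribute is recorded as None.
            self._open.append(dict(attrs).get("clip"))

    def handle_endtag(self, tag):
        if tag == "m" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text belongs to every clip whose <m> element is currently open.
        for name in self._open:
            if name is not None:
                self.clips.setdefault(name, []).append(data)


def clip_text(html, name):
    """Return the text of the named clip, joined across element boundaries."""
    collector = ClipCollector()
    collector.feed(html)
    collector.close()
    return " ".join(piece.strip() for piece in collector.clips.get(name, []))


# The two-paragraph example from this message.
EXAMPLE = (
    "<p>some text ... <m clip='interesting'>some more interesting text</m></p>"
    "<p><m clip='interesting'>interesting text continued</m>"
    "... and some other text</p>"
)

print(clip_text(EXAMPLE, "interesting"))
# prints: some more interesting text interesting text continued
```

The point of the sketch is only that the shared clip value lets the two fragments be treated as one passage even though no single element can wrap them.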


>> * An author of an archival document wants to insert explicit page
>> breaks where pages may break at arbitrary points in a text.
>
> This seems like a very different kind of use-case than the others:
> - if you are archiving paper documents (like e.g. JSTOR.org), then  
> you want to use a format designed for archiving paper documents, not  
> a hypertextual format with an explicit lack of support for paper  
> documents
> - if you are archiving non-paper documents, then page breaks are not  
> something you're worrying about, since there are none to archive.

A book is largely hierarchical data: parts, chapters, sections,  
subsections, paragraphs, sentences, phrases. However, for archival  
purposes it may often be important to mark the pages, since they may  
take on significance (as in citing books by page number or line  
number). Everything about the book is suitable for publication as  
HTML; however, the users and the authors (archivists) want to include  
information about the pages. Also, HTML is very well suited for media  
independence, so a document may be viewed on screen without any page  
presentation, while on paper it can be printed precisely to the word  
as the author/archivist intends.
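
To make the page-break case concrete, here is a small Python sketch of how a consumer could recover page boundaries from void marker elements placed at arbitrary points in the text. The markup convention used here (a void bookmark element whose id has the form page-N) is an assumption made for illustration, as are the names PageSplitter, text_by_page and SAMPLE; the wiki proposal may spell this differently.

```python
from html.parser import HTMLParser


class PageSplitter(HTMLParser):
    """Assign each text run to the page opened by the most recent marker."""

    def __init__(self, first_page):
        super().__init__()
        self.current = first_page
        self.pages = {first_page: []}

    def handle_starttag(self, tag, attrs):
        # Assumed convention: <bookmark id="page-N"> starts page N.
        if tag == "bookmark":
            marker = dict(attrs).get("id") or ""
            if marker.startswith("page-"):
                self.current = marker
                self.pages.setdefault(marker, [])

    def handle_data(self, data):
        self.pages[self.current].append(data)


def text_by_page(html, first_page="page-1"):
    """Return a dict mapping each page id to the text that falls on it."""
    splitter = PageSplitter(first_page)
    splitter.feed(html)
    splitter.close()
    return {
        page: " ".join(part.strip() for part in parts if part.strip())
        for page, parts in splitter.pages.items()
    }


# A page break that falls in the middle of a sentence and a paragraph.
SAMPLE = (
    "<p>The sentence starts on one page <bookmark id='page-2'>"
    "and ends on the next.</p><p>A new paragraph.</p>"
)

pages = text_by_page(SAMPLE)
print(pages["page-1"])  # prints: The sentence starts on one page
print(pages["page-2"])  # prints: and ends on the next. A new paragraph.
```

The paragraph structure stays intact for on-screen use, while the markers carry enough information to cite or print by page.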


> If the use case is more generally that people want to prepare  
> documents for print using HTML, I can imagine that being  
> considerably more common, but it isn't related to the other  
> use-cases here.  It should be considered separately, perhaps with  
> reference to what, e.g. Prince XML does to allow this functionality  
> in HTML.

To me it seems much easier to introduce a bookmark element with a clip  
attribute than to encourage users and authors of HTML to turn to an  
entirely new XML vocabulary. For the XML serialization, implementors  
do not even have to do anything for this proposal: it just works. Even  
for legacy HTML it already works in IE (WebKit, Presto and Mozilla  
would need to update their parsing dictionaries; how hard could that  
be?). So asking authors who are already familiar with HTML to learn an  
entirely new vocabulary to accomplish this seems to again forget about  
the priority of constituencies.

>> * An author wants to otherwise wrap markup content in a
>> non-hierarchical manner.
>
> I don't think this is a use-case; the use-cases would be the reason  
> why they want to wrap content in a non-hierarchical manner.

This use-case follows from the previous use case. In other words there  
are aspects of a hierarchical document other than pages that are not  
hierarchical (phrases, passages, etc.).

I hope that clarifies things. I appreciate your feedback and it will  
help us to improve the wiki page and eventually the HTML5 draft.

Take care,
Rob
Received on Wednesday, 28 May 2008 22:33:44 UTC