Re: ISSUE-47, Markup support for bookmark and clipping support of documents from Robert J Burns on 2008-06-02 (public-html@w3.org from June 2008)

From: Robert J Burns <rob@robburns.com>
Date: Mon, 2 Jun 2008 12:17:27 +0000
To: Andrew Sidwell <w3c@andrewsidwell.co.uk>
Cc: HTML Issue Tracking WG <public-html@w3.org>
Message-Id: <2057CA7D-1B29-40F0-882E-31687AF7DE15@robburns.com>

Hi Andrew,

On Jun 2, 2008, at 12:18 AM, Andrew Sidwell wrote:

>
> Robert J Burns wrote:
>> Hi Andrew,
>> On May 28, 2008, at 8:16 PM, Andrew Sidwell wrote:
>>>
>>> I thought I'd start the discussion on this issue by quoting the  
>>> use-cases presented in the wiki article and discussing them a bit.
>>>
>>>> * Authors and users may often want to specify a single point in a
>>>> document or express a clipping of a document that cannot be easily
>>>> expressed as a well-formed document fragment (i.e., its beginning and
>>>> ending cannot be marked by properly nested start and end tags). In
>>>> other document formats, these marks are often referred to as
>>>> ‘bookmarks’, though in a different sense than the URL bookmarks often
>>>> associated with HTML. Other document formats also support arbitrary
>>>> clipmarks with a start and end point. Authors and users both make use
>>>> of these bookmarks and clipmarks in other formats.
>>>
>>> This use-case seems to be two use cases (please correct me if I'm  
>>> wrong):
>>> - Authors want to specify a single point in a document
>>> - Authors want to mark a section of a document which falls outside  
>>> the document hierarchy for linking or navigating to
>> That's a fair differentiation. We can call that two use cases. In  
>> some ways the entire issue could be separated into two separate  
>> issues, but I thought they were closely related enough to combine
>> them.
>>> I assume the first is met with the id="" attribute.
>> The first could be met with the id attribute, but only on a void  
>> element or an otherwise empty element. I've made a change to the  
>> wiki in response to your question, but the default presentation for  
>> this proposed bookmark element is 'none' so it would need to be an  
>> id attribute on an empty element whose display CSS property was set  
>> to 'none'. That's why I propose adding this new void element with  
>> that default presentation.
>
> <p>Lorem ipsum <a id="page-34"></a> dorem isum it</p>

or <p>Lorem ipsum <span id="page-34"></span> dorem isum it</p>
or <p>Lorem ipsum <em id="page-34"></em> dorem isum it</p>
and so on...

Yes, just about any element would work here (given an author-provided
stylesheet).

The point is that if authors have a need to do this, then why not
provide a dedicated mechanism for it? As many have said on this list
before, all of this could be done with DIV elements and the class
attribute. So what? The point is that we should be providing semantic
facilities to round out the expressive needs of authors.

> <a id=""> seems like it would do just as well, so I don't see
> the need to introduce a new element.  Actually, <a name=""></a> is
> the precedent here; that's been used for ages to indicate a position
> in the document to link to, so I don't see that adding a new element
> gets us anywhere.

Agreed, but what authors need there is a void element. By the same
logic, why not deprecate the HR and the BR elements and just tell
authors to use an A element with appropriate CSS styling?

>
>
>>> The second, well, I think it's slightly mad to want to use a
>>> hierarchical markup language to express something
>>> non-hierarchical; the markup will always be a mess and I have no idea
>>> how you'd present it given that the styling language of the Web
>>> relies on hierarchy.
>> The data that is being handled is thoroughly hierarchical. However,  
>> even thoroughly hierarchical data has non-hierarchical aspects or
>> even incompatible hierarchical aspects for the same data. This clip  
>> capability allows authors and users to express that.
>>>> * A user wants to mark up their own copy of a document with
>>>> important bookmarks or passages.
>>>
>>> Seems like <m> and id="" would be the already-proposed devices to
>>> use here: <m> to mark up passages, id for bookmarks.
>> Only if the clip in the document obeyed precise hierarchical rules.
>> For example, imagine marking the interesting text in
>> <p>some text ... some more interesting text </p>
>> <p>interesting text continued... and some other text</p>
>> Using M (mark) with only id would not work. However, using M (mark)
>> with clip would:
>> <p>some text ... <m clip='interesting' >some more interesting text</m> </p>
>> <p><m clip='interesting' >interesting text continued</m>... and some other text</p>
>
> It would work fine -- you would mark up with <m></m> inside the  
> paragraphs and you'd have the interesting bits marked up.

Yes that would work, but it adds needless complexity for authors and  
users who don't want to risk turning the document into an invalid or  
otherwise non-conforming document. The use of a void BOOKMARK element  
and the clip attribute make this much easier for authors. And again  
this proposal requires no UA implementation (though some would be nice).

>
>
> I guess I'm just wondering what anyone would do with these clips  
> having marked them up.  It's all very well doing that, but there's  
> no way to style it or manipulate such text usefully, so allowing  
> people to do this seems a bit pointless.

But there could be ways of styling this. There could also be ways of  
extracting document fragments from clippings (there already are). See  
the CSS change bar styling that seeks to address styling for a very  
similar semantic (though one not provided by HTML).
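
As a rough illustration of the extraction idea only (a minimal sketch,
assuming the proposed clip attribute sits on non-void elements such as M;
the clip attribute is not part of any HTML specification, and the names
ClipCollector and extract_clip are made up for this example), a script
could gather the text of every element that shares one clip value:

```python
from html.parser import HTMLParser


class ClipCollector(HTMLParser):
    """Collect the text inside elements whose clip attribute matches a name.

    Assumption: clip is the hypothetical attribute from this proposal and
    appears only on elements that have an end tag (for example M).
    """

    def __init__(self, clip_name):
        super().__init__()
        self.clip_name = clip_name
        self.active_tag = None  # tag name of the matching element we are inside
        self.nesting = 0        # open elements with that same tag name
        self.pieces = []        # one string per matching element

    def handle_starttag(self, tag, attrs):
        if self.active_tag is None:
            if dict(attrs).get("clip") == self.clip_name:
                self.active_tag = tag
                self.nesting = 1
                self.pieces.append("")
        elif tag == self.active_tag:
            # Same-named element nested inside the clip; track it so its
            # end tag does not close the clip early.
            self.nesting += 1

    def handle_endtag(self, tag):
        if tag == self.active_tag:
            self.nesting -= 1
            if self.nesting == 0:
                self.active_tag = None

    def handle_data(self, data):
        if self.active_tag is not None:
            self.pieces[-1] += data


def extract_clip(html, clip_name):
    """Return the text pieces of one named clip, in document order."""
    collector = ClipCollector(clip_name)
    collector.feed(html)
    collector.close()
    return collector.pieces


doc = (
    "<p>some text ... <m clip='interesting' >some more interesting text</m> </p>\n"
    "<p><m clip='interesting' >interesting text continued</m>... and some other text</p>"
)
print(extract_clip(doc, "interesting"))
```

Joining the returned pieces with a space would present the clipping as a
single passage even though it spans two paragraphs in the markup.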


>
>
>
>>>> * An author of an archival document wants to insert explicit page
>>>> breaks where pages may break at arbitrary points in a text.
>>>
>>> This seems like a very different kind of use-case than the others:
>>> - if you are archiving paper documents (like e.g. JSTOR.org), then  
>>> you want to use a format designed for archiving paper documents,  
>>> not a hypertextual format with an explicit lack of support for  
>>> paper documents
>>> - if you are archiving non-paper documents, then page breaks are  
>>> not something you're worrying about, since there are none to  
>>> archive.
>> A book is largely hierarchical data: parts, chapters, sections,
>> subsections, paragraphs, sentences, phrases. However, for archival  
>> purposes it may often be important to mark the pages since they may  
>> take on significance (as in citing books by page number or line  
>> number). Everything about the book is suitable for publication as  
>> HTML; however, the users and the authors (archivists) want to
>> include information about the pages. Also, HTML is very well suited  
>> for media independence. So a document may be viewed on screen  
>> without any page presentation while printed on paper it can be  
>> printed precisely to the word as the author/archivist intends.
>
> OK.
>
>>> If the use case is more generally that people want to prepare
>>> documents for print using HTML, I can imagine that being
>>> considerably more common, but it isn't related to the other
>>> use-cases here.  It should be considered separately, perhaps with
>>> reference to what, e.g. Prince XML does to allow this
>>> functionality in HTML.
>> To me it seems much easier to introduce a bookmark element with a
>> clip attribute than to encourage users and authors of HTML to turn
>> to an entirely new XML vocabulary. For the XML serialization,
>> implementors do not even have to do anything for this proposal: it
>> just works. Even for legacy HTML it already works in IE (WebKit,
>> Presto and Mozilla would need to update their parsing dictionaries;
>> how hard could that be?). So asking authors who are already familiar
>> with HTML to learn an entirely new vocabulary to accomplish this
>> seems to again forget about the priority of constituencies.
>
> That isn't what I said; I said "with reference to what Prince XML  
> does to allow this functionality in HTML".  Prince is an XML/HTML to  
> PDF processor, and has functionality to allow this, but it's just  
> its support of CSS.  (I couldn't remember if that was the case when  
> I posted before.)

I still don't understand what you're saying here then. If Prince  
recognized a use case for this, then why doesn't it apply to HTML?

Take care,
Rob
Received on Monday, 2 June 2008 12:18:14 UTC