Re: Use of Annotea with non-XHTML HTML

At 09:30 PM 5/9/2001 +0200, Eric van der Vlist wrote:
>Matthew Wilson wrote:
>
>> For Annozilla, I try to resolve the XPointer describing the context of the
>> annotation, and I intend to try and construct XPointers when creating
>> annotations. Currently I concentrate on things like
>> xpointer(/html[1]/body[1]/p[3]). This works quite well, but I can only do
>> this  by navigating the DOM.

Use of XPath with HTML (not XML) is an interesting issue.

I wonder what other fora might be discussing this.  I just did
a quick scan of the XML Linking public comments archive
http://lists.w3.org/Archives/Public/www-xml-linking-comments/
and did not find this topic.

>While I understand how useful this would be, this is not conform to the
>RFC 2854 [1] that is the normative reference describing the HTML media
>type (browse for "3. Fragment Identifiers").

I will argue that whether XPath syntax is permissible in
fragment identifiers is open to discussion and interpretation.

>This RFC clearly states that the fragment identifier for HTML documents:
>
><quote>
>   designates the correspondingly named element; any element may be
>   named with the "id" attribute, and A, APPLET, FRAME, IFRAME, IMG and
>   MAP elements may be named with a "name" attribute.  This is described
>   in detail in [HTML40] section 12.
></quote>

The debate can hinge on the interpretation of the word "named".

We might easily get agreement that an XPath expression
is a kind of address for an element.  From there we proceed to
the observation that naming and addressing need not be disjoint.
See, e.g., http://www.w3.org/DesignIssues/NameMyth.html

So a coherent -- not to say pragmatic -- case can be made that
an expression such as xpointer(/html[1]/body[1]/p[3]) can indeed
constitute a permissible "name" for an element.
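
To make concrete what treating such an expression as an address
involves, here is a minimal sketch in Python.  It is not how
Annozilla or Amaya does it; resolve() and STEP are hypothetical
names, only simple name[n] child steps are handled, and a
well-formed sample stands in for the DOM a browser would build
from real HTML.

```python
import re
from xml.dom import minidom

# One child step of the form name[n], e.g. "p[3]".
STEP = re.compile(r"([A-Za-z][A-Za-z0-9]*)\[(\d+)\]")

def resolve(document, pointer):
    """Walk steps like /html[1]/body[1]/p[3]; return the element or None."""
    if not (pointer.startswith("xpointer(") and pointer.endswith(")")):
        return None
    path = pointer[len("xpointer("):-1]
    node = document
    for step in path.strip("/").split("/"):
        match = STEP.fullmatch(step)
        if match is None:
            return None  # anything beyond name[n] is out of scope here
        name, index = match.group(1).lower(), int(match.group(2))
        # Element children with the requested name, compared case-insensitively.
        same_name = [child for child in node.childNodes
                     if child.nodeType == child.ELEMENT_NODE
                     and child.tagName.lower() == name]
        if not 1 <= index <= len(same_name):
            return None
        node = same_name[index - 1]  # XPath positions are 1-based
    return node

doc = minidom.parseString(
    "<html><body><p>one</p><p>two</p><p>three</p></body></html>")
target = resolve(doc, "xpointer(/html[1]/body[1]/p[3])")
print(target.firstChild.data)
```

The pointer either resolves to exactly one element or to nothing,
which is the behaviour one would expect of a name.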

Note also that the quoted RFC paragraph doesn't use the keywords
MUST or SHOULD but rather the more permissive "may".

>The syntax "http://foo.xxx/bar.html#xpointer(...)" should therefor not
>been used for HTML documents.

"should not" as in "is not endorsed by any standard and is
therefore not interoperable when employed".

Matthew's client would be entirely conformant if it chose
to ignore fragment identifiers containing parentheses.
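
A rough sketch of that choice, in Python: split off the fragment
and treat anything containing a parenthesis as a pointer that a
client is free to ignore.  classify_fragment() is a hypothetical
helper, and the parenthesis test is only the heuristic suggested
above, not a rule taken from any specification.

```python
def classify_fragment(uri):
    """Return 'none', 'name', or 'pointer' for the fragment part of uri."""
    _, _, fragment = uri.partition("#")
    if not fragment:
        return "none"
    if "(" in fragment or ")" in fragment:
        # Looks like a scheme-based pointer such as xpointer(...).
        return "pointer"
    # Otherwise treat it as a plain anchor name or id.
    return "name"

for uri in ("http://foo.xxx/bar.html",
            "http://foo.xxx/bar.html#intro",
            "http://foo.xxx/bar.html#xpointer(/html[1]/body[1]/p[3])"):
    print(uri, "->", classify_fragment(uri))
```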

>To change this should involved only the XPointer WG, but also the
>authors of the RFC 2854 and the HTML WG which is quoted in this RFC.

If you meant to write "... involve /+not+/ only..." then I agree.

>Since the HTML WG has declared that no further work would be done on
>HTML and that the problem is solved for XHTML, I wonder if these
>specifications are likely to be changed, though.

Tough call.  I think a coherent consensus could be reached that
requires no change to RFC 2854 but that does drop the tight binding
between "fragment identifiers", "anchor identifiers", and "anchor
names" in http://www.w3.org/TR/html4/intro/intro.html#h-2.1.2
and http://www.w3.org/TR/html4/struct/links.html#h-12.2.1

The issue of case sensitivity doesn't worry me too much.
If the HTML DOM requires canonicalization of element names to
uppercase, the obvious thing is to apply that canonicalization
to (only) the element names in an XPath as well.  "Only"
because HTML4 says that anchor name comparison is case sensitive
but also that no two anchor names within a single document may
differ only in case.
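
As an illustration of "only the element names", a small Python
sketch follows.  canonicalize_steps() is a hypothetical helper
that assumes simple child steps of the form /name or /name[n];
it uppercases the step names and leaves everything else, such as
a string literal holding an anchor name, untouched.

```python
import re

# A "/" followed by an element name that ends at "[", "/" or end of string.
STEP_NAME = re.compile(r"/([A-Za-z][A-Za-z0-9]*)(?=\[|/|$)")

def canonicalize_steps(path):
    """Uppercase element names in child steps; leave the rest as written."""
    return STEP_NAME.sub(lambda m: "/" + m.group(1).upper(), path)

print(canonicalize_steps("/html[1]/body[1]/p[3]"))
print(canonicalize_steps('id("Intro")/p[2]'))
```

In the second call the anchor name "Intro" keeps its case while
the step name p is uppercased.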

Surely this discussion must have occurred elsewhere already.
Anyone have any pointers? (w/ fragments :-)

Meanwhile, does the community here think that Amaya should
disallow selection (highlighting) of element content when
attaching an annotation to a text/html document?  Should it
limit the context to the smallest enclosing element with an
ID or NAME attribute and draw its pencil icon at the start
of that element?
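
For the second option, finding that element amounts to walking
up from the selection.  The Python sketch below is only an
illustration, not Amaya's code: nearest_named_ancestor() and
NAMEABLE are hypothetical names, the NAME attribute is honoured
only on the elements listed in the RFC text quoted above, and a
well-formed sample stands in for a parsed HTML page.

```python
from xml.dom import minidom

# Elements that RFC 2854 says may be named with a "name" attribute.
NAMEABLE = {"a", "applet", "frame", "iframe", "img", "map"}

def nearest_named_ancestor(node):
    """Return the closest element (node or an ancestor) with an id or a valid name."""
    while node is not None:
        if node.nodeType == node.ELEMENT_NODE:
            if node.hasAttribute("id"):
                return node
            if node.tagName.lower() in NAMEABLE and node.hasAttribute("name"):
                return node
        node = node.parentNode
    return None

doc = minidom.parseString(
    '<html><body><div id="sec1"><p>some <b>bold</b> text</p></div>'
    '<p>unnamed</p></body></html>')
selected = doc.getElementsByTagName("b")[0]
print(nearest_named_ancestor(selected).getAttribute("id"))
```

A selection with no such ancestor yields None, in which case the
annotation could only be attached to the document as a whole.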

Received on Friday, 11 May 2001 13:34:39 UTC