[Bug 6774] New: <mark> element: restrict insertion by other servers

http://www.w3.org/Bugs/Public/show_bug.cgi?id=6774

           Summary: <mark> element: restrict insertion by other servers
           Product: HTML WG
           Version: unspecified
          Platform: All
               URL: http://www.w3.org/TR/html5/single-page/
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: HTML 5: The Markup Language
        AssignedTo: mike@w3.org
        ReportedBy: Nick_Levinson@yahoo.com
         QAContact: public-html-bugzilla@w3.org
                CC: public-html@w3.org


I understand the mark element is intended to be under a website owner's
control, and that I misunderstood the draft standard as proposing that a server
not under the website owner's control could insert the element in response to a
user's apparent interests.

That's good. The HTML5 standard should generally conserve the website owner's
rights. Otherwise, the mark element could allow security breaches.

However, the HTML5 draft standard's language seems to be somewhat ambiguous.
Section 4.6.7 says, "When used in the main prose of a document, it [the mark
element] indicates a part of the document that has been highlighted due to its
likely relevance to the user's current activity." That suggests knowledge of
the user's current activity that the page author lacks. Either the author has
to anticipate a number of uses and insert mark elements for all of them or
someone else is to insert a mark element.

Of the three examples in the draft, two are perfectly safe and one is
dangerous.

The safe ones are presentational. If quoting a source and wishing to add a
quoter's indication of relevance other than by the traditional means (found in
books, legal briefs, etc.) of adding italicization, the mark element is fine.
If, when offering one's own prose, one wants a method to supplement the strong
and em elements, mark is fine. Both are safe because they're in the control of
the page author. Even when quoting, the page author or the website owner is at
least vaguely known, and the mark is reasonably attributable to them and
reasonably likely to be attributed to them even by nonexperts.

The main problem is with third-party insertion by an unidentified party and use
by a user with only basic computer skills who naively and wrongly assumes that
the website owner produced the highlighting that the mark element causes. That
user won't even know the mark element exists or how to view source code, and
may not be allowed to view source code because an institution has disabled the
relevant browser commands.

Example: Someone runs a small business and thinks the managers are getting hung
up on legalities. So they set their browser or server to apply the mark element
to copyright notices, terms of use, and other bothersome stuff, and they style
the mark element to be in a one-point font in white text on a white background.
If anyone asks, that's just the way the website is. If the staff call the
website company and debate whether the website does or does not say "x", the
staff will be wrong but never know it, and if the staff are lawyers or IT
managers, for example, they may commit major violations of contract or other
law and never know why.

Even highly skilled computer users review source code on no more than one
percent of all the pages they rely on, and that won't change once third-party
<mark> insertions begin to change the look of websites.

The search engine use case is a legitimate one. I often run a search, get a result,
see the snippet, go to the page, and wonder where on the page my search terms
are. A browser's Find function can be inefficient. However, I would prefer if
browsers would offer a feature whereby search terms can be copied from a search
engine URL and then the page auto-scrolled to their location. This could be a
UA-specific implementation that could be based on agreements with search engine
firms. It does not need a W3C or HTML standard. For example, the Opera 9.52
browser has a search facility that allows me to execute a search using Google,
Ask, Yahoo, Amazon, Wikipedia, eBay, Yahoo Shopping, or BitTorrent, and
presumably terms can be retained long enough to support a user-dismissable
Opera frame pointing to their occurrence or to feed Opera's Find function.
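
As a rough illustration of the browser-side approach described above, here is
a minimal Python sketch. It assumes the search engine carries the terms in a
query-string parameter named "q"; that name is an assumption and varies by
engine. The function names are hypothetical and do not correspond to any
browser's actual feature.

```python
from urllib.parse import urlparse, parse_qs


def search_terms_from_url(url, param="q"):
    """Return the search terms carried in a search-results URL.

    The parameter name "q" is an assumption; engines differ.
    """
    query = parse_qs(urlparse(url).query)
    values = query.get(param, [])
    # parse_qs has already decoded '+' and %XX escapes, so split on whitespace.
    return values[0].split() if values else []


def first_occurrences(page_text, terms):
    """Map each term to the offset of its first case-insensitive match.

    A term that does not occur in the text maps to -1.
    """
    lowered = page_text.lower()
    return {term: lowered.find(term.lower()) for term in terms}
```

A UA could use the offsets to scroll to the first match or to seed its Find
function, without changing the page's markup at all.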

I assume <mark> can be styled with anything available in CSS for other text
elements, such as <a> and <span>. I haven't even considered the extent to which
stylistic creativity can change the meaning of all sorts of marked content.

All of the above considers only the host's and user's servers, not servers
in between. The standard should not give permission to third-party server
owners to insert and style <mark> as they wish. The draft, as it stands now,
would.

An earlier commenter elsewhere (Lachlan Hunt), in response to my concern that
"[i]f the <mark> element is intended to be introduceable by servers other than
the website owner's, then that should be preventable", said "No, this is a
misunderstanding of the mark element's purpose.  If a 3rd party server can
inject markup into another site's content, then that's a major security
problem, but it is independent from HTML itself.  It is also not how the mark
element is intended to be used." I replied: "HTML5's role in a security breach
would come if it grants permission to system designers, as I saw in this
statement: 'Another example of the mark element is highlighting parts of a
document that are matching some search string. If someone looked at a document,
and the server knew that the user was searching for the word "kitten", then the
server might return the document with one paragraph modified as follows: . . .
. <mark>kitten</mark> . . . .' Section 4.6.7. That looks like permission for
the server to interject markup into a byte stream. Given that many people in
large organizations view outside websites in a way that involves at least two
servers per visit, one hosting and others not, the section seems to be
permission for any nonhost server to sell advertising or comment on content as
if it's the author's commentary. Thus, the security breach would be furthered
by HTML as permission. However, as I didn't find any reference in the document
to any server that wasn't acting on a served document somehow as authorized,
e.g., by checking a certificate, if you're right that the intent was not as I
feared, then we should propose rewording the HTML standard before finalization
so only the site owner's server might mark the string if nonowners are to be
conformant. . . . I'm not an attorney and laws vary by nation and circumstance,
but if you believe there's any error in the above please let us know."

Could you please tighten the language to leave the mark element's use in the
hands of the page author?

If that can't be done, can restrictions to prevent security breaches be written
in? The problem with that, of course, is malicious attackers, who would not
abide by them.

At the least, allow page authors to block insertion of a mark element not
already in the source code. For example, a meta element in the head element
might be a preventive for a page. Example:

<meta addmark="false">

The True value would be available but trivial, as omitting the meta element
would also imply True. Yes/No would be clearer but inconsistent with practice
with other meta elements.

A narrower problem is whether a website that supports internal searches might
want to allow its own host to insert the mark element in response to a local
user's search executed locally, even though addmark is set to false for
everyone else. This potentially applies to any CGI script and perhaps other
locally-applied technology. To solve that, a second meta attribute could serve.
Example:

<meta addmark="false" addmarklocal="true">

The website designer could decide who qualifies as local and how to implement
that decision technically, and could use the attribute to prevent anyone deemed
nonlocal from marking content on that page. The False value would be available
but trivial, as omitting the addmarklocal attribute would also imply its
falsity when addmark is False. The order of the attributes should not matter.
Placing the attributes in one or two separate meta tags should not matter.
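
To make the intended semantics concrete, here is a minimal Python sketch of
how a conforming intermediary might read the two proposed attributes before
deciding whether it may insert a mark element. The addmark and addmarklocal
attributes exist only in this proposal, not in any HTML draft, and the class
and function names are hypothetical. How a party is judged "local" is left to
the caller, as described above.

```python
from html.parser import HTMLParser


class AddMarkPolicy(HTMLParser):
    """Collect the proposed (hypothetical) addmark attributes from meta tags."""

    def __init__(self):
        super().__init__()
        self.addmark = True        # omitting addmark implies true
        self.addmarklocal = False  # omitting addmarklocal implies false

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        for name, value in attrs:
            text = (value or "").strip().lower()
            if name == "addmark":
                self.addmark = text != "false"
            elif name == "addmarklocal":
                self.addmarklocal = text == "true"


def may_insert_mark(html, is_local):
    """Return True if the page's meta tags permit this party to add <mark>."""
    policy = AddMarkPolicy()
    policy.feed(html)
    policy.close()
    if policy.addmark:
        return True  # no restriction declared
    return is_local and policy.addmarklocal
```

Because the parser visits every meta tag, attribute order and the split across
one or two meta tags make no difference to the result. Such a check restrains
only parties that choose to conform.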

The earlier discussion is in Bug 6606. This responds to
<http://www.w3.org/TR/html5/single-page/>, accessed 4-4-09, Working Draft, 12
February 2009 (presumably <http://www.w3.org/TR/2009/WD-html5-20090212/>),
section 4.6.7. I'll await possible comment here before considering whether to
propose the meta attributes in the appropriate Wiki.

Thank you.

-- 
Nick



Received on Sunday, 5 April 2009 02:16:46 UTC