
[Bug 6774] <mark> element: restrict insertion by other servers

From: <bugzilla@wiggum.w3.org>
Date: Sun, 21 Jun 2009 19:59:35 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1MITCR-0006i7-S3@wiggum.w3.org>

--- Comment #8 from Nick Levinson <Nick_Levinson@yahoo.com>  2009-06-21 19:59:34 ---
Your blog example is fine. So are some of the intentions behind mark, so far as
discussed. So is the desire for a tag that has a clearer name than <span>. So
is the desire not to use <b>, <em>, etc. in mixed ways in your pages.

It's possible to style classless and classed instances of an element
differently; I did it in IE5.5 (b and b.test with different colors), so I
assume it can be done with other elements and browsers. Still, I understand the
convenience of having a separate element for certain purposes rather than
classes on common elements, and mark is fine for that.

The blog example is fine because, from the viewpoint of page authoring, you're
creating an original HTML page whether you invented every word or quoted Jeanne
d'Arc, so you could be inserting markup anyway. (Whether a quotation's emphasis
is reproduced verbatim is not an HTML issue.) As the blog owner, you would
essentially be adding markup to your own page. That's fine, and using mark to
help you do it is fine.

The problem is shown in the kitten example in the HTML5 draft, section 4.6.7.
It says, "Another example of the mark element is highlighting parts of a
document that are matching some search string. If someone looked at a document,
and the server knew that the user was searching for the word 'kitten', then the
server might return the document with one paragraph modified as follows:
[example includes <mark>kitten</mark> twice in running text]."

Search strings come from two places: inside and outside a website. Websites
that offer their own search functions produce internal search strings, and
applying that search string to the text in a document on that same website is
essentially doing as the website owner intended, the whole process being within
the same website. That's fine.

But external search strings present a problem. A search in Bing or another
search engine produces a search string that matches a string in a page. HTML5
proposes that the search string be used to highlight a matching string on a
page by adding a mark element. The HTML5 proposal applies to a search string
whether it is internal or external to the host server. Under HTML5, any server
may apply the mark tag, not just the website owner's hosting server.

External search strings are a problem because, if they're accepted as evidence
of a user's interest in kittens or anything else, then any external string can
justify a server adding the mark element to text within a site's page. That can
only be done by servers not under the control of the website owner, and HTML5
would give permission to do it. So Microsoft's Bing search engine could return
a list of links and then, through frames (as some major search engines do now
in some modes), present the actual pages with mark tags added. Granted, they
could add markup anyway, and they sometimes already do. But so far, to my
knowledge, they put an explanation at the top, putting even newbie users on
notice without requiring them to remember the meaning of double underlines or
other abstract symbols. And they do it by copying the page, reformatting it to
their style, and then marking it up with an explanation. For that, they don't
need HTML permission. HTML5 would extend permission to all server owners, and,
given Microsoft's past history with its browsers, someone big is likely to take
advantage. And not just someone big.

Intentions, I've been assured, wouldn't allow this. Our good intentions would
suffice except for one thing: Browser designers are not going to ask us or look
at the bug report, and they don't have to. The specification will be all
they'll need. Students of programming will almost never see intentions. They'll
see standards. If the plain words of the standards grant permission, the
behind-the-scenes discussion about what was really meant will be out of sight
and ignored.

So far, no one has justified third-party markup presented as if it were part of
the original, other than to say it's being done already, and I don't think they
meant that it's a good idea. Some who disagree on that point agree that users
should be able to tell when markup was added by someone other than the original
site owner. If a newbie can tell the difference in ownership, I'm willing to
accept third-party markup in the user's display. So those of us in this
discussion essentially agree that third-party markup presented as if it were in
the original should not be allowed. We've differed on whether the standard
allows it, not on whether it should, and that's solvable.

I think all that's needed is to change "server" to "website owner's server" and
permission for other servers to add markup would be gone.

Another solution would be to define "server" in a more global context so that
in this section it could only mean the website owner's server. I didn't see



Received on Sunday, 21 June 2009 19:59:44 UTC
