- From: <bugzilla@wiggum.w3.org>
- Date: Sun, 05 Apr 2009 02:16:34 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=6774

           Summary: <mark> element: restrict insertion by other servers
           Product: HTML WG
           Version: unspecified
          Platform: All
               URL: http://www.w3.org/TR/html5/single-page/
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: HTML 5: The Markup Language
        AssignedTo: mike@w3.org
        ReportedBy: Nick_Levinson@yahoo.com
         QAContact: public-html-bugzilla@w3.org
                CC: public-html@w3.org

I understand the mark element is intended to be under a website owner's control, and that I misunderstood the draft standard as proposing that a server not under the website owner's control could insert the element in response to a user's apparent interests. That's good. The HTML5 standard should generally conserve the website owner's rights. Otherwise, the mark element could allow security breaches.

However, the HTML5 draft standard's language seems to be somewhat ambiguous. Section 4.6.7 says, "When used in the main prose of a document, it [the mark element] indicates a part of the document that has been highlighted due to its likely relevance to the user's current activity." That suggests knowledge of the user's current activity unknown to the page author. Either the author has to anticipate a number of uses and insert mark elements for all of them or someone else is to insert a mark element.

The examples in the draft are two that are perfectly safe and one that's dangerous. The safe ones are presentational. If quoting a source and wishing to add a quoter's indication of relevance other than by the traditional means (found in books, legal briefs, etc.) of adding italicization, the mark element is fine. If, when offering one's own prose, one wants a method to supplement the strong and em elements, mark is fine. Both are safe because they're in the control of the page author.
Even when quoting, the page author or the website owner is at least vaguely known, and the mark is reasonably attributable to them and reasonably likely to be attributed to them even by nonexperts.

The main problem is with third-party insertion by an unidentified party and use by a user with only basic computer skills who wrongly but naively assumes the website owner did the mark element's resulting markup. That user won't even know about the mark element or how to access source code, and may not be allowed to access source code because browser commands are dimmed by an institution.

Example: Someone runs a small business and thinks the managers are getting hung up on legalities. So they set their browser or server to apply the mark element to copyright notices, terms of use, and other bothersome stuff, and they style the mark element to be in a one-point font in white text on a white background. If anyone asks, that's just the way the website is. If the staff call the website company and debate whether the website does or does not say "x", the staff will be wrong but never know it, and if the staff are lawyers or IT managers, for example, they may commit major violations of contract or other law and never know why.

Even highly skilled computer users review source code on no more than one percent of all the pages they rely on, and that won't change once third-party <mark> insertions begin to change the look of websites.

The search engine problem is a good one. I often run a search, get a result, see the snippet, go to the page, and wonder where on the page my search terms are. A browser's Find function can be inefficient. However, I would prefer if browsers would offer a feature whereby search terms can be copied from a search engine URL and then the page auto-scrolled to their location. This could be a UA-specific implementation that could be based on agreements with search engine firms. It does not need a W3C or HTML standard.
For example, the Opera 9.52 browser has a search facility that allows me to execute a search using Google, Ask, Yahoo, Amazon, Wikipedia, eBay, Yahoo Shopping, or BitTorrent, and presumably terms can be retained long enough to support a user-dismissable Opera frame pointing to their occurrence or to feed Opera's Find function.

I assume <mark> can be styled with anything available in CSS for other text elements, such as <a> and <span>. I haven't even considered the extent to which stylistic creativity can change the meaning of all sorts of marked content.

This only considers the host's and user's servers. It doesn't consider servers in between. The standard should not give permission to third-party server owners to insert and style <mark> as they wish. The draft, as it stands now, would.

An earlier commenter elsewhere (Lachlan Hunt), in response to my concern that "[i]f the <mark> element is intended to be introduceable by servers other than the website owner's, then that should be preventable", said "No, this is a misunderstanding of the mark element's purpose. If a 3rd party server can inject markup into another site's content, then that's a major security problem, but it is independent from HTML itself. It is also not how the mark element is intended to be used."

I replied: "HTML5's role in a security breach would come if it grants permission to system designers, as I saw in this statement: 'Another example of the mark element is highlighting parts of a document that are matching some search string. If someone looked at a document, and the server knew that the user was searching for the word "kitten", then the server might return the document with one paragraph modified as follows: . . . . <mark>kitten</mark> . . . .' Section 4.6.7. That looks like permission for the server to interject markup into a byte stream.
Given that many people in large organizations view outside websites in a way that involves at least two servers per visit, one hosting and others not, the section seems to be permission for any nonhost server to sell advertising or comment on content as if it's the author's commentary. Thus, the security breach would be furthered by HTML as permission. However, as I didn't find any reference in the document to any server that wasn't acting on a served document somehow as authorized, e.g., by checking a certificate, if you're right that the intent was not as I feared, then we should propose rewording the HTML standard before finalization so only the site owner's server might mark the string if nonowners are to be conformant. . . . I'm not an attorney and laws vary by nation and circumstance, but if you believe there's any error in the above please let us know."

Could you please tighten the language to leave the mark element's use in the hands of the page author? If that can't be done, can restrictions to prevent security breaches be written in? The problem with that, of course, is the malicious attackers.

At the least, allow page authors to block insertion of a mark element not already in the source code. For example, a meta element in the head element might be a preventive for a page.

Example: <meta addmark="false">

The True value would be available but trivial, as omitting the meta element would also imply True. Yes/No would be clearer but inconsistent with practice with other meta elements.

A narrower problem is whether a website that supports internal searches might want to allow its own host to insert the mark element in response to a local user's search executed locally even though addmark is turned off for everywhere else. This potentially applies to any CGI script and perhaps other locally-applied technology. To solve that, a second meta attribute could serve.
Example: <meta addmark="false" addmarklocal="true">

The website designer could decide who qualifies as local and how to implement that decision technically, and could use the attribute to prevent anyone deemed nonlocal from marking content on that page. The False value would be available but trivial, as omitting the addmarklocal attribute would also imply its falsity when addmark is False. The order of the attributes should not matter. Placing the attributes in one or two separate meta tags should not matter.

The earlier discussion is in Bug 6606.

This responds to <http://www.w3.org/TR/html5/single-page/>, accessed 4-4-09, Working Draft, 12 February 2009 (presumably <http://www.w3.org/TR/2009/WD-html5-20090212/>), section 4.6.7.

I'll await possible comment here before considering whether to propose the meta attributes in the appropriate Wiki.

Thank you.

-- 
Nick

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
Received on Sunday, 5 April 2009 02:16:45 UTC