- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Tue, 4 Aug 2009 14:59:00 +0300
On Aug 3, 2009, at 05:45, Ian Hickson wrote:

> On Thu, 23 Jul 2009, Keryx Web wrote:
>> No suggested text, but a rewrite will be necessary if quotation
>> marks become a conformance criterion.
>
> Instead of preventing anyone from not using quote marks, I would
> instead recommend asking your validator vendor to offer you an option
> to require quote marks and warn you when you have forgotten them.

There's a usability cost and a QA cost to adding optional features to a
validator, which is why I try to resist requests to add more
configuration and optional features to Validator.nu.

I've gotten requests to add checks inspired by XHTML. These requests
generally aren't about true polyglot checking (since few people know
how long the actual polyglot checking corner case list is). Also, these
requests aren't about code style in general. (Well, actually, Anne
asked for indent style checking on Sam's blog comments and another
commenter thought Anne was making fun of the quote issue...)

Since the requests happen to be about the most prominent syntactic
features of XML as opposed to being across the board about code style,
I suspect part of the requests is about unease about letting go of
extra requirements taught as part of XHTML-as-text/html evangelism.
When adding optional warnings to Validator.nu, I'd like to tell apart
actual problems from unease of letting go of XHTML-as-text/html before
proceeding. (I expect "actual problems" to be with us always, but I
expect the unease to pass with a little time.)

The top 4 requests are:
 * Flagging unquoted attributes.
 * Flagging implied tags.
 * Flagging non-lowercase element and attribute names.
 * Flagging inconsistent use of /> on void elements.

I think the implied end tags are different from the rest, and I think
an option to flag implied tags would be a useful feature to have. I
want to implement it, but I have some higher-priority Gecko items on my
plate first.

Implied tags are different from the rest, because tag inference doesn't
necessarily work like authors expect, so automatically generating the
tags might not do the right thing. OTOH, the other cases can be safely
fixed automatically. (Except some quoting issues; more on that later.)

In fact, indent style is also something that could be made consistent
automatically by an HTML-aware text editor. When issues are code style
issues in nature and don't need human intervention to change to a
particular style, I think it's more useful to have an editor that
simply reformats code than to have a validator that flags failure to
comply with code style without performing the reformatting.

Consider Eclipse JDT: If you have bad indents, you don't get warning or
error markers in the margin. Instead, you can ask Eclipse to reformat
code according to a wide variety of settings. This creates a mild
issue: If different people collaborate and have different code
formatter settings, having another person's editor reformat code
creates some source control issues. However, you don't get error
messages, so you are still avoiding the problem that I raised earlier
on public-html and that Maciej mentioned here about tool interop (so
that you can swap tools without getting a huge bunch of errors).

Also note that we can't really eliminate this source control
re-indenting issue on the spec level. There's no way we could get
everyone to agree on One True HTML indent style. And as long as there
isn't One True indent style consistently applied everywhere, I think it
doesn't matter much if other syntax is used consistently.

Now, people are going to say that it's good to use /> consistently,
because it helps you see which elements are void elements. It doesn't
work that way, though. Because /> has no effect on HTML elements, you
still need to *know* which elements are void elements, and pretending
that /> means something poisons the mental model people have and is
actually bad for teaching.
(People write <div class="foo"/> or <script src="..."/> having heard
that /> closes the element.) Unfortunately, we need to keep />
conforming to make it easy to upgrade XHTML-as-text/html-emitting
systems to HTML5.

I think it would be rather arbitrary to add a feature for checking the
*consistent* use of <br> vs. <br/>. Why not <br/> vs. <br />? foo='bar'
vs. foo="bar"? Or indent style?

As for lower-case names, I don't think people *really* want lower-case
names. I think, as a matter of code style, they want *canonical-case*
names, which aren't all lower-case for SVG-in-text/html and
MathML-in-text/html (definitionURL). I think adding checking for this
would have a disproportionate ill impact on the parser code base
compared to the benefit. In a reasonable general-purpose HTML parser
implementation, the case information is lost before it is decided
whether a tag belongs to an HTML, SVG or MathML element, and
maintaining a special-purpose parser for validation wouldn't be good.
On the benefit side, I don't think accidentally holding down the shift
key when typing a name is a notable practical authoring problem. Due to
this disparity between benefit and code complexity badness, I'm not
planning on implementing this check.

Back to the unquoted attributes request. I think it's the hardest one
of the four to decide whether to implement or not. I think it is easy
to decide unquoted attributes shouldn't be errors, and it's easy to
decide that if the feature were available, it should be optional.
(Making it mandatory would annoy people updating existing sites using
quote omission and people who know just fine that they can omit quotes
on stuff like type=radio and don't want to type extra.)

There's one case that clearly needs an unconditional warning, though:
<foo bar=baz/> when the format of the value of bar doesn't exclude a
trailing slash. In this case, the /> feature interacts badly with the
quote omission feature.

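To make the <foo bar=baz/> case concrete, here is a minimal Python
sketch of a check for it. It is only an illustration, not how
Validator.nu does or would do it: the function name and both regular
expressions are made up for this example, it does not handle comments,
script content or other corner cases, and it ignores the datatype of
the attribute, so it cannot tell whether a trailing slash is actually
allowed in the value.

```python
import re

# Start tag: "<", a tag name, then everything up to the closing ">".
# Quoted values are consumed as units because they may contain ">".
START_TAG = re.compile(r"""<([A-Za-z][^\s/>]*)((?:"[^"]*"|'[^']*'|[^>"'])*)>""")

# Tag body ending in name=value/ where an unquoted value touches the slash.
UNQUOTED_THEN_SLASH = re.compile(r"""([^\s"'<>/=]+)\s*=\s*([^\s"'=<>`]+)/$""")


def slash_swallowed_by_value(markup):
    """Return (tag, attribute, value_as_parsed) for each risky start tag.

    Hypothetical helper for illustration only.
    """
    findings = []
    for tag in START_TAG.finditer(markup):
        hit = UNQUOTED_THEN_SLASH.search(tag.group(2))
        if hit:
            # An HTML tokenizer keeps the slash as the last character of
            # the unquoted value instead of treating "/>" as a unit.
            findings.append((tag.group(1), hit.group(1), hit.group(2) + "/"))
    return findings


if __name__ == "__main__":
    sample = '<foo bar=baz/> <br/> <input type=radio> <img src="a.png"/>'
    for finding in slash_swallowed_by_value(sample):
        print(finding)
```

Run on the sample string, this reports only the first tag, as
('foo', 'bar', 'baz/'). The quoted src="a.png"/> and the plain <br/>
are left alone, and so is an unquoted value followed by a space before
the slash.
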
I think a good way to proceed here is to write more complex code for
detecting <foo bar=baz/> first and to see whether that, together with
more precise datatyping than in old DTD-based validators, is enough to
catch actual problems without introducing more UI options. If after
that change users of Validator.nu still face uncaught problems due to
quote omission (e.g. class or alt eating up whatever follows and
somehow managing not to generate any subsequent error), I think
exploring a feature for optionally warning about unquoted attributes
would make sense.

> This would address your use case, as far as I can tell, without
> preventing anyone who _likes_ omitting quote marks from doing so.
[...]
> Requiring quotes would also make a large number of pages invalid for
> more or less stylistic reasons, which would make it harder for people
> to transition to HTML5, and may annoy them ("Why do I have to add
> these quotes, they don't really add anything -- bah! I hate html5").

I think that quote omission should stay conforming for these reasons.

> (Tools, of course, can just quote everything. There's no reason other
> than user preference for the authoring tool to not quote values, as
> far as I can tell.)

I encourage anyone who is writing an HTML serializer to use double
quotes for attributes unconditionally, unless there's a specific need
to optimize file size to the point of counting bytes. (Single quotes
are worse, because developers are tempted to escape ' as &apos; in
attribute values, but &apos; has compat issues with IE versions still
out there.)

> On Sat, 25 Jul 2009, Keryx Web wrote:
>>
>> Consider this PHP template:
>>
>> <input type=text value=$login name=login>
>>
>> Value is the suggested text, if no user data is available it says
>> "login". Otherwise it's the user's login name (no spaces allowed).
>> All is well.
>>
>> One day a developer decides that "login name" is a better value, and
>> hard codes it into the PHP business logic, producing this HTML:
>>
>> <input type=text value=login name name=login>
>>
>> All of a sudden you *effectively* have produced this:
>>
>> <input type=text value=login name="">
>>
>> And it stops working.
>
> I agree that this is an issue, and I would strongly recommend that
> people who write templates not make assumptions about the values they
> are inserting.

If you aren't manually typing both the attribute name and the value in
a text editor, you should always use double quotes for generated values
to avoid trouble.

--
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
Received on Tuesday, 4 August 2009 04:59:00 UTC