- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 2 Apr 2007 00:22:27 +0300
- To: Jirka Kosek <jirka@kosek.cz>
- Cc: public-html@w3.org
On Mar 31, 2007, at 22:44, Jirka Kosek wrote:

> Henri Sivonen wrote:
> I was thinking about optional version information, not mandatory.

OK.

> I think that you are anticipating that in the future there will be only
> one version of HTML (HTML5) which will be then extended and became
> HTML6, etc.

This WG can have an effect on the future on that point. The future doesn't just happen. The versions you cite below did not just emerge. They were deliberately specced by the XHTML 2.0 WG (and in the case of MP, a group roughly subscribing to similar philosophy, apparently).

> But even now there are several versions of HTML in use:
> Strict, Transitional, Frameset, 1.1, Print, Basic, MP. Future may prove
> that this was big mistake and that there should be only one version, who
> knows. But we (at least me) do not have time machine.

MP and Basic are clearly based on a premise that doesn't work for the Web. (If HTML can be subset because someone doesn't bother to ship a Web browser, you've got a walled garden – not the Web.) Print is not for the Web but for talking to a printer from a phone.

1.1 is roughly a superset of Strict, so as far as 1.1 vs. Strict goes, conformance checkers could well default to 1.1. (Mine doesn't, yet, because upstream wasn't ready and I was too busy with HTML5 to edit the schema from upstream. ;-)

There's no reason why a single setting could not accept the features of Transitional and Frameset if versioning was absent. IIRC, James Clark did this. The main reason against is that XHTML 1.0 defines Transitional and Frameset as separate, but that's no argument for what this WG should do going forward.

Strict vs. Transitional is the kind of fluffiness (saying one thing while really giving in to another) that HTML5 is supposed to avoid. (Yeah, there is the WYSIWYG editor fluffiness in HTML5, but I disagree with Hixie on that point.
:-)

> In my opinion conformance checker should offer at least following
> validation options:
>
> - check document against rules for version which is specified in
> document (or use the latest version if version is unspecified)

What's the use case? Transitional periods, e.g. the transition from the time when HTML5 is the obvious version to use to the time when HTML6 is the obvious version to use? Still, that would make it an HTML6 problem, not an HTML5 problem.

> - check document against the latest version
>
> - check document against any arbitrary chosen version

I agree.

>> 5) A CMS uses an implementation-specific subset (e.g. no scripting and
>> no forms permitted). You want to configure a general-purpose authoring
>> tool to limit auto-completion to this subset.
>>
>> This use case actually has merit. However, it doesn't have merit as a
>> reason for requiring all authors to include a version='5' incantation.
>
> Version information could be just optional.

I'm OK with optional authoring tool configuration hooks. (The word "versioning" carries a lot of baggage.)

>> Discussing this issue pretty much reduces to the discussion about the
>> bogosity of xsi:schemaLocation and about the merits of a PI for
>> declaring the location of a RELAX NG schema in a document instance.
>
> I must strongly disagree with this point. Both schema association PI and
> xsi:schemaLocation points to concrete schema written in a particular
> schema language. This is very different from specifying just version of
> HTML used.

The thing they do have in common, though, is that the document instance declares which rules it wants to have for itself instead of the document instance and the rules being two entirely independent inputs to the checking process.

> Compare: ...
> The first example specifies concrete schema written in W3C XML schema,
> which is neither very flexible (you might want to use completely
> different schema language), nor very clever (for various performance and
> security reasons you shouldn't fetch schemas from location provided in
> document). However later example gives you much more flexibility because
> it offers one additional level of indirection. It is upon you or your
> application to pick-up correct schema (or processing component, or
> whatever) based on content of version attribute.

Agreed.

>> I think XHTML5 should neither require nor forbid PIs for configuring
>> authoring tools. This is between the author and his/her editor and
>> leaving the artifact in a file that gets served on the Web is mostly
>> harmless.
>
> Editing is only one issue. But many companies use subset of XHTML as
> basic format for creating universal text content. They send this subset
> between various components, for editing,

I think we agree on the editing use case.

> proofreading,

I don't understand why version information would be needed here. Isn't this either a special case of editing or a process that only adds <ins>/<del> in a way specced in HTML5 proper?

> approving,

How much does approval change the document? Do you envision an approval process supporting different subsets that require the approval to be encoded in different ways?

> down conversion,

I don't understand why this use case requires versioning. A converter supports some part of HTML as the input. Labeling the subset used as input doesn't really help. Do you mean labeling the output from a conversion with an assertion about what subset the converter purported to produce?

> publishing, ...

I don't understand what you mean exactly. I assume you mean something other than publishing to browsers.
However, as a general comment, a version identifier is relatively useless on input as far as consuming the input goes (for any purpose other than checking if the input adheres to its self-claimed version). It is useful for constraining *output*. But since you put the information on *input*, it makes sense for constraining tools capable of *round-tripping*, such as editors.

> They need to carry information about version
> of subset used and without some HTML provision for version labeling they
> have to extend HTML with their own element/attribute which of course
> causes problems in many tools that are recognizing only HTML markup.

This is the main reason why I suggested having an optional attribute with user-defined contents instead of suggesting that those who need the attribute extend HTML with such an attribute themselves.

>> I am less sympathetic to an attribute on the root element for the same
>> purpose, but I'd be willing to concede to an optional attribute with
>> user-defined contents for the purpose of use as a hook in private
>> authoring workflows. E.g. profile='acme-cms-scriptless-and-formless'.
>
> Such attribute should be allowed not only on root (html) but on any
> element which can act as a root of reasonable HTML fragment.

Slippery slope already. :-)

> Ideally
> this should include all HTML elements, but having it at least on block
> elements is a must for CMS that compose final documents from small
> pieces.

If a CMS does document assembly like that, surely it can filter out the version identifiers from fragments while it is at it.

>> With the schema project for (X)HTML5, fantasai and I have built in some
>> options in the schema for dealing with HTML5 vs. XHTML5 differences and
>> for catering to subsetting in ways that we foresee as reasonable.
>
> Could you please put pointer to this schema here? I would like to see
> how extensibility is handled.

http://syntax.whattf.org/
Please ignore the sentence about XSD, DTD and SGML.
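
On the document-assembly point above, here is a minimal sketch of what "filter out the version identifiers while it is at it" could look like. It assumes well-formed XHTML fragments without namespaces, and it assumes the hook is an attribute named profile, as in the example earlier in this message. The function name assemble is made up for illustration and is not taken from any actual CMS.

```python
import xml.etree.ElementTree as ET

# Hypothetical hook attribute name, borrowed from the profile='...' example.
HOOK_ATTR = "profile"


def assemble(fragments, hook_attr=HOOK_ATTR):
    """Join fragments under one <body>, dropping the hook attribute everywhere."""
    body = ET.Element("body")
    for source in fragments:
        fragment = ET.fromstring(source)
        # iter() visits the fragment root and all of its descendants.
        for element in fragment.iter():
            element.attrib.pop(hook_attr, None)
        body.append(fragment)
    return ET.tostring(body, encoding="unicode")


if __name__ == "__main__":
    pieces = [
        '<div profile="acme-cms-scriptless-and-formless"><p>First piece.</p></div>',
        '<div profile="acme-cms-scriptless-and-formless"><p>Second piece.</p></div>',
    ]
    print(assemble(pieces))
```

The hook stays available to the editor while each fragment is being authored, and the assembled document that gets served carries no trace of it.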
I expect to refactor the rest of the exclusions related to <header>, <footer> and sectioning elements from RELAX NG to Schematron soonish. If fantasai is OK with it, I'd like to move the exclusions related to interactive elements to Schematron as well.

Extensibility (supersetting) is not handled in any particular way. Subsetting is. However, if your extensions just add stuff to the common content models, you are good to go for supersetting as well.

>> Since subsetters are going to do their own thing anyway, naming the
>> subsets should be user-defined and it would be pointless to try to come
>> up with a closed list of de jure subset names.
>
> But there could/should be some basic naming policy. Remember that people
> need some guidance (at least majority of people ;-).

One option would be having a wiki-like registry with a low barrier of entry for documenting existing subsets so that others can use the same name for talking about the same subset. (By "low barrier" I mean *way* lower than registering anything with the IANA.) HTML5 has a trial balloon in the spec about solving rel value extensibility like this.

Hopefully subsets won't be identified by HTTP URIs. Otherwise after a while someone suggests that the URI should dereference into a schema.

>> Indeed. Online conformance checkers should probably default to the
>> broadest feature set they support. For example, allowing embedded SVG
>> and MathML by default. (The reason why mine doesn't, yet, is that I
>> haven't had time to review the SVG and MathML stuff properly, yet.)
>
> With NVDL you don't have to review and study schemas for new languages. ;-D
> You just say that this namespace is handled by this separate schema, no
> need to integrate schemas using their extensibility hooks.

NVDL wouldn't remove the need to do quality assurance on schemata obtained from elsewhere.
See what happens when I just deploy stuff from upstream:
http://golem.ph.utexas.edu/~distler/blog/archives/001206.html#c008677
;-)

>>> This for example means that you can not embeded XHTML page into SOAP
>>> message and identify version of XHTML used.
>>
>> Considering what I said above, versioning XHTML inside SOAP messages
>> should not be necessary. Interchange with loosely affiliated or
>> unaffiliated parties is similar to the browser use case.
>
> In practice many companies use web-services in pretty tightly coupled
> setups with very strange requirements. You say that specifying HTML
> version should not be necessary. I say I have seen real requirements for
> using controlled subset of XHTML inside payload.

If the setup is tightly coupled, why isn't the subsetting part of the tight coupling without an explicit flag?

>>> Example 6. More robust way of labeling document as XHTML Print
>>
>> FWIW, I think XHTML Print has remarkably little relevance to Web content
>> or even authoring in editors.
>
> Why do you think so?

According to the abstract of the spec, XHTML Print is designed to be a language between a mobile phone and a printer. The way I understood this was that a program on a phone generates an XHTML Print document and sends it to a nearby printer without the document ever being served on the Web or being edited with an authoring tool.

(I admit, though, that I don't really understand why XHTML Print exists. I find the premise of the spec bizarre. I don't understand why a mobile device couldn't spool as PostScript or PDF and why a printer vendor would ever want to embed an XHTML+CSS engine in a printer instead of making the printer consume a format that encodes final-form geometry either as vector graphics or as straight raster data.)
> In my personal opinion if you want to ensure that HTML will not fork,
> you have to provide complete and flexible enough language that will
> allow subsetting for more restricted environments like mail, low-cost
> printers, Joe's 10 tag super-simple HTML, ...
>
> Without providing such facility respective subsets will fork and this
> will lead to great confuse for developers, and to incompatibilities for
> content producers.

The way I see it is that the primary facility is writing a delta spec in English. If there's software that has to support multiple subsets and distinguish them, I'd be OK with an optional configuration hook as a PI before root or as an attribute on the root element.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Sunday, 1 April 2007 21:22:49 UTC