Re: XHTML Invalidity / WML2 / New XHTML 1.1 Attribute from Dan Connolly on 2000-08-13 (www-html@w3.org from August 2000)

From: Dan Connolly <connolly@w3.org>
Date: Sat, 12 Aug 2000 22:11:05 -0500
To: Sean Palmer <sean_b_palmer@yahoo.com>, robin@isogen.com
CC: www-html@w3.org
Message-ID: <399611C9.4865C682@w3.org>
[Robin, pls consider using namespaces/schemas
for your "experimental markup". See below...]

Sean Palmer wrote:
> 
> I'll start with the end of Mr. Connolly's message:-
> > I don't think we need anything new;
> > XML, namespaces, RDF, and
> > XML Schemas should do nicely.
> That's one of the most potent pieces of writing
> anywhere on the Web in my opinion.
> It raises more questions than it answers, though.
> 
> 1) Isn't this leading to over complication of the
> Internet? Surely the move towards valid XML is hard
> enough for most amateurs, but this might put a lot of
> people off. Soon it may become very hard to write
> decent valid Web documents. Is all of this really
> needed?

Yes and no; yes, I think all of this XML/RDF/schema
complexity is needed to keep the whole
marketplace of technologies stable -- to facilitate
innovation without putting stable technologies at
risk.

But no, I don't think it's an undue burden
on most web users and authors; I don't
think "the beginner's guide to HTML"
needs to get any bigger. Here's the HTML 4.0 hello
world document:

[[[
Here's an example of a simple HTML document:

         <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
            "http://www.w3.org/TR/html4/strict.dtd">
         <HTML>
            <HEAD>
               <TITLE>My first HTML document</TITLE>
            </HEAD>
            <BODY>
               <P>Hello world!
            </BODY>
         </HTML>
]]]

And here's the first example from the XHTML 1.0 spec:

[[[
Here is an example of a minimal XHTML document.

           <?xml version="1.0" encoding="UTF-8"?>
           <!DOCTYPE html 
                PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
               "DTD/xhtml1-strict.dtd">
           <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
             <head>
               <title>Virtual Library</title>
             </head>
             <body>
               <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
             </body>
           </html>
]]]

I think the differences are entirely manageable; that is:
if you're new to HTML, either of them is pretty much
equally complex. If you're familiar with HTML 4,
changing <P> to <p> and adding </p> is not going
to blow your mind.
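
A minimal sketch of what that small change buys, assuming Python's
standard xml.etree.ElementTree as a stand-in for "any XML parser".
The two strings are cut-down versions of the examples above (the
DOCTYPE lines are left out to keep it short), and is_well_formed is
a hypothetical helper name, not anything from the specs:

```python
import xml.etree.ElementTree as ET

# HTML 4 style: upper-case tags and an unclosed <P>.
HTML4_STYLE = """<HTML>
  <HEAD><TITLE>My first HTML document</TITLE></HEAD>
  <BODY>
    <P>Hello world!
  </BODY>
</HTML>"""

# XHTML style: lower-case tags, explicit </p>, and the namespace attribute.
XHTML_STYLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>My first HTML document</title></head>
  <body>
    <p>Hello world!</p>
  </body>
</html>"""


def is_well_formed(text):
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(HTML4_STYLE))   # the unclosed <P> trips the XML parser
print(is_well_formed(XHTML_STYLE))   # the XHTML version parses cleanly
```

This only checks well-formedness, not validity against a DTD or schema.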

If/when folks get comfortable with schemas and
namespaces, we can drop the DTD gobbledygook at the top:

           <html xmlns="http://www.w3.org/1999/xhtml">
             <head>
               <title>Virtual Library</title>
             </head>
             <body>
               <p>Moved to <a href="http://vlib.org/">vlib.org</a>.</p>
             </body>
           </html>

That's pretty darn close to what people are currently
taught as an intro to HTML:

    <html>
    <head>
    <TITLE>A Simple HTML Example</TITLE>
    </head>
    <body>
    <H1>HTML is Easy To Learn</H1>
    <P>Welcome to the world of HTML.
    This is the first paragraph. While short it is  
    still a paragraph!</P>
    <P>And this is the second paragraph.</P>
    </body>
    </html>

	-- http://www.ncsa.uiuc.edu//General/Internet/WWW/HTMLPrimerP1.html

I'm asking for one lousy attribute,
	xmlns="http://www.w3.org/1999/xhtml"
at the top (plus consistent lower-case spelling of tag names),
and some faith that XML Schemas are going to become
mature for validation purposes.

That's for the stable, standardized part of the language.

Now let's look at extensions...

Consider the beginner, somewhat like yourself, who
asks, "can I have just one extra attribute over here?"
The HTML 4.0 answer was: no, not unless you're
willing to join The HTML Gods and fly to all the WG
meetings, or convince somebody else to do it for you.
Or unless you can implement a new browser and somehow
get it deployed to a non-trivial user base.

"Gee... I just wanted to stick a little extra data
in my recipe files in HTML is all... I guess I'll
use comments or some such hack" is what that
guy will probably say. Or, more likely, he'd stick
the extra attribute in there anyway, validator.w3.org
be damned.

For example, Robin Cover has been doing this for years:

	An experimental approach is being used in
	markup - exploiting the behavior of HTML
	browsers whereby unrecognized tags are simply ignored.
	If the non-HTML tags are causing problems in your
	browser, please let me know. The behavior of
	user-agents in this respect has been authorized
	in several HTML specifications. 

	-- http://www.oasis-open.org/cover/caveats.html

Note that he doesn't claim the documents conform -- they don't.
He just claims that implementations are allowed to support
them. Of course, implementations are allowed to do anything
with non-conforming documents.

OK... so the popular HTML implementations deal with these
extra attributes just fine. But anybody who depends
on them is taking a bit of a risk... W3C doesn't
actually Recommend that you write such documents.
Not yet...

But we've got a couple of drafts of ways to do it:
the HTML modularization spec shows you how to add
your own module and mix it in with the standard
modules. I don't care for that approach, because
it's limited in all the ways that linking two C
modules is limited: one big unmanaged centralized
namespace, no "first class" modules recognized
by the compiler (the validator). But it has the
virtue that existing validating XML processors
can be used for validation.

The other way is XML Schemas. I demonstrated how
this works: just write a little schema, stick
it in the web, point to it from your document,
and off you go.
http://www.w3.org/XML/2000/04schema-hacking/comment-test.html

Robin, how about XHTML-izing your pages, writing
a little schema for your extensions, declaring
your extensions with namespaces, and pointing
to the schema from your namespace declaration?

The software for validating the results is
less mature, but I think it has a brighter future.

So I think XHTML, XML, namespaces, and schemas
are a good mix... they make the easy things
easy and the hard things possible.

Yes, there's a lot of complexity under the covers.
But the casual user need not understand it
any more than they need to understand congestion
control in TCP/IP to surf the web.




> 2) How do you suggest that we employ RDF into XHTML
> 1.0 and still have it validate as a document?

Er.. you wouldn't be asking that if you hadn't
seen my suggestion for how to do exactly that
in my message of
Sat, 12 Aug 2000 11:36:25 -0500

As to validating RDF, I'm working out the details
of an XML Schema for RDF:

http://www.w3.org/2000/07/rdf.xsd

There's also the DTD approach, though it's more awkward:
	Valid RDF: Using Namespaces with DTDs 
	http://www.w3.org/XML/9710rdf-dtd/

> What is
> the point of having RDF/Schemas to add one attribute?

I didn't need RDF for the one attribute. Yes,
I needed a schema
to document the intended usage and meaning of that
one attribute, and to facilitate validation of
that attribute when mixed in with other documents.

I'm suggesting that folks who *design* new attributes
write schemas, not folks that just use the
new attributes.

Is it really too much to ask? I mean, if you take
the documentation/copyright fluff out of the
schema I built for comments, it's 4 lines:

<schema xmlns='http://www.w3.org/1999/XMLSchema'
  targetNamespace='http://www.w3.org/2000/08/comment#'>
<attribute name="comment"/>
</schema>

For 4 lousy lines of code served up in a file
on the web, you get your new attribute, complete
with (a) assurance that nobody else's comment
attribute will be mistaken for yours, and
(b) (emerging) validation support.
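
Point (a) can be sketched in a few lines of Python, assuming the
standard xml.etree.ElementTree parser. The util namespace is the
targetNamespace from the schema above; http://example.org/other# is a
made-up second vocabulary, included only to show that two attributes
both named "comment" stay distinct:

```python
import xml.etree.ElementTree as ET

# targetNamespace of the comment schema shown above.
UTIL_NS = "http://www.w3.org/2000/08/comment#"
# Hypothetical second vocabulary, used only for contrast.
OTHER_NS = "http://example.org/other#"

DOC = f"""<p xmlns="http://www.w3.org/1999/xhtml"
   xmlns:util="{UTIL_NS}"
   xmlns:other="{OTHER_NS}"
   util:comment="check this paragraph"
   other:comment="unrelated note">Hello</p>"""

p = ET.fromstring(DOC)

# ElementTree names attributes as "{namespace-uri}localname", so the
# two comment attributes never collide.
mine = p.get(f"{{{UTIL_NS}}}comment")
theirs = p.get(f"{{{OTHER_NS}}}comment")
plain = p.get("comment")  # no un-namespaced comment attribute exists

print(mine)
print(theirs)
print(plain)
```

The prefixes (util, other) are arbitrary; only the namespace URIs
identify whose attribute is whose.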

I used RDF because it's well suited to
the title/author/description sort of semantics
that (I forget who) was talking about.

But RDF is really a whole other story,
and I'm not going to go into it in depth just now.



> 3) Choose for me: XHTML validity, or neat programming!
> I know that is a bit of a hyperbole, but it kind of
> gets my point across.

Not at all. I think that schema-valid XHTML
is quite neat and clean.

> However, your statement is memorable because of what
> it means: we should utilise current languages before
> moving on to anything new. The only problem I see is
> that nothing gels together properly.

There is enough cruft that the connections get
obscured sometimes, but I think it all fits
together quite nicely. I hope you'll agree,
eventually.

> You have to in
> effect, choose which language you want to use.
> Otherwise, it just won't validate.

I don't think I understand what you're saying.

If you're saying that some languages can't
be mixed with others, that'll be true in
the general case. But for the languages W3C
is developing, and a large part of the rest
of the marketplace of markup languages,
I don't think it is.

If you look at the stuff under
http://www.w3.org/XML/2000/04schema-hacking/
you'll see that I'm able to validate
all sorts of combinations: HTML with SMIL,
HTML with SVG, HTML with a new comment thingy,
HTML with MathML, etc. And all the other combinations
are just a matter of time to work out the details
of the example, not new technology.

> I'm very glad that you produced the comment schema for
> us, and I'm glad it has a permanent W3C URI. I will be
> using it in a lot of my future documents, because I
> author quite a lot, and, for reasons discussed, it is
> worth me including a comment attribute (even if it is in
> the form of util:comment).
> > > or should I copy it
> > > locally?
> > You may... lemme make that explicit...
> Sorry, that's my odd way of asking permission! I
> didn't mean to be impertinent.
> 
> Maybe in time it will become accepted, and even
> included in most XHTML documents (pff...that's a bit
> much to hope for). Spread the word everybody!
> 
> Why does the validator not recognize line-breaks?

validator.w3.org is based on DTD-style validation.
It has sort of experimental support for well-formed
XHTML documents with namespaces. It doesn't grok
XML Schemas yet. To check for XML schema validity,
you need to use XSV, which is in alpha state:
	http://www.w3.org/2000/06/webdata/xsv

> I'll
> post that on the validator list...

Please do.

> > You'll have to do more work
> > if you want to use namespaces
> > with DTDs. The XHTML modularization
> > spec supposedly tells you how to
> > do it, but it's pretty tedious. I
> > recommend you don't bother
> > with DTDs, if you're interested
> > in mixing vocabularies.
> I never thought I'd see a W3C member advise somebody
> not to use DTD's. Even in this special case!

My experience leads me to believe
that parts of XML are solid architectural infrastructure
for the long term: tags and attributes, and namespaces.
But other parts of it are there to manage the
transition from the existing software base: DTDs,
entities, processing instructions, and I don't
recommend investing in them unless you are
constrained by existing software somehow.

Note that on the same day we Recommended XML,
we released a Note that paints the way
forward for an extensible, self-describing
web of languages:

	http://www.w3.org/TR/1998/REC-xml-19980210

	http://www.w3.org/TR/1998/NOTE-webarch-extlang-19980210

> I did do a little more work...
> http://xhtml.waptechinfo.com/xhtmlct.html
> and http://xhtml.waptechinfo.com/xhtmlct2.html
> But neither document validates (and I don't expect
> them to.)
> I really would like to make a document work with a DTD
> and a Schema. I looked at the modularization document
> again (I have done many times before), and it just
> increased my belief that the technical specialists are
> losing their "common touch". (present company
> excepted).

I don't see any excuse for myself or any of the
present company ;-) Many of our specs have
a lot of gobbledygook in them. I'm struggling
to understand the XHTML modularization spec,
and some of the folks in the HTML WG are
struggling to understand the XML Schema spec.

But keep in mind that this stuff is just
starting to mature... major design decisions were
being made just a few months ago, and
it takes time to develop good end-user
documentation and tutorial materials.

Though the XML Schema spec is pretty complex,
I think the good news is that the schemas
themselves are not bad. Witness the 4 liner
above. Heck, it's short enough that
I'll repeat it:

<schema xmlns='http://www.w3.org/1999/XMLSchema'
  targetNamespace='http://www.w3.org/2000/08/comment#'>
<attribute name="comment"/>
</schema>

I hope somebody will step in and show us
the corresponding XHTML DTD module, but
I suspect it will expose quite a bit
of gobbledygook in the module itself.

> I think somebody should write a layman's
> guide to the modularisation of XHTML, just what it
> involves, how to do it, and, most importantly, why
> bother.

Yup... that's an important task.

> Maybe I'll have to do it myself,

I encourage you to give it a try!

> but I'd
> rather see it issued as a note at W3C.

It's entirely possible that if you write
up such a thing, W3C will publish it as
a note.

> XHTML 1.1 is
> going to be absolutely incredible, but everybody is
> going to have a lot to learn to get the most out of
> it.
> Soon, we are going to have documents incorporating:-
> XHTML 1.1, CSS, XML, XSLT, Schemas, RDF, MathML (etc.),
> and who knows what? We need somebody to guide the way.

Yes, but keep in mind that W3C is not here to put
all the trainers, book writers, conference organizers
and such out of business. If other folks can
do a good job of writing tutorial materials, we
try to let them. But sometimes, it's necessary
that W3C resources are deployed to do it. More and
more, as our technologies are more varied and
more complex. XML Schemas are a case in point:
	XML Schema Part 0: Primer
	http://www.w3.org/TR/xmlschema-0/


> In reference to your <div style="display:
> none"></div>, I think that the backwards compatibility
> issues this raises are too great. Do you have any
> other ideas.

Yes, I gave one: restrict yourself to attributes.


> BTW; one problem with schemas. Soon there is going to
> be a lot of Schemas floating about on the Web. I think
> that the W3C should issue some kind of standard for
> authors, so that they may submit them to some group
> (?????) especially devoted to this problem.

Ahh... the urge to centralize. We try to avoid
that at W3C. Consider:

	Soon, there is going to be a lot of HTML
	documents floating about on the Web.
	I think that the W3C should issue some kind
	of standard for authors, so that they
	may submit them to some group especially
	devoted to this problem.

Or
	Soon, there is going to be a lot of images
	floating about on the Web ...

Review and endorsement is a great thing, but W3C
can and should do only so much of it. I expect peer
reviewed journals of all sizes and shapes
to organize around good/bad schemas and schema
techniques and practices.
Witness
	"The XML.ORG Registry offers a central
	clearinghouse for developers and standards
	bodies to publicly submit, publish and
	exchange XML schemas, vocabularies and
	related documents."
	-- http://xml.org/registry/about.shtml

I don't consider that the perfect model: I don't
think "central" is a feature, and I'm not sure
what value they're adding... there's no peer
review, so they'll endorse/register anything,
so searching with google is as likely to
give you good stuff as searching their repository.
Geocities.com will register your schema too,
if all you want to do is put it in the web ;-)

But maybe peer review will happen naturally...
that is: maybe people just won't try to
register stuff at xml.org that they haven't
worked hard to get right.

Ah... they do add some value, in that
everything in their repository is
licensed for redistribution:
	http://registry.xml.org/RightSite/legal.htm


> At the
> moment, the list of Schemas is small(ish), but just
> you wait!!! I'm glad that our comment one has a W3C
> URI...I think Schemas are a kind of DTD for the common
> man (or woman), and that they need some type of proper
> regulation.

I'm not a fan of regulation, myself. I hope you
don't think that just because a document's
address starts with
	http://www.w3.org/
it's somehow magically good.


> Anyway, back to comment(s). I might write a small
> script to extract util:comment attributes from
> documents. Is anyone interested???
> 
> I think all of this raises another important point: is
> XHTML 1.0 really XML.

Huh? On the contrary! The whole point of XHTML,
and my comment example, is that XHTML *is* XML,
and you get the benefit of all the corresponding
tool support. For example, here's an XSLT script
to extract util:comments:

<!-- $Id: comment-extract.xsl,v 1.1 2000/08/13 02:47:26 connolly Exp $
-->
<xsl:transform 
    xmlns:xsl  ="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:util ="http://www.w3.org/XML/2000/04schema-hacking/comment#"
    >

<xsl:template match="*[@util:comment]">
  <xsl:value-of select="@util:comment"/>
</xsl:template>

<!-- don't pass text thru -->
<xsl:template match="text()|@*">
</xsl:template>

</xsl:transform>
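
For anyone without an XSLT processor handy, roughly the same
extraction can be sketched in Python with the standard
xml.etree.ElementTree module. The namespace URI is copied from the
stylesheet above; extract_comments and the SAMPLE document are
hypothetical names made up for this sketch. One difference: the
stylesheet stops descending once an element matches, while this walks
every element in the document.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the xmlns:util declaration in the stylesheet.
UTIL_NS = "http://www.w3.org/XML/2000/04schema-hacking/comment#"
COMMENT_ATTR = f"{{{UTIL_NS}}}comment"


def extract_comments(xhtml_text):
    """Return the util:comment values of all elements, in document order."""
    root = ET.fromstring(xhtml_text)
    return [el.get(COMMENT_ATTR)
            for el in root.iter()
            if COMMENT_ATTR in el.attrib]


# Hypothetical sample document, for illustration only.
SAMPLE = f"""<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:util="{UTIL_NS}">
  <body>
    <p util:comment="first note">One</p>
    <p>Two</p>
    <div util:comment="second note"><p>Three</p></div>
  </body>
</html>"""

for comment in extract_comments(SAMPLE):
    print(comment)
```

This works only because the input is well-formed XML; the same code
would reject tag-soup HTML outright.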


> I think this little exercise
> proves that it isn't.

Huh? I don't follow at all. XHTML certainly is XML.
Which part of our conversation suggests it's not?

> It also proves it isn't HTML.

I disagree, though this issue is not as black-and-white.

But all the HTML-consuming tools I use grok
the comment example just fine.

> -
> this all supports the text/xhtml MIME type suggestion.

No, I don't see how it does.


> We are not going to see a comment attribute appear in
> the XHTML 1.1 specification. That is very clear.
> However, if I can't get that, I am at least going to
> try to make Schemas and RDF more accessible to
> programmers.

I salute you!


> At the moment, I think that the util:comment is the
> best way forward. I challenge people to think up a
> better idea, and prove it works.
> 
> My last point (for today) is IF XHTML is XML, and
> Schemas are valid, then surely we should be able to
> display some nice little logo on our pages. I think
> I'll copy the Valid XHTML logo and cross out the HT.
> This probably infringes so much Copyright it's unreal,
> but I'm sure everyone is up for a laugh.
> Have a look at http://xhtml.waptechinfo.com/logo.gif

Yes, it's fun, but yes also, W3C is a trademark
of ours, and I might be obliged to ask you
to Cease and Desist from making that logo
available on the web... Hm... no, taking a close
look at our license, it says

"W3C Trademarks must only be used in a way that
accurately reflects the STATUS
associated with the W3C products."

http://www.w3.org/Consortium/Legal/trademark-license-19990516

and given that modularization and schemas are
still not 100% cooked, I think your logo
"accurately reflects the STATUS associated with
the W3C products."

> Kindest Regards,
> Sean B. Palmer
> WAP Tech Info - http://www.waptechinfo.com/
> 

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Saturday, 12 August 2000 23:12:02 UTC