Re: XHTML Invalidity / WML2 / New XHTML 1.1 Attribute from Sean B. Palmer on 2000-12-12 (www-html@w3.org from December 2000)

From: Sean B. Palmer <sean@mysterylights.com>
Date: Tue, 12 Dec 2000 11:49:28 -0000
To: <www-html@w3.org>
Cc: <www-talk@w3.org>
Message-ID: <00bf01c06431$9824d660$12ed93c3@z5n9x1>
[Note: This is quite an essay of a reply: and very multi-topical.]

> If/when folks get comfortable with schemas and
> namespaces, we can drop the DTD gobbledygook at the top:
>
>            <html xmlns="http://www.w3.org/1999/xhtml">
>              [...]

Hopefully we will be able to say "when", but we all know that DTDs aren't
going to just "vanish". They are highly important for defining document
formats, just as Schemas are for defining data; they have different roles to
play. What we (in fact the HTML WG) have to decide is whether (X)HTML is a
document or data format, and they have to do it for XHTML 2.0.
I suppose using namespaces to point to Schemas would reduce the prologue,
and the file size and editing-by-hand time, but I can't help remembering who
*suggested* we put that "DTD gobbledygook" at the top in the first place(!):

[[[
So if we're commited to SGML, let's start putting something like
<!DOCTYPE HTML SYSTEM
     "http://info.cern.ch/hypertext/WWW/MarkUp/html.dtd>
at the front of every HTML file
]]] - Dan Connolly, 14 July 1992,
http://lists.w3.org/Archives/Public/www-talk/1992JulAug/0016.html

Also:-

[[[
(we don't have to store it in the mfile -- servers that distribute
HTML could generate it on the fly.)
]]] - ibid.

Indicates that you *knew* people wouldn't bother adding a !DOCTYPE
declaration to the top of their files.
8 years down the road, and 99.99% of all Web documents aren't valid SGML or
XML. I blame Mosaic personally; although it did wonders in promoting HTML
and the WWW because it allowed almost any crud code, it has almost destroyed
it for the same reason. Winston Churchill said, "Success is not the end of a
story", and just because the WWW is going O.K. now, I don't think the
architectural principles are strong enough for it to retain its integrity
for much longer. They may have been in the first couple of years, but the
only trace of them now is a web site named http://www.w3.org/

[Note that the original DTD for HTML by Mr. Connolly was in
http://lists.w3.org/Archives/Public/www-talk/1992MayJun/0020.html]

> I'm asking for one lousy attribute,
> xmlns="http://www.w3.org/1999/xhtml"

I don't believe it will *be* just one lousy attribute, because validating by
namespace is only a very small part of the list of options in the XML Schema
specification. I agree that it may be the best way, but it's not the only
way. You will have to allow for xsi:schemaLocation, which will bump up the
prologue aspect of validation again. You can't use "the file gets smaller
and simpler" as an excuse to use XML Schemas for XHTML: instead use the fact
that they have certain benefits over XML DTDs.
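
[A minimal sketch of the prologue point, in Python with only the standard library. It reads the schema hint that a root element would have to carry if validation were driven by xsi:schemaLocation rather than by the namespace alone. The sample document and the schema URL http://example.org/xhtml.xsd are hypothetical placeholders, and the xsi namespace URI used is the one from the final XML Schema Recommendation, which differs from the drafts current when this was written.]

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema instance attributes (final Recommendation).
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical document: the schema URL is a placeholder, not a real W3C schema.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/1999/xhtml http://example.org/xhtml.xsd">
  <head><title>t</title></head>
  <body/>
</html>"""


def schema_locations(xml_text):
    """Return {namespace: schema URL} taken from the root's xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    tokens = root.get("{%s}schemaLocation" % XSI, "").split()
    # The attribute value is a whitespace-separated list of
    # (namespace, location) pairs.
    return dict(zip(tokens[0::2], tokens[1::2]))


locations = schema_locations(DOC)
print(locations)
```

[Even in this small case the root element needs two extra attributes beyond xmlns, which is the sense in which the prologue grows again.]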

> the HTML modularization spec shows you how to add
> your own module and mix it in with the standard
> modules. I don't care for that approach, because
> it's limited in all the ways that linking two C
> modules are limited: one big unmanaged centralized
> namespace, no "first class" modules recognized
> by the compiler (the validator). But it has the
> virtue that existing validating XML processors
> can be used for validation.

I don't get your qualms there: "one big unmanaged centralized namespace"? If
you mean http://www.w3.org/1999/xhtml then that's probably the same NS as a
Schema version would use (except they'll probably change it before then for
2.0...well they should do if the modules are going to change).

> The other way is XML Schemas. I demonstrated how
> this works: just write a little schema, stick
> it in the web, point to it from your document,
> and off you go.
> http://www.w3.org/XML/2000/04schema-hacking/comment-test.html

It's a good approach. That way, instead of just declaring the namespaces in
a DTD and not using them, you don't have a DTD at all and simply use the
namespace to point to your Schema.

> So I think XHTML, XML, namespaces, and schemas are
> a good mix... they make the easy things easy and the hard
> things possible.

I agree. Not to mention the upcoming RDF...

> > 2) How do you suggest that we employ RDF into XHTML
> > 1.0 and still have it validate as a document?
>
> Er.. you wouldn't be asking that if you handn't
> seen my suggestion for how to do exactly that
> in my message of
> Sat, 12 Aug 2000 11:36:25 -0500

I know how to insert it into a document; I meant how do we have it validate?
We use RDF Schemas for RDF, which *don't* offer any sort of structural
scheme for the attributes and elements used in RDF. That is quite a
nightmare, and it means we can just mix RDF namespaces freely (within the
bounds of the RDF M & S recommendation).

> As to validating RDF, I'm working out the details
> of an XML Schema for RDF: [...]
> http://www.w3.org/2000/07/rdf.xsd [...]
> There's also the DTD approach, though it's more awkward:
> Valid RDF: Using Namespaces with DTDs
> http://www.w3.org/XML/9710rdf-dtd/

I wrote an XHTML module for Dublin Core:
http://xhtml.waptechinfo.com/modules/rdf/rdf.mod
So it can even be done in XHTML m12n. Maybe if we ever get a Schema version
of XHTML, I'll attempt to write a Schema for it too...

> For 4 lousy lines of code served up in a file
> on the web, you get your new attribute, complete
> with (a) assurance that nobody else's comment
> attribute will be mistaken for yours, and
> (b) (emerging) validation support.

I like the way you used it as a cover for the actual target namespace
(http://www.w3.org/2000/08/comment# ) in case the XML Schema specification
changes (which it did). It means I'll have to write a new Schema though....
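
[A small sketch of point (a) above, in Python with only the standard library: two attributes that share the local name "comment" stay distinct because each is qualified by its own namespace. The util namespace is the one used in the XSLT script quoted later in this message; the second namespace, http://example.org/other#, and the attribute values are hypothetical.]

```python
import xml.etree.ElementTree as ET

UTIL = "http://www.w3.org/XML/2000/04schema-hacking/comment#"
OTHER = "http://example.org/other#"  # hypothetical second vocabulary

# Hypothetical fragment carrying a "comment" attribute from each vocabulary.
DOC = """<p xmlns="http://www.w3.org/1999/xhtml"
   xmlns:util="http://www.w3.org/XML/2000/04schema-hacking/comment#"
   xmlns:o="http://example.org/other#"
   util:comment="draft wording" o:comment="unrelated">Hello</p>"""

p = ET.fromstring(DOC)

# ElementTree addresses qualified attributes as "{namespace}localname",
# so the lookup depends on the namespace, not on the prefix chosen.
util_comment = p.get("{%s}comment" % UTIL)
other_comment = p.get("{%s}comment" % OTHER)
plain_comment = p.get("comment")  # no unqualified attribute is present

print(util_comment, other_comment, plain_comment)
```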

> But RDF is really a whole other story,
> and I'm not going to go into it in depth just now.

Ugh...if you had done so, it would have saved me a lot of time and effort!
RDF seems to be the way the Web is gravitating, but a lot of people don't
like the rigidity:-

[[[
[...] People who put up web sites got an immediate benefit: they had access
to their data from anywhere and could easily share that data with anyone
with a web browser.
Unfortunately, the new work on technologies such as RDF and XML do not have
this benefit. I see no benefit from providing my information in these
formats -- it just means more work for me. No web browser can view these
formats, and it seems as if they never will. Unless, of course, I write more
files: XSLT transformations to display it properly and DHTML code to modify
it, the list goes on.
Needless to say, this isn't going to have very many people jumping to use
these new formats.
]]] - Aaron Swartz, Blogspace

But then Aaron, in his attempts to overcome this problem, is having to use a
proprietary templating system as an interface and storage medium for a
database. The motto seems to be "if you can do it yourself, then do it using
the easiest means possible". I think if people pulled together and actually
created a Semantic Web by bending some of the W3C specifications (XHTML,
XML, RDF etc.) together, then it would work. But not many people are
inclined to try, even if a large handful *do* see sense...

Needless to say, when Aaron releases the source for his (in my opinion
incredible) new invention, I'll be converting it so that the document
template medium is in XHTML/RDF...or at least I will do if I can work out a
method of doing so based on my [1] ADotSW work.

> > You have to in effect, choose which language you
> > want to use. Otherwise, it just won't validate.
>
> I don't think I understand what you're saying.

Try mixing DTDs with Schemas: it's very hard to do, but not impossible.

<a id="danc">
> My experience leads me to believe
> that parts of XML are solid architectureal infrastructure
> for the long term: tags and attributes, and namespaces.
> But other parts of it are there to manage the
> transition from the existing software base: DTDs,
> entities, processing instructions, and I don't
> recommend investing them unless you are
> constrained by existing software somehow.
</a>

I agree. In summary:-

<rdf:RDF xmlns="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:logic="http://www.w3.org/DesignIssues/Toolbox.html"
     xmlns:dc="http://purl.org/dc/elements/1.1/" >
<rdf:Description about="#danc"
     dc:title="Solid Architectureal Infrastructure">
<rdf:Bag>
     <rdf:value>Tags</rdf:value>
     <rdf:value>Attributes</rdf:value>
     <rdf:value>Namespace</rdf:value>
  <logic:not>
     <rdf:value>DTDs</rdf:value>
     <rdf:value>Entities</rdf:value>
     <rdf:value>Processing Instructions</rdf:value>
  </logic:not>
</rdf:Bag>
</rdf:Description>
</rdf:RDF>

I think it is possible to build up a database fully in RDF, possibly
utilising XHTML and ADotSW [1]. However, I am always reminded of the
exchange between myself and Mr. Bingham in an ERT conference:-
[[[
HB: [...] [in SGML] we defined that element names can be as
     expressive as you like.
SP: But attaching behaviours?
HB: No!
]]]  20 November 2000 W3C WAI ERT Telecon

Which sums up a very important architectural fact: attaching behaviours to
markup sucks. Everyone knew that in the early days of the Web, but then
Mosaic came along...and although Mosaic catapulted the Web into (in)famy, it
did so in a way that meant "within itself it contained the seeds of its own
destruction". Now those seeds are starting to germinate, and if you'll
excuse the furthering/killing of the analogy, we need some weedkiller fast.

[Strangely enough, XHTML still has a lot of behaviours left ruminating
within, so maybe the aim for 2.0 is to provide a version that could be
understood in "processors" that have no foreknowledge of XHTML and its
behaviours. RDF anyone?]

> Note that on the same day we Recommended XML,
> we released a Note that paints the way
> forward for an extensible, self-describing
> web of languages:
> http://www.w3.org/TR/1998/REC-xml-19980210
> http://www.w3.org/TR/1998/NOTE-webarch-extlang-19980210

This is the kind of strong Web architecture that I was talking about.
Partial credit though, for not outlining many implementation methods. In
fact, the document is a bit of a bore after a couple of reads: "been there,
done that". Some of the other stuff in the Design Issues
(http://www.w3.org/DesignIssues/ ) series is a lot more interesting.
People are still only just coming to grips with the WWW and what it means,
even though it is ten years old now.

> I don't see any excuse for myself or any of the
> present company ;-) Many of our specs have
> a lot of gobbledygook in them. I'm struggling
> to understand the XHTML modularization spec,
> and some of the folks in the HTML WG are
> struggling to understand the XML Schema spec.

That's an all-important quote, and a bit nauseating. Anyway, it is never the
experts that get the final say, only the huddled masses (i.e. users), and so
all of the conversations we are having are irrelevant and useless even if
they do map out a solid roadmap for the WWW :-)

> I hope somebody will step in and show us
> the corresponding XHTML DTD module, but
> I suspect it will expose quite a bit
> of gobbledygook in the module itself.

Certainly: it's the XHTML Common Attributes Module:-
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-attribs-1.mod

It takes 6 DTD lines to add it into XHTML 1.1 (it would have been 4, but 2
of the lines are broken down by convention), and it could have only been
two, but it isn't because it requires lots of modularization. Also, you will
need to create a QNames file for your "util" prefix, denoted here by the
entity %util; (creative of me...)

<!-- Module for util:comment -->
<!ENTITY % util.comment.attrib
     "%util;comment        CDATA          #IMPLIED"
>
<!ENTITY % Core.extra.attrib
     "%util.comment.attrib;"
>

Which isn't all that bad, you have to admit. It adds the util:comment
attribute to any place that the core attributes would be added.

> > In reference to your <div style="display:
> > none"></div>, I think that the backwards comatability
> > issues this raises are too great. Do you have any
> > other ideas.
>
> Yes, I gave one: restrict yourself to attributes.

There are quite a few ways in which to semanticize XHTML. Try reading [1]
http://www.mysterylights.com/sbp/adotsw/ and excuse my plug. If the HTML WG
are planning on making XHTML 2.0 a pure data format, then they should have a
look at it. In my opinion, HTML was always going to be a document format,
right from the start, but now its time is over, and maybe it's best to just
start again, or make some radical changes.
There must be some decent method of creating a semantic web hypermedia
system that is accessible, usable, and more solid architecturally. Maybe it
would start with stuff like browser/editor combinations that allow you to
make your own personal webspaces with direct WYSIWYG editing. I dunno
really, I am not as much of a genius as most of the people I work with, or
am sending this to, so maybe someone else will come up with a better idea.

> I'm not a fan of regulation, myself. I hope you
> don't think that just because a document's
> address starts with
> http://www.w3.org/
> it's somehow magically good.

If it starts with http://www.w3.org/TR/ it should be ;-)

> > Anyway, back to comment(s). I might write a small
> > script to extrat util:comment attributes from
> > documents. Is anyone interested???
>
> [...] here's an XSLT script to extract util:comments:
> <!-- $Id:comment-extract.xsl,v1.1 2000/08/13 02:47:26 connolly Exp$ -->
> <xsl:transform
>     xmlns:xsl  ="http://www.w3.org/1999/XSL/Transform" version="1.0"
>     xmlns:util ="http://www.w3.org/XML/2000/04schema-hacking/comment#"
>     >
> <xsl:template match="*[@util:comment]">
>   <xsl:value-of select="@util:comment"/>
> </xsl:template>
> <!-- don't pass text thru -->
> <xsl:template match="text()|@*">
> </xsl:template>
> </xsl:transform>

It works on the W3C's Xalan server, but the output isn't in XML or RDF.
Maybe the following would be better:-

<xsl:transform
    xmlns:xsl  ="http://www.w3.org/1999/XSL/Transform" version="1.0"
    xmlns:util ="http://www.w3.org/XML/2000/04schema-hacking/comment#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    >
<xsl:param name="xmlfile" />
<xsl:template match="/">
<rdf:Description>
   <xsl:attribute name="rdf:about">
    <xsl:value-of select="$xmlfile"/>
   </xsl:attribute>
 <rdf:Bag><xsl:apply-templates/></rdf:Bag>
</rdf:Description>
</xsl:template>
<xsl:template match="*[@util:comment]">
<rdf:value title="util:comment value">
  <xsl:value-of select="@util:comment"/>
</rdf:value>
</xsl:template>
<!-- don't pass text thru -->
<xsl:template match="text()|@*">
</xsl:template>
</xsl:transform>

Which should give an output in RDF, and one that can be parsed in SiRPAC
(or: hopefully...as Mr. Loughborough doesn't like me saying!).
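
[For comparison, a rough Python sketch of the same idea using only the standard library: collect every util:comment attribute in document order and wrap the values in an rdf:Bag. This is not the stylesheet above translated line for line. It uses rdf:li for the Bag members and hangs the Bag off an rdf:value property, which is an assumption about how the output ought to be shaped. The sample document and the page URI http://example.org/page.html are hypothetical.]

```python
import xml.etree.ElementTree as ET

UTIL = "http://www.w3.org/XML/2000/04schema-hacking/comment#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Serialize the RDF namespace with the conventional "rdf" prefix.
ET.register_namespace("rdf", RDF)

# Hypothetical input document with two commented elements.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:util="http://www.w3.org/XML/2000/04schema-hacking/comment#">
  <body>
    <p util:comment="first">a</p>
    <p>b</p>
    <div util:comment="second"/>
  </body>
</html>"""


def comments_to_rdf(xml_text, about):
    """Build an rdf:RDF element listing the util:comment values found."""
    source = ET.fromstring(xml_text)
    attr = "{%s}comment" % UTIL

    rdf_root = ET.Element("{%s}RDF" % RDF)
    desc = ET.SubElement(rdf_root, "{%s}Description" % RDF,
                         {"{%s}about" % RDF: about})
    # Assumed shape: Description -> rdf:value property -> rdf:Bag -> rdf:li
    prop = ET.SubElement(desc, "{%s}value" % RDF)
    bag = ET.SubElement(prop, "{%s}Bag" % RDF)

    # iter() walks the tree in document order, root included.
    for elem in source.iter():
        if attr in elem.attrib:
            item = ET.SubElement(bag, "{%s}li" % RDF)
            item.text = elem.get(attr)
    return rdf_root


result = comments_to_rdf(SAMPLE, "http://example.org/page.html")
print(ET.tostring(result, encoding="unicode"))
```

[Whether SiRPAC or any other parser accepts this output has not been checked here.]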

> > We are not going to see a comment attribute appear in
> > the XHTML 1.1 specification. That is very clear.
> > However, if I can't get that, I am at least going to
> > try to make Schemas and RDF more accessible to
> > programmers.
>
> I salute you!

Well, it's a few months later, and I'm still trying...
The problem is that the XML Schema specification is so intensely boring that
I can't help but fall asleep every time before I get to the end. Maybe I
should start from the end and work my way backwards ;-)
RDF is better, but it's still a bit of a pain because there aren't all that
many decent implementations of it. Dublin Core, RSS and DAML. Great...

> > At the moment, I think that the util:comment is the
> > best way forward.I challenge people to think up a
> > better idea, and prove it works.

Update: I still believe that a "util:comment" attribute would add much
needed semantics to XHTML (cf. http://www.mysterylights.com/sbp/adotsw/ ),
but the purported extensibility of XHTML oft' referred to in Mr. Connolly's
reply opened up a world of m12n, Schemas, and RDF for me, and I haven't
looked back since.

In summary, I believe that the architectural principles described in Mr.
Connolly's reply based on his 8 years of Web experience were very sound
indeed, and that they are solid foundations on which to build a Semantic
Web. Namely points like:-

     1) Exterminate the use of DTDs for data formats.
     2) Use RDF for metadata in XHTML.
     3) Always make sure you can validate your code.

But we have to acknowledge that the Web's architecture was always unstable,
and at present is almost shot to pieces. I think it might be a good idea to
actually try to *enjoy* the Web that we have today, whilst 1% of the pages
actually work and make sense. After all, when the day comes that we look
back on "the times when we could use one browser to view the Web, rather
than the 17 we have to use now", we will remember these times as the Web's
golden age.

> --
> Dan Connolly, W3C http://www.w3.org/People/Connolly/

Kindest Regards,
Sean B. Palmer
http://www.mysterylights.com/sbp/
http://www.w3.org/WAI/ [ERT/GL/PF]
http://infomesh.net/swdemo/#demo
"Perhaps, but let's not get bogged down in semantics."
   - Homer J. Simpson, BABF07.

P.S. I think I'll print this out and wallpaper my room with it. Sorry for
the length...but was it worth it?
P.P.S. "Please forgive the lateness of my reply".
Received on Tuesday, 12 December 2000 06:51:31 UTC