Re: Request for review: CC/PP specifications

> I am intimately aware of tidy's limitations, hence my statement "We have
> done extensive work on the tidy package".  Specifically, we have ported
> it from C to C++ and worked on it so that it now successfully converts
> 99% of web pages -- we do nightly runs on random URLs and have now
> tested over 100,000 websites (over 1 Gig of HTML).  Our version (xtidy,
> as we've named it), can handle the most radical documents you can think
> of.

> > It is impossible to do it perfectly every time because the majority of
> > people do not code HTML perfectly. XHTML by its very nature should be
> > valid XML. HTML might work 99% of the time, but SGML is a funny thing.
> > The spec. for it is rather long...
> As mentioned above, xtidy produces well-formed XML and valid XHTML 100%
> of the time.  End of story.  That is the whole point of the program.
You said 99% of the time - the same as myself!

> > > I am a bit confused by your use of the word "proprietary".
> > > Any HTML document has an implicit XHTML version that will render the
> > > identical output.
> > Yes, any well-coded HTML document. But, for reasons discussed above, it
> > won't work all of the time.
> I will clarify my above statement.  Any HTML, once rendered by a
> browser, is in a form that has a corresponding XHTML version that would
> produce the IDENTICAL rendered output.  I will concede that the way
> Netscape renders a document and the way Explorer renders a document are
> different, but that is the case even for valid XHTML documents.
No, there must be some examples of invalid code that do not successfully
convert to XHTML.

> > Perl isn't as useful as XSLT in my opinion. I could sketch out some
> > arguments for this, but I'll leave that for a later date.
> > XML Script, hmmm...all non-W3C recommendations. As this is a W3C
> > recommendation we are reviewing, I thought it might be better to debate
> > with W3C languages so that any results may be published in the next
> > CC/PP draft.
> I do not have much experience with perl for XML, and I know that XSLT is
> fairly powerful, but I'm sure the readers of this list (many of whom
> probably have never used either perl or XSLT) would benefit from your
> comparison and critique of the two conversion technologies.
Would they? If so I'll write a nice little document about it then.

> Yes, XML Script is a non W3C standard.  I would very much like to point
> out that this is completely irrelevant.  WAP and WML are both non W3C as
> well, and yet they are intimately tied to this debate.  The W3C does not
> (thankfully) operate in a vacuum, assuming only its own endorsed
> standards and technologies are permitted or acceptable on the web.  XML
> Script is a powerful enabling technology for the web and the growing
> amount of XML content and therefore should be given due consideration.
The W3C is working hand in hand with WAP Forum on WAP and WML/WML2.
No comment about XML Script.

> > True; XSLT falls down in many areas. However, at no point have I said it
> > is perfect, it is merely the best we have to go on at the moment.
> Again, in the W3C vacuum, XSLT is all that exists, however in the W3C
> vacuum there is also no such thing as WML, CHTML, WAP, or many other
> fundamental web technologies.  There are better alternatives, they are
> outside of the W3C, and that is OK.  You are allowed to look elsewhere
> for appropriate and relevant technology.
See above! Plus, CHTML was issued as a Note by the W3C. The W3C always keeps
on top of the situation!

> > I'm not up on XML Script, but I'll have a look at it and see what I
> > find.
> > Can it be parsed server side?
> Yes, it can sit server or client side, and is supported under LINUX,
> Solaris, and Windows.
Sounds pretty good.


> My response is that the reality of the web is hand coding.  People use
> tools to do framework, ASP and other DB centred automated generation
> techniques, but the reality is the majority of web pages (whether
> statically or dynamically generated) contain hand coding.  This implies
> that HTML (even if it was SUPPOSED to be XHTML) is going to exist for a
> long time to come, and any relevant recommendation concerning web content
> must take this reality into account.  Referring to HTML is relevant.
> Referring to XHTML is not.  I wouldn't be surprised if 50% of the world's
> XHTML pages are held within the W3C web-site (i.e. not many people are
> producing XHTML), therefore references to it, for the moment, are not
> relevant.

Actually, I think that there are a lot of sites authored in XHTML, and there
are even lists of examples. Almost all of my sites use it, as do Encyclozine,
all of the XHTML-related sites, WDVL, etc.
XHTML has real advantages over HTML and should be recommended at every
turn; see e.g. http://xhtml.waptechinfo.com/extxhtml/


> Exactly which CC/PP document are you referring to that has frequent
> references to WML?  The 3 I have just checked refer to it exactly 1, 3,
> and 7 times in the document bodies.  Again I will stress that HTML is
> the reality, XHTML is the ideal.  Again, even if everyone was
> _attempting_ to produce XHTML, the degree of hand-coding which exists in
> current web-content is such that it would frequently NOT be well-formed,
> and therefore would much more likely be HTML.  In fact, most web-content
> isn't even valid HTML, but the commercial requirement for browsers to
> support "almost"-HTML is such that they do -- it is unbelievable what
> sort of horrendous "almost"-HTML browsers such as Netscape and Explorer
> will quite happily render into attractive web pages.

Yes, this is something we just have to accept. MS and Netscape couldn't make
their browsers too strict, because that would cut down on the number of pages
they could display, and they also have to be backwards compatible.
People will make the effort to write correct XHTML because it is then valid
XML, and you can use any of the various XML tools around to process it. For
example, once again see http://xhtml.waptechinfo.com/extxhtml/
I wholeheartedly support XHTML as the future of the Internet. Five years
from now, most of the Web's up-to-date architecture will be in some form of
XML, including XHTML.
For these reasons, I still *strongly recommend* that the line in question be
changed to include the word XHTML.
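
To illustrate the point about XML tools, here is a rough sketch only. It
assumes Python and its standard xml.etree.ElementTree parser, which is just
one generic XML tool among many; the sample markup and the function names
are made up for this example. A generic XML parser accepts a well-formed
XHTML fragment and lets you pull content out of it, while it rejects typical
hand-coded "almost"-HTML outright:

```python
import xml.etree.ElementTree as ET

# Namespace that XHTML documents declare on their root element.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical samples: one well-formed XHTML fragment, one piece of
# "almost"-HTML with an unclosed <br> and an unclosed <p>.
XHTML_SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head>"
    "<body><p>Hello <br/>world</p></body>"
    "</html>"
)
TAG_SOUP = "<html><body><p>Hello <br>world</body></html>"


def is_well_formed(markup):
    """Return True if a generic XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def paragraph_texts(markup):
    """Collect the text of every XHTML <p> element using plain XML tooling."""
    root = ET.fromstring(markup)
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % XHTML_NS)]


if __name__ == "__main__":
    print(is_well_formed(XHTML_SAMPLE))
    print(is_well_formed(TAG_SOUP))
    print(paragraph_texts(XHTML_SAMPLE))
```

Note that this only checks well-formedness; checking validity against the
XHTML DTD would need a validating parser, which is not shown here.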

Kindest Regards,
Sean B. Palmer
WAP Tech Info - http://www.waptechinfo.com/

Received on Friday, 1 September 2000 06:30:53 UTC