Re: [xml-dev] The problems and the future of the web and a formal internet technology proposal

Dear Marcus, here is the expected answer.

To all list users: I am sorry for the delay. There have been many
days when I was too weak to answer, since I am battling chronic
fatigue syndrome and have many days when I am non-functional.
Moreover, during the few days when I felt better, I had some
pressing issues to handle until a week ago; then I felt bad from
Monday to Thursday, and since I felt better afterwards, I prepared
the answers for the list on Friday, Saturday and Sunday.

 > I fully agree with your notion that there's a distinction to be made
 > between an app platform (that nobody called for, and only browser
 > vendors had a vetted interest in building) and a document format used
 > as a primary means for communication in politics, law, education,
 > medical, personal, etc, etc.
I am glad to read that I am not the only one to see that such a
distinction is needed.

 > I don't agree with your criticism of Ian Hickson's work. AFAICS (and
 > I've probably studied HTML 5 in detail more than most people [1]) he
 > made a very good job of capturing HTML 4 rules, and added a couple of
 > not-too-controversial elements on his own.
There are several problems with HTML5. First, it starts from the
principle that the browser developers define the formats, which is
wrong. Standards should be developed by consortiums which include
specialists, developers and people who use the specifications.
Second, in this same vein, they reintroduced elements which had been
rejected by the W3C as the wrong solution when writing the HTML4
specification. The most visible case is the introduction of the embed
element. The embed element was really a Netscape proprietary
extension. The W3C chose the object element for embedded multimedia
objects, and rightly so. They did not do it in the exact same way as
the Microsoft implementation: they kept the classid attribute but
made it optional, so one can identify the multimedia object by
putting its URI as the value of the data attribute or as the value of
the classid attribute, with no need to specify a hexadecimal clsid
value a la Microsoft.

 > Where it's gone wrong is
 > that the syntax presentation for HTML 5 (as opposed to the historical
 > HTML 4 DTD) doesn't convey its basic construction as a span-level
 > markup vocabulary extended with block-level elements. You can see  
 > with the definition of the paragraph ("p") element which includes an
 > enumeration of paragraph-terminating elements rather than referring to
 > the category of block-only elements. Consequently, when new elements
 > where added, the spec authors "forgot" to include elements into the
 > enumerated list of p-terminating elements, making the spec bogus. In
 > other words, the HTML 5.1 spec process lost control over their
 > workflow, and didn't want to employ SGML or other formal markup tech
 > to quality-assure their work either, which easily would have (and has
 > [1]) spotted these flaws.
Perhaps, but even if it is so, this is only a small part of the problem.

 > In this context, let me also say that the notion of "tag soup" markup
 > is a myth. All versions of HTML, including HTML 5.x, can be parsed by
 > SGML using well-known, formal rules for tag minimization/inference  
 > other shortform syntax (save for the definition of the "script" and
 > "style" elements which were bogusly introduced into HTML such that
 > they would be treated as comments by legacy browsers).
I disagree: the soup is the excessive creation of unjustified tags,
not tags that are impossible to parse. All the presentational markup
tags are part of the tag soup. While they filled an unanswered need,
since there was no adequate stylesheet mechanism when Netscape 2.0
was released (Netscape 2.0 was released in 1995 and CSS1 was released
in 1996), they gave the terrible result of mixing markup and
presentation. The W3C answered with separate versions including and
excluding the presentational markup, namely the Transitional and
Strict versions, starting with HTML 4.0. Internet Explorer was just
as guilty, with elements such as the marquee element and so on. Many
elements introduced by the browser vendors should never have been
introduced.

 > SGML has additional concepts on top of XML very relevant
 > today, such as custom Wiki syntaxes (lots of tech content is written
 > using markdown today), type-safe/injection-free HTML-aware  
 > etc. And SGML also offers a formal way to integrate HTML content into
 > "canonical" markup (ie XML) [3].
While, from what I have read (since, unlike XML, SGML is a language
with which I have no experience), it is true that some capabilities
of SGML (in DTDs particularly) were lost when creating XML, which
may be a weak point of XML, it is also important to note that XML has
at least three major advantages over SGML. The first is a stricter
syntax: in XML, general identifiers and attribute names are case
sensitive, there are no elements with optional closing tags and so
on; this allows for more efficient parsing and teaches more rigorous
syntax writing to the users. The second advantage is the addition of
a second verification in the form of a well-formedness requirement as
a complement to validation. Validation is not always used: some
daughter languages, such as XSLT, while XML-based, are
non-validating, yet they can still be verified for their
well-formedness.
This dual-layer verification is a huge advantage of XML, where the
absence of well-formedness is simply not allowed. The absence of
well-formedness requirements in HTML was part of what made it so
degenerate: the HTML-generating public would take advantage of the
actual behaviour of web browsers even when it violated syntactic
rules, often gearing pages to the behaviour of a single browser; the
other browsers would then try to make those pages work too, most
often introducing new quirks along the way. The end result was a
vicious cycle of browser vendors always trying to make more pages
work, introducing quirks along the way, and HTML authors using more
and more lousy markup based on those quirks, which would then lead to
browsers being even more tolerant, and so on. With XHTML, being
XML-based, well-formedness is verified before displaying the page and
any error is reported, instead of trying to make broken pages work
(making broken pages work gives the wrong message to the authors:
that it is okay to make bogus pages, as long as the browser can get
them working nonetheless). The third advantage of XML over SGML is
that with XML comes a complementary language in the form of XPath,
which is used to express everything which cannot properly be
formulated with pure XML; the XML/XPath combination is extremely
strong. This combination allows creating languages such as XSLT,
Schematron, XForms, and so on. If some features available in SGML are
badly missing in XML, it is probably best to create an XML 2.0 adding
the features in a manner compatible with the rest of the language
rather than switching to SGML. Also, with XML, using Schematron
allows data verification at a level way beyond anything SGML DTDs
will ever allow. As for the markdown issue, the idea with XML is to
use specific languages for specific tasks and attach an XSLT
stylesheet to convert, on arrival, the content to the final format,
be it XHTML, SVG or whatever.
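
To make the well-formedness layer concrete, here is a minimal sketch
in Python. It assumes the standard library's xml.etree.ElementTree
parser; the function name is_well_formed and the sample fragments are
only illustrative. The parser refuses any document that is not
well-formed instead of guessing what the author meant. It checks
well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the text parses as well-formed XML, False otherwise.

    Illustrative helper (not part of any standard): it only checks
    well-formedness, never validity against a DTD or schema.
    """
    try:
        ET.fromstring(document)
    except ET.ParseError:
        # A conforming XML parser must reject the document outright
        # rather than try to repair it.
        return False
    return True


# Properly nested and closed elements: accepted.
good = "<p>Some <b>bold</b> text</p>"
# Overlapping elements, which tolerant HTML browsers would still render.
bad = "<p>Some <b>bold</p></b>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

The second fragment is the kind of markup that tolerant HTML browsers
accept, whereas an XML parser reports the error to the author.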

 > XML-based standards from OASIS, such as Docbook
 > Just fyi, Docbook was as an SGML-based vocabulary most of the time;
 > it's only since version 5 that dedicated SGML-based formulations have
 > been dropped from the spec (since XML is just a subset of SGML
 > anyway). I agree though OASIS (fka SGML/Open group) has put out  
 > standards, and is an org I'd love to see helping to bring our
 > stagnation to an end.
Well, since Docbook is now an XML-based format, it can serve as a
basis for further XML efforts. Moreover, as you state, OASIS is an
organization which can help to further the XML effort. If only they
could start work on a "docarticle" format, with support for
comments and hypertext links via extensions. This could form the
basis of a reborn web, based on XML.

 > replace the current selectors with XPath based selectors [...] the
 > inconvenient (sic) of not being fully XML/XPath based [...] XML
 > reformulation of [...] CSS3
 > I can only recommend to look, once in a while, at techniques outside
 > the XML-centric world.
A fully XML based solution allows using standard XML tools for  
manipulation, including XSLT.
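
As a small illustration of selecting content with a path expression
through standard XML tools, here is a sketch in Python. It assumes
the standard library's xml.etree.ElementTree, which supports only a
limited subset of XPath; the sample document, its element names and
the select_titles function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical content document; the element and attribute names are
# invented for this example.
DOC = """<article>
  <section class="note"><title>First</title><para>a</para></section>
  <section class="body"><title>Second</title><para>b</para></section>
  <section class="note"><title>Third</title><para>c</para></section>
</article>"""


def select_titles(xml_text: str, section_class: str) -> list:
    """Return the titles of all sections carrying the given class.

    The selection is written as a path expression, using the XPath
    subset that ElementTree understands. section_class is assumed not
    to contain quote characters.
    """
    root = ET.fromstring(xml_text)
    path = f".//section[@class='{section_class}']/title"
    return [title.text for title in root.findall(path)]


print(select_titles(DOC, "note"))
```

The same path expression could be used unchanged as a match pattern
in an XSLT stylesheet, which is the point of staying within
XML/XPath.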

 > Python, [...], Ruby, [...], ISLisp [...]
 > I'm sorry but this merely reads as a list of personal preferences.
I have stated why I suggested those. Python is becoming the main
interpreted programming language in the Unix world, Ruby is the main
competitor to Python, and for ISLisp, I already stated that
 > those programmers who do not identify with the unix culture
 > often are adepts of Lisp and ISLisp is lightweight (and as
 > such better suited to this use case) and consists of the
 > common subset of the major Lisp variants,
If you have a better list to suggest, by all means, please do so.
What I am trying to say is that a remote software execution platform
should make a clean break with the WWW legacy and use real
programming languages, both interpreted and compiled to bytecode.

 > There's nothing wrong with JavaScript; it's a language ultimately
 > derived from an awk-like syntax (so is very adequate for text
 > processing even though shortforms for pattern matching didn't make it
 > into the language), and is probably the most portable language out
 > there today.
JavaScript had its beginning under the name LiveScript, which was
introduced in the Netscape 2.0 betas to add a bit of dynamic
capability to web pages, particularly to the page formatting; it was
not meant to write software. It was renamed JavaScript before the
final Netscape 2.0 release and was extended with Netscape 3.0. It
borrowed concepts from the Java programming language, with the major
difference that Java is class-based and JavaScript is
prototype-based. It was meant to be easy to use by non-programmers
(and it succeeded in being so), which most web authors were expected
to be, and there is nothing wrong with that; but it was not meant to
write software. Afterwards, it was extended several times, during
which time Microsoft designed its own partly-compatible version
called JScript. The common subset of the two scripting languages was
standardized as ECMA-262 under the name ECMAScript. Instead of
switching to the standard ECMAScript, as would have made sense, the
Mozilla team, which inherited the Netscape legacy, continued to push
JavaScript, extending the language with capabilities for uses less
web-centric and more generic.

 > the language Mercury
 > Mercury is a fine statically-typed, restricted variant of Prolog, and
 > even used as implementation language for Prince XML, but if you want
 > to push for logical programming on document data, I'd recommend to
 > stick to ISO Prolog which has many, many implementations.

 > In fact,
 > basic Prolog (on a suitable term representation for documents) can be
 > used to implement a large subset of CSS selectors *and*
 > layout/rendering algorithms using constraint-based formulations.

As I stated in my first short reply, I am not suggesting using the
Mercury programming language for the XML-based structural and
semantic, reborn web platform. I am suggesting using it for the
second proposed platform, which is that of remote software execution,  
and which should rid itself of its markup legacy (HTML and XML). On  
the first platform, that used for content oriented websites, XForms  
should be used for form validation, and other "programming" needs  
should be handled through either XSLT or through the combination of  
XML, XPath and XML-events.

The reasons why I included Mercury in the list are the following.
First, it would put a purely declarative language on the list as
an alternative to the imperative or hybrid languages which would
constitute the other choices. Mercury allows the use of three
declarative programming paradigms (logic, functional and the
declarative sub-variant of object-oriented). Most purely declarative
programming languages are either purely functional or purely logic.
Besides Mercury, I have never heard of a purely declarative
programming language supplying facilities for declarative
object-oriented programming (as opposed to the imperative
object-oriented programming supplied in languages such as Java).
Prolog, which you mentioned above, is meant for logic programming.
If, while using Prolog, one wanted to also use the functional
paradigm and the declarative object-oriented paradigm (as present in
Mercury), it would likely require some homemade solution (possibly
written in Prolog) beyond the base language. I am, however, not
suggesting that Prolog would necessarily be a bad choice, just that a
single-paradigm language of this kind would bring more limitations
than a triple-paradigm language such as Mercury. In fact, I strongly
encourage you to suggest your own list of programming languages for
the second platform, that of remote software execution, with the
reason for each choice. My list is a mere suggestion. The second
reason why I suggested Mercury is that, unlike most purely
declarative languages, it has a little bit of uptake in industry,
while most declarative programming languages are used only in the
academic world and research institutes.

 >> DRM is fundamentally wrong and constitutes a stupid and useless idea
 > I don't like DRM either, but just as with RDF, the question is how to
 > finance content creation when the only or primary income is ads rather
 > than public funding. It's all too well-known that monopolization and
 > platform economy is what's happening in eg. the music "industry".
Your answer makes me think I should elaborate on the ugly
three-headed monster concept. In my original message I wrote:
 > The current World Wide Web [Consortium], from a great
 > organization has turned into an ugly three-headed monster,
 > one head is the semantic web / XML / RDF people, the second
 > head is the WHATWG people trying to turn the web into a
 > remote application execution framework, the third and final
 > head is the copyright industry.
When it comes to content creation, there are people affiliated with
two different heads. You state that the issue is the financing of
content creation. I need to say that not all digital content is
created and paid for the same way. Some content is created by
university faculty, research institutes, governments and NGOs; that
content doesn't rely on advertising or subscriptions for its
financing. The same can be said of content put up by companies
unrelated to the media and the copyright industry, such as, for
example, IBM putting up the description of the products and services
which they have to offer. One can also add the amateur websites
(which can often be as good as or better than professionally produced
content) and blogs. One can even add the few people who produce paid
content but who are sensible enough to consider that once the
customer has paid for the content, he or she has unrestricted,
DRM-free access to it, and who also don't wish to use the absence of
structural and semantic information as a form of DRM-lite (see my
original message about this). All this type of content is perfectly
compatible with the first platform proposal, that for structurally
encoded and possibly semantically encoded content, based on XML and
meant for content oriented websites. These content producers can
easily be affiliated with the first head. You say that the primary
means of financing is ads rather than public funding, but all the
aforementioned content doesn't rely on ads or subscriptions for its
financing. Contrary to what you seem to imply, even if ads were
definitively abolished, there would still be content available.

On the other side, there are the content producers who consider that
they own their monetized content and that they have the right to
control the use of their content, and hence want DRM. These people's
vision is the antithesis of the open web which the XML-based
approach, the semantic web and XPath/RDF are trying to achieve. These
people are the third head. The proper thing to do about them and
their content is not to compromise with them, which will invariably
compromise the openness of the web and the core principles it should
follow, but to keep them out of the web and have them put their
content somewhere else. Those people do not want, anyway, to make
content openly available but, on the contrary, consider that
"accessing their content is a service". This brings an association
with the second head, that of the people trying to turn the web into
a remote application execution framework, or, in other words, an
online service platform. Since it has been established that there
should be two separate platforms for the two uses (one for content
oriented websites, the other for remote service access), it becomes
clear that the place for content encumbered by usage restrictions is
not on the platform for openly accessible content oriented websites;
therefore, it is best to have it on the remote service access
platform. This is even truer when considering that the corporate
media and the copyright industry are trying to turn access to their
content into a service anyway. There is no use for markup on content
where DRM disallows the very uses which the markup would have
facilitated, and there is no use for markup on content for which the
markup was deliberately misused to create a DRM-lite situation, again
disallowing the uses which the markup is meant to facilitate. As a
final note, while it is true that a competitive market would be way
better than the monopolies and oligopolies current in the domain, I
am afraid that there is little that a platform development effort /
standardization effort can do to fight against such a situation,
besides trying to stay away from proprietary technology and choosing
technologies which allow easier indexing (such as the semantic web /
RDF / RDFa) by many competing companies, since monopolies and
oligopolies stem largely from political elements rather than
standardization or technical elements. As for DRM, I also want to add
that it has never prevented downloading of any kind. At the current
time anyone with half a brain can download all DRM-protected ebooks,
movies or music files by using IRC/Torrents/Overnet/ed2k, regardless
of the fact that the original version was DRM-protected.

 > I have no doubt that XML will live and prosper, but my point is that
 > XML is a means to an end, not an end in itself. The "end" towards
 > which markup should strive is to give a reasonable, canonical way for
 > editing, publishing, and preserving rich text for a layman (or
 > realistically, a power user) on the web and elsewhere. Ask yourself
 > what practical, real-world solutions are there left today based on  
 > for this purpose that could challenge eg. WordPress?
Anything published as HTML is plagued by loads of problems which I
have addressed in my original message; the fact that the content is
prepared with something such as WordPress makes the resulting content
even less accessible. XML allows the content to be more easily
indexed, more easily analyzed and more easily reused or further
processed due to its excellent markup.

 > Let me close by listing a couple of practical initiatives for  
 > us closer to that goal, rather than going all-in on an overreaching,
 > XML-centric roadmap that we've already seen failing on the web:
 > - register a new application/html (as opposed to text/html) IANA MIME
 > type for web apps such that pure markup sites not relying on
 > JavaScript can be found easily and preferably over script-heavy  
 > over time, make it bad form and penalize accordingly to serve
 > script-heavy sites as text/html
I personally believe that this is not a solution to the problem,  
however, it can help with the transition to something better. There  
should be a cleaned-up version of HTML5, or better XHTML5, separate
from the full version, in the same way that the W3C created
Transitional and Strict versions of HTML4 and in the same way that
they created XHTML 1.0, having more or less backward compatibility
with HTML4, while preparing XHTML 2.0 (which sadly never saw the
light of day). The main weak point of this approach is of course that it
doesn't erect the required iron curtain between the two platforms.  
This approach allows a content page to point via a hypertext link to  
an application based site, which should not be allowed. It also  
allows both types of results to be mixed in search engines and so on,  
all of which should be prohibited. It would still be an improvement,  
however, especially as a first step toward cleaning the whole mess.

 > - question the pay-as-you-go status of W3C spec work, and push for
 > public funding of HTML standardization (it's been a while that W3C  
 > published an HTML spec; their HTML page just links to the WHATWG HTML
 > "standard" HEAD on github)
Public institutions and public bodies can be, and often are,
manipulated by monopolies or by oligopoly-backed lobbies, in which
case there isn't much difference compared to the corporations doing
the standardization directly; in fact, it may have the sole effect of
adding another administrative layer, making the process even heavier.

 > - work towards identifying a reasonable set of expected visual idioms
 > on the modern web (such as menus and other generic controls) for  
 > we want to have a declarative/markup-based rather than (or in  
 > to) a programmatic implementation
I am not sure what to think about this. I think that any effort based  
on HTML5 should put the emphasis on clean-up rather than extension.

 > - push/finance W3C or other body to publish formal specs for CSS
 > (which is where a lot of complexity sits in today's web); try and
 > define reasonable CSS subsets for document-oriented use cases; try  
 > establish forward-compatible CSS versions/levels a site can anounce
 > such that we can eventually see new browsers being developed
My answer will be the same as to the proposal to create a new
internet media type for application oriented websites, distinct from
that of content oriented websites. I think it won't solve the
problem; however, defining cleaned-up versions, separate from the
full version, as a transitory measure would be a step in the right
direction.

 > - for the same reason, push for proper *versioned* HTML spec  
 > rather than "living standards" (an oxymoron in more than one way).
Well, if cleaned-up versions of HTML5/XHTML5 and CSS3 are published,
it is obvious that these would need to be properly defined,
fixed-in-time versions.

 > Maybe the community is welcoming to such efforts. I think that last
 > decade's SiliCon-dominated scene has definitely lost its appeal, and
 > there's a growing concern towards monopolies, platforms, and  
 > and the attention economy in general.
The 2010s are probably the most disgusting decade ever seen in the  
world of computing.

 > Not sure W3C is the proper recipient for what you seem to push for,
 > simply because W3C has been in the web standardization game for most
 > of its existence, yet wasn't able to prevent the demise of the web
 > (not out of bad faith or something). It's my opinion that, If
 > anything, if you want to see a big-time XML+RDF agenda of the scope
 > you envisioned in your original mail, you'll risk evoking a bitter
 > controversy over past decisions, at best an "a fortiori" reaction
 > (such as TBL's SOLID project), but realistically nothing at all given
 > that most of the things have been discussed to death in their heyday,
 > but failed on the web. In fact, I believe W3C should disorganize  
 > its current statue, and make room for others, if only to let die the
 > illusion of the general population sitting at the table when it comes
 > to define the future of the web. But anyway, I look forward to your
 > detailed reply.
Can you be so kind as to state what would be the proper recipient for
the proposal? In fact, in my original message, I suggested replacing
the current W3C with two new consortiums: one, a reborn W3C with
strong OASIS and IETF participation but keeping Sir Timothy and the
key XML/Semantic Web/XPath people, and the other, a completely
separate consortium to handle the remote software execution platform.

 > Not sure IETF as such is the right recipient either. After all, you
 > are free to create your own RFCs as you see fit. IETF hasn't  
 > (nor should it have) HTTP/2 and HTTP/3 with its scope creep/land-grab
 > of lower IP networking layers (which now turn out to be bogus eg.
 > Chrome dropping support for push resources), keeping controversial
 > features such as DoH in conflict with other RFCs. Leaving a situation
 > where new browsers and network stacks can only be approached by state
 > actors or very large corporations, which is exactly the kind of
 > situation that bona fide "standardization bodies" should strive to
 > prevent.
I am hoping that IETF participation can help keep the projects
sufficiently open.

 > I wholeheartedly agree with your opinion that web apps (as opposed to
 > content-oriented web sites) should use programming languages for  
 > own sanity rather than a mish-mash of markup and programmatic
 > techniques a la React (which otherwise I don't think is half-bad at
 > all; it just wish e4x rather than jsx had "won"). But to challenge
 > that, the best way would be to establish a new JavaScript component
 > framework that folks would want to use rather than approach this from
 > a standardization angle; easier said than done, though.
The approach which you suggest keeps the content platform and the
application platform tightly linked. For the sake of sanity, it is
important to set an iron curtain between the two platforms. The
remote software execution platform should completely give up its web
legacy.

When I talk about an iron curtain, I mean that the remote software  
execution platform should be completely separate from the reborn web  
platform. It should not be accessed, and this is important, by using  
the same software as that used for the reborn web, it should not be  
based on the same formats, it should not use the same protocols and  
so on. Perhaps it can even be meant to be accessed from different
devices; on this, I recommend reading again the section of my
original message about DRM, user-owned devices and user-rented
devices. On this last point, I wish to state that a user doesn't
have much benefit in owning devices which become obsolete every few
years and, in such a case, can just as well rent them.

The vision proposed is one where there are two different platforms
available, with nothing in common between the two, and neither keeps
the legacy of the current web. Perhaps one platform can manifest
itself as various subscription services, where users subscribe to
platform access which allows access to online services and to edge
computing services (included in the subscription), where the
subscription possibly includes an access device, and where the user
subscribes to other paid or advertising-supported services available
on the platform. Perhaps the other platform can manifest itself as
open access to (hand-encoded or XSLT-generated) content based on an
XML format (possibly served through a SOAP-over-TCP protocol), where
the access is through software running on user-owned hardware, where
most of the content is freely available for non-commercial use, where
indexing and analyzing the data is easy, and where there are no
restrictions put by people who consider that they own the content and
that they have the right to restrict its use. Of course, the
HTML-based web as it currently exists should be killed once and for all.
The two new platforms should break compatibility with the past and be  
incompatible between themselves. It would not be unreasonable to say  
that one platform would be a digital business platform and that the  
other platform, the reborn web, would be an open content sharing  
platform, even if this description wouldn't hold true one hundred  
percent of the time; after all, when Sir Timothy created the web in  
the beginning, at CERN, it was to allow open sharing of scientific  

The other point which I want to bring up is that you seem to think
that the resistance to a switch to XML/XPath and the semantic web is
too big to make the change and that the money speaks in favour of
maintaining the HTML5/JavaScript/JSON nonsense. By seeing it this
way, you do not seem to take into consideration the fact that the
people who are against XML/XPath and the semantic web (and who are
pushing the HTML5 nonsense) are the very same people trying to turn
the web into a remote software execution platform. If they are
redirected to a new platform meant for remote software execution, and
the entities (which include most browser makers) and money behind
them are also redirected to the remote software execution platform,
then suddenly there would be no more force behind the HTML5 effort
and almost no one fighting against the switch to XML/XPath and the
semantic web. If
the people who want to turn the web into a remote software execution  
platform are given the opportunity to switch to a new platform better  
suited for remote software execution, with proper mechanisms to  
integrate edge-computing and supplying corporate requirements as  
standard from the beginning, including security mechanisms and, yuk!,  
copy-protection; they will hopefully do so and reap its benefits; the  
web environment will then be mostly free of XML/XPath and semantic  
web opponents, the switch to XML/XPath and the semantic web can then  
happen with little opposition. As already stated, if a new remote
software execution platform is to be created, it should be now, when
edge computing is about to become important, so as to integrate it
from the start; afterwards, the opportunity will be past. Of course,
as I already stated, it is best to rid the reborn web of the old
names (html, xhtml, http, etc.) to avoid raising false expectations;
a new set of names would be best for the new technologies.

As an extra note, I see that you do not address at all the question
of whether it is appropriate to integrate standard mechanisms for the
use of edge computing in the remote software execution platform, is this
I do believe that the coming of widely-used edge computing is the
very reason why the people trying to turn the web into a remote
software execution platform / pushing HTML5/JSON/JavaScript may be
willing to allow the schism (into two new platforms), as a new
platform for remote software execution may offer proper mechanisms
for edge computing integration from the start, instead of having to
pile another hack on top of the current pile.

Raphaël Hendricks

Received on Friday, 29 January 2021 10:45:56 UTC