Re: Understanding the "applicable specifications" clause (was: Re: Decentralised extensibility idea (ISSUE-41)) from Dr. Olaf Hoffmann on 2010-01-21 (public-html@w3.org from January 2010)

From: Dr. Olaf Hoffmann <Dr.O.Hoffmann@gmx.de>
Date: Thu, 21 Jan 2010 19:03:59 +0100
To: public-html@w3.org
Message-Id: <201001211903.59496.Dr.O.Hoffmann@gmx.de>
Tab Atkins Jr.:
>
> It doesn't mutate, because it isn't, by itself, an XHTML file.  It's a
> bag of bits.  

This is another abstraction level and a general problem of data
saved on hard discs and similar media. Even more, it is a
general problem of 'information': it is not information by
itself, but only by some cultural agreement on how to encode and
decode it.

On a filesystem like ext3, reiserfs or vfat
there is some agreement about which bits have to be combined
into a file. Due to the sorrowful history of computers,
there is no agreement on how to attach metainformation about
what the file represents, so in practice this is stored
inside the file. And once one has managed to decode it as
'readable' text and there is an XML processing instruction
at the beginning, this is already a piece of metainformation
to distinguish by - HTML has no such processing instruction, therefore
HTML can already be excluded. Other cultural agreements can
be found too, for the relation of the processing instruction to XML
and for determining the namespace; knowing the
namespace, one can typically determine the language and the version
of the language. Well, this is it: not nice, but in practice one can
do this step by step, in a similar way as one distinguishes between
hieroglyphs and Celtic runes.
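
The step-by-step chain described above could be sketched roughly as
follows. This is only an illustration under assumptions: the function
name guess_language, the small namespace table and the UTF-8 guess are
made up for this example, and the check for a leading "<?xml" follows
the argument here rather than the XML rules (where that declaration is
optional).

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: namespace -> language (only two entries here).
KNOWN_NAMESPACES = {
    "http://www.w3.org/1999/xhtml": "XHTML",
    "http://www.w3.org/2000/svg": "SVG",
}


def guess_language(data: bytes) -> str:
    """Guess what a bag of bits is, using only its own content."""
    # Step 1: can it be read as text at all? (assumption: UTF-8)
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not text"

    # Step 2: does it announce itself as XML at the very beginning?
    if not text.startswith("<?xml"):
        return "text, but no XML declaration"

    # Step 3: is it really well-formed XML?
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return "claims XML, not well-formed"

    # Step 4: the namespace of the root element names the language.
    # ElementTree writes namespaced tags as "{namespace}localname".
    namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""
    return KNOWN_NAMESPACES.get(namespace, "XML, unknown namespace")
```

Nothing in this chain looks at a media type or a file name; every
step uses only what is inside the file.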

And then, if you want to, or if there is some mandatory advice, you
can try to interpret the hieroglyphs as Celtic runes and vice
versa - whatever is required and whatever the result is.
If the server or the author insists on this, why not?
But this does not change the simple fact of which language
it is written in, in the case of XHTML (with the help of some minor
cultural agreement on how to read text files).

> It can be interpreted as XHTML, or HTML, or plaintext, 
> or a bitmap for that matter.  Files don't carry around an essential
> identity, they obtain one when you choose to interpret them in a
> particular way.

No, in the end these are magnetic areas on a hard disc or electric
charges, and there is some common cultural agreement, up to
some abstraction level, on how to interpret these physical phenomena
as information. With that, it has its own identity as information, which
is of course lost if all information about the cultural agreement is
lost. But as long as this is not lost and there is some chain of
agreements, the file has some information as content and therefore
its own identity.
The 'miracle' of such digital files is that you can duplicate the
information - and this is what happens when something is served to the
browser, whether by a file system, a server or whatever. And the
duplicate has the same identity as the original, simply because
it has the same information as content. The file system or the
server sends additional metainformation and advice about
the file. This is additional information, not belonging
directly to the block of information which we can call the file.
And therefore it does not change the identity of the file.
We can still save the file again and compare it to the original -
if nothing went wrong, we will find that both still contain
exactly the same information, whatever the metainformation
was and whatever happened to this metainformation.
If this were not possible, computers and the internet would
not exist and information would still be written only on paper
or carved into stone (and even this can be used to
save digital information and to duplicate digital information
without losses and changes - for example on punch cards and punched tape).
Even more exciting, and appearing like a miracle, is quantum information
with entangled states - but this is another issue, which could help to prevent
your money from suddenly vanishing from your digital bank account without
your agreement due to security holes in browsers or similar devices.
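
The save-and-compare argument above can be illustrated with a small
sketch. The names content_identity and serve are hypothetical and
only stand in for a real server: the served copy carries a media type
next to it, but the bytes, and therefore a hash over them, stay the
same whatever that label says.

```python
import hashlib


def content_identity(data: bytes) -> str:
    """Identity of the content alone: a SHA-256 digest of the bytes."""
    return hashlib.sha256(data).hexdigest()


def serve(data: bytes, media_type: str) -> dict:
    """Hypothetical server: hands out a duplicate of the bytes plus metadata."""
    return {"headers": {"Content-Type": media_type}, "body": bytes(data)}


original = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
            b'<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>')

as_html = serve(original, "text/html")
as_xhtml = serve(original, "application/xhtml+xml")

# The metainformation differs, the saved copies do not.
same = (content_identity(as_html["body"])
        == content_identity(as_xhtml["body"])
        == content_identity(original))
```

Here same comes out as True: the label sent along with the file is
not part of what is hashed.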

>
> That's why there was never any such thing as "XHTML served as
> text/html".  It was always HTML, albeit with some slightly invalid
> syntax inspired by the XHTML syntax which browsers tolerated/ignored.
> If they served it as application/xhtml+xml, then it would have been
> XHTML.
>

If you do not believe in the identity of digital files, you should not
have such a digital bank account, only gold coins and diamonds - 
and even they depend somehow on a cultural convention that they 
mean something.


> Unfortunately, I think we've gone far down the rabbit hole of an
> irrelevant sidetrack, so it's probably good to stop now.
>
> ~TJ

With this I agree; it would have been a better idea to leave the
old-fashioned HTML in the last millennium and to use XHTML in
this millennium without any dirty tricks for outdated browsers.
However, doesn't HTML5 do exactly the opposite, describing how
to interpret what looks like XHTML as HTML? In the end this seems
to 'standardisease' the common bad practice of stirring all
this into one soup.
I think the informative section in XHTML1.0 was only intended to
please users of netscape3/4 and msie3/4. It only became a permanent
disease because after 10 years there are still browsers in use without
much implementation progress compared to netscape4 and msie4 ;o)
With some more implementation work this could have been avoided,
there would be no need to 'standardisease' in HTML5, and one
could have started with something up to date in XHTML2 or 3.

Olaf
Received on Thursday, 21 January 2010 18:18:28 UTC