HTML should not be a file format, but an output format

Stephanos Piperoglou (spip@hol.gr)
Wed, 19 Mar 1997 00:11:58 +0200 (EET)


Date: Wed, 19 Mar 1997 00:11:58 +0200 (EET)
From: Stephanos Piperoglou <spip@hol.gr>
To: www-html@w3.org
Subject: HTML should not be a file format, but an output format
Message-ID: <Pine.LNX.3.95.970318231753.412B-100000@fenchurch.hol.gr>

It's late, I'm just in and reading my mail, the metadata thread in
particular, and I start thinking.

The recurring question is what will cause features to catch on. Some argue
that appeal to marketing people will cause companies to use them. Others argue
that "Joe's Homepage" makers should understand how to use them. Others still
argue that the power of a feature and the completeness of a proposal alone
are enough for it to survive. So I wonder, what is it that has to be done?

I remember reading an interview with TimBL, which came back to me because I
happened to visit CERN in Geneva recently. He said, as I vaguely recall,
that he never expected people to write HTML. At the time, it struck me as a
rather strange statement; being a neophyte in the "the W3C is good, to hell
with browser makers, Standards are God, I want full SGML [and so on]" camp, I
believed that HTML editors and WYSIWYG tools were the work of the devil,
used by ignorant BLINKopheliacs and frame junkies to ruin the Web.

I have since gathered a lot of insight on the matter - mostly due to having
lurked on this list for the past year - and have a fresh opinion which you
MIGHT be interested in hearing.

A very large portion of the problems we have today with HTML and the Web
stems from precisely this - HTML is written by hand, on a per-document
basis, by people. Why is this done? Mostly because most of today's web
servers are built around a philosophy of document=file. People are used to
thinking of a Web Page (whatever that is in their minds) as a single HTML
document which they create independently, upload to a server and have people
view.

The linguistic specifics of HTML - new elements, backwards compatibility,
style vs. content, validation, internationalization and so on - are being
slowly but steadily solved. XML shows us a clear move towards full SGML (I
was VERY intrigued by the fact that this was one of the goals of the W3C as
stated in a stand about the Web I saw in CERN outside the cafeteria which I
had the chance to visit daily), as does CSS. One way or another, a few years
from now we will have abandoned HTML as we know it and moved on to something
(be it SGML or not) that offers a concise way of offering all kinds of
media in a document over a network. Big deal. That's the EASY part.

But the whole debate about LINK and metadata has nothing to do with this.
It's about HTML as a markup language for information. And as long as most
writers of HTML perceive it as a file format, a way to produce a static,
specific document to be published over a network, no one can be educated on
the value of metadata and hyperlinks.

Let me put it in simple terms: a friend of mine asked me why he couldn't set
up a hyperlink between two pages he created using Microsoft Word Internet
Assistant. The "easy" answer was "well, you're going to upload the page to a
Unix server and you have to make sure both files have correct permissions,
the HREFs are case sensitive and that MWIA doesn't automatically insert
an absolute URI pointing to a file on your hard disk which obviously could
not be accessed once your page is uploaded". That kind of response would
puzzle my friend enough to start with.
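
For anyone who wants to see the "absolute URI pointing to a file on your
hard disk" problem in concrete form, here is a rough sketch in Python. It
only assumes that such links show up either as file: URIs or as bare
drive-letter paths; the names HrefAudit and is_local are made up for this
illustration, and this is not how any particular editor or server checks
links.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumption: a local link looks like "file:..." or like "C:\..." / "C:/...".
DRIVE_PATH = re.compile(r"^[A-Za-z]:[\\/]")


def is_local(href):
    """Return True if href can only resolve on the author's own machine."""
    if DRIVE_PATH.match(href):
        return True
    return urlsplit(href).scheme.lower() == "file"


class HrefAudit(HTMLParser):
    """Collect the HREF values of A elements that point at a local disk."""

    def __init__(self):
        super().__init__()
        self.local_hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and is_local(value):
                self.local_hrefs.append(value)


page = (
    '<A HREF="file:///C:/docs/page2.htm">page two</A> '
    '<a href="page2.htm">page two, relative</a>'
)
audit = HrefAudit()
audit.feed(page)
print(audit.local_hrefs)
```

Only the first link is reported; the relative one keeps working after the
upload, provided the file name on the server matches in case.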

How about explaining to him that the hyperlink he's setting up has two
anchors, the head and the tail? That both are represented by a URI followed
by an optional fragment identifier? That the tail anchor should point to a
resource which is served out by a server, and which should be clearly stated,
unique and present?
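
To make "a URI followed by an optional fragment identifier" concrete, here
is a small sketch using Python's standard urllib.parse module. The base URI
is the one from my signature; the relative reference "links.html#metadata"
and the function name split_anchor are hypothetical, chosen only for
illustration.

```python
from urllib.parse import urldefrag, urljoin


def split_anchor(base, reference):
    """Resolve reference against base, then split off the fragment."""
    absolute = urljoin(base, reference)
    resource, fragment = urldefrag(absolute)
    return resource, fragment


# Hypothetical link written inside the document at `base`.
base = "http://users.hol.gr/~spip/index.html"
resource, fragment = split_anchor(base, "links.html#metadata")
print(resource)  # the resource the server is asked for
print(fragment)  # the part that stays with the browser
```

The server only ever sees the resource part; the fragment identifier is
resolved by the browser inside the document it gets back.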

He'd be running through the forest naked with his underwear over his ears by
now.

HTML has taken on this role: that of a document format suitable for
publishing over a network. Not the role of a markup language that, as its
name suggests, defines the links between snippets of information.

What the Web should be, in my opinion, ladies and gentlemen, is a place
where documents, designed in any way one wishes, served static or produced
dynamically, are linked together. Once HTML (or some descendant of HTML) is
used for THIS and ONLY THIS purpose, then people will finally come to
understand the value of these depreciated tags.

If I had my way, HTML would ONLY be anchors, links and metadata. That's what
makes the web the web; otherwise, we should be using FTP to transfer these
files around, and we would have come up with a much better way than B, I, Hx
and P to organize our information. And by doing this, people would
UNDERSTAND this kind of markup because they would have to in order to use
it.

Comments from anyone still reading this far are more than welcome.

--
Stephanos "Pippis" Piperoglou - http://users.hol.gr/~spip/index.html
I've never finished anything I began, but this time I'm