Re: Encoding of site structure ...

Øystein Ingmar Skartsæterhagen wrote:
 --- veith.risak@chello.at wrote:
> > You can see the web as nodes which are more or less
> > connected by links. This is the normal view.

I think this is the idealised view.

> > 
> > But you can see the web also as "clusters" which are
> > linked. (Links to single pages still exist) You

I think the pragmatic view is that the web is a small number of search
engines and portal sites linked to what FrontPage calls "webs", i.e.
really structured, incrementally downloadable documents.  I think it is
one of these structured documents that most commercial web designers
would call a site.  FrontPage can get away with redefining "web" because
most people don't actually understand the concept of a world wide web;
they often think that the world wide web is the web of fibre optic
cables, or of logical communications links, not the web of hypertext
links.

> > could see these clusters as some sort of
> > "super-nodes" and you can concentrate all incoming
> > links to these super-nodes. Then the structure looks

Although there may be some real multi-company/multi-server super-nodes,
most super-nodes exist by design rather than by accident, as structured
documents (and generally it is policy that they do not link out of the
cluster, or only do so to a single entry point on one of the owning
company's marketing partners' super-nodes).

Structured documents (and presentational ones, of the sort that commercial
web sites tend to be) pre-date the web, and, for example, PDF has a concept
of forms (reusable content objects), which covers the requirement raised
here for a single download of common, corporate-image features (even if
using Distiller on the output of MSWord may not produce them in the
document).  That is because PDF is designed for structured documents; the
web was originally designed for the links between loosely associated nodes
(in which the problem of determining clusters would be difficult).

PDF was slow off the mark in realising the importance of the internet, so
it originally lacked out-of-document (super-node) links and the incremental
loading that is the byproduct of using technology intended for partially
standalone resources to create structured documents.  It now has URL-type
links (although the user agents rely on a web browser to follow them) and
incremental loading support (through HTTP byte ranges and optimised PDF).
However, it always was oriented to commercial needs (e.g. technical
enforcement of intellectual property rights, close control of layout and
styling, etc.) and I think it represents a better example of a structured,
presentational, incrementally loaded document than the average commercial
web site.
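As a rough sketch of how that incremental loading works (the file name
and byte counts here are invented for illustration), the viewer asks the
server for just the byte ranges it needs:

    GET /brochure.pdf HTTP/1.1
    Host: www.example.com
    Range: bytes=0-8191

    HTTP/1.1 206 Partial Content
    Content-Range: bytes 0-8191/524288
    Content-Length: 8192
    Content-Type: application/pdf

With an optimised (linearised) PDF, the objects for the first page and a
hint table sit at the front of the file, so the viewer can render page
one from that first range and fetch the rest on demand.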

Coming back to HTML, there are, I think, two different issues here.  One is
the treatment of a super-node as a single entity for management.  That seems
to me to be an authoring tool job, and I believe that, for example, the
non-loss-leader versions of FrontPage provide site network diagrams
for the developer.  Whilst there is a case for standardisation here,
the pressures for it are less, so the advantages to tool developers of
supplier lock-in from a proprietary format tend to dominate.

The other issue was standard elements on each page (forms in PDF).  This
can be done server side, although the normal result is non-cacheable pages
(that is not inevitable, though).  It wasn't in early HTML because it was
intended that people not be trapped within a super-node controlled by a
single manager.  However, SGML, on which HTML is based, and XML both
implement this capability, in the form of external entities; a sketch
follows below.  No HTML browser supports this, and I don't think that
there are any validating XML browsers, which would be necessary to do it.
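As a sketch of what the external entity approach could look like (the
entity name and URL are only illustrative), each page would declare the
shared block once and reference it where it belongs:

    <?xml version="1.0"?>
    <!DOCTYPE page [
      <!-- external parsed entity pointing at the shared markup -->
      <!ENTITY sitenav SYSTEM "http://www.example.org/shared/nav.xml">
    ]>
    <page>
      &sitenav; <!-- shared navigation substituted here by the processor -->
      <body>Page-specific content goes here.</body>
    </page>

A validating processor must fetch and expand the entity; a non-validating
one is allowed to skip it, which is why a validating browser would be
needed.  The server-side equivalent is something like Apache's
<!--#include virtual="/shared/nav.xml" --> directive, which is where the
non-cacheable pages mentioned above usually come from.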

The rules for XML entities allow the browser to defer rendering an
external entity until the user asks for it, but I suspect the intention
there is more to allow for low bandwidth connections, and it would not, I
think, be expected as default behaviour in the sort of browser currently
being used with such stock code inserted server side.

> written in a way so that UAs don't have to read them
> and show at least the navigation (and preferably also
> other site-wide content, such as headers), then we
> still have to mess up the pages by including this in
> them all to make sure they are accessible to all.

This sort of thing was in very early, in the form of link elements.  I
suspect the reason why link elements were largely ignored, or replaced
by non-linking ones, like meta, is that designers do not want the
browser to control the presentation of such elements of the design.
However, note that Lynx and recent Mozillas will provide access to
link elements providing forward, next, up, down, index, ... type
links.  Of course, very few authors realise that these have existed
from almost the beginning.
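For example (the file names are only placeholders; the rel values are
HTML 4.01 link types), a chapter of a manual might declare:

    <head>
      <title>Chapter 2</title>
      <!-- navigational relationships exposed by Lynx and Mozilla -->
      <link rel="contents" href="index.html">
      <link rel="prev" href="chapter1.html">
      <link rel="next" href="chapter3.html">
    </head>

The browser can then offer standard previous/next/contents navigation
without the author painting any of it onto the page.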

(I think it would be almost impossible to get authors to accept
<link rel="SitePageHeader"> though, as it would deny them the ability
to be different (even though that level of sameness is needed for
a semantic web).)
