Re: Should encoding of site structure be standardized? from Sampo Syreeni on 2003-03-01 (www-html@w3.org from February 2003)

From: Sampo Syreeni <decoy@iki.fi>
Date: Sat, 1 Mar 2003 02:22:56 +0200 (EET)
To: Øystein Ingmar Skartsæterhagen <goystein_goy@yahoo.no>
cc: www-html@w3.org
Message-ID: <Pine.SOL.4.51.0303010156560.2076@kruuna.Helsinki.FI>
On 2003-02-28, Øystein Ingmar Skartsæterhagen uttered to www-html@w3.org:

>On the web as it is today, all information is to be structured into
>pages.

Is it, really? Only recently I argued that multiple unrelated topics might
be placed on a single page. It's established that multiple related points
can be placed on separate pages. What I'm thinking is, you might be
looking at all this from a perspective which not everybody shares.

>But almost all web pages logically belong to a sort of larger group of
>information, normally called a site.

Do they, really? I think one contrary example is a page which belongs to
no individual site, because it was originally published as a separate
work (Project Gutenberg immediately comes to mind). Another is one which
has purposely been published on more than one site (here, syndicated
newsfeeds are a perfect example). I would argue that there is a whole
wealth of information which only fits into the site metaphor if we force
it to.

>As far as I know, the abstraction of "sites" is currently only something
>the viewer of the web pages may make up based on "context hints" in the
>content of the page, for example a company name together with its logo
>displayed at the top of all the pages belonging to that company's site.

True. This sort of information is engineered from the start to live on a
"site". If the organization is tree-like, as it often is, it'd be quite
nice if there was a common syntax to represent the hierarchy. Early
Mozilla builds had a feature to represent sitemaps, which would be a nice
starting point. If the structure is something like a general graph, I'd go
with a generalized RDF representation.

Still, there's the idea of a "site" hanging behind the scene. I think it's
largely superfluous. If we present sitemaps of any sort, it shouldn't
matter where the final material comes from. Logical connections are
logical connections, no matter where the data actually lives. If one can
use a site map (a map of related things), it's perfectly sensible to
include stuff regardless of DNS naming and URI structure. Maps are maps,
so why bring the notion of "sites" into it?
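
To make the "maps are maps" point concrete, here is a minimal sketch in
Python. It is not RDF syntax and not the format Mozilla used; the triples,
the example.* URLs and the "child" / "related" predicate names are all
made up for illustration. The only point is that a map built from plain
links between URIs never needs to ask which host a page lives on.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical map data: (subject, predicate, object) triples.
# The URLs and the predicate names are assumptions, not a real vocabulary.
TRIPLES = [
    ("http://www.example.org/~alice/", "child",
     "http://www.example.org/~alice/essays/"),
    ("http://www.example.org/~alice/essays/", "child",
     "http://texts.example.net/etext/1234"),
    ("http://www.example.org/~alice/", "child",
     "http://news.example.com/feed/item42"),
    # A non-hierarchical edge: the map is a general graph, not just a tree.
    ("http://texts.example.net/etext/1234", "related",
     "http://news.example.com/feed/item42"),
]


def children(triples):
    """Index the "child" edges by their subject URI."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == "child":
            index[subject].append(obj)
    return index


def outline(root, index, depth=0, seen=None):
    """Return indented lines for the hierarchy reachable from root."""
    seen = set() if seen is None else seen
    if root in seen:  # guard against cycles in a general graph
        return []
    seen.add(root)
    lines = ["  " * depth + root]
    for kid in index.get(root, []):
        lines.extend(outline(kid, index, depth + 1, seen))
    return lines


def hosts(triples):
    """Collect the distinct host names that the map touches."""
    return {urlsplit(uri).netloc for s, _, o in triples for uri in (s, o)}


if __name__ == "__main__":
    print("\n".join(outline("http://www.example.org/~alice/",
                            children(TRIPLES))))
    print(sorted(hosts(TRIPLES)))
```

The outline comes out as one hierarchy even though its pages sit on three
different hosts; nothing in the traversal looks at DNS names at all.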

>But shouldn't it be possible to unambiguously state that all those pages
>belong together?

It should. What I'm having a hard time following is the idea that pages
from separate "sites" (as dictated by the details of DNS delegation) are
what belong together.

>Shouldn't each page only hold that individual page's logical content, and
>headings/logos that contain information about which site we are in be
>kept in one single document for the whole site, and rendered in every
>document which claims that it belongs to that site (or which the site
>claims that belongs to itself; I am not sure which way this sort of link
>should go, probably it has to go both ways).

That is sort of logical, except for the fact that most people do not have
a "site" in the current meaning of the word. I, for instance, have quite a
number of pages, organized in a nice hierarchy and sharing a common CSS
style. But it ain't a site, as far as domain names go. I don't have
control over the root of the server. Tying navigation and site
organization to control over servers would be a deathblow to me and many
others. It would also seem like a nasty violation of the principle that
URIs are URIs -- I think resources hanging off URIs should be treated as
equals, regardless of the depth of reference.

I'm thinking, the robot exclusion protocol and certain PICS provisions
already got this wrong. They don't work when you're living under a server
you cannot control. That sort of thing isn't a part of the distributed web
architecture, in my mind. If we treated all URIs equally, they would be.
That's the aim as far as I can see.
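
A small sketch of why the robot exclusion convention has this problem: the
file is looked up at the fixed path /robots.txt on the host, so its
location depends only on the scheme and host of a URI, never on the rest
of the path. The helper name robots_url is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt location that applies to page_url.

    Only the scheme and host survive; the path of the page is discarded,
    which is why a user living under ~name/ has no file of their own.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # The second URL is a hypothetical neighbour on the same host.
    print(robots_url("http://www.iki.fi/~decoy/front"))
    print(robots_url("http://www.iki.fi/~someone-else/"))
```

Both calls give the same address at the root of the server, which is
exactly the place a user without control over the server cannot write to.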

>In my browser (Opera 7.0), the link elements for linking to previous and
>next page, home page, etc. are (if present in the current document)
>displayed as a sort of "navigation bar" right above the area where the
>body of the page is displayed.

As they are in Mozilla. The idea comes from far back, and it's nice. It's
too bad that there isn't a sensible way to fit the navigational paradigm
with fully distributed pages -- currently the navbar data lives in each
document served, instead of being centralized. Distribution is good, but
it's really, *really* bad when you have to update all those documents by
hand. Believe me, I'm maintaining a site with such navigation data, by
hand.
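
One way out of the hand editing, sketched under assumptions: keep the
page order in a single list and generate the link elements for each
document from it. The file names and the helper nav_links are
hypothetical; "start", "prev" and "next" are link types listed in
HTML 4.01. This is not how any browser or existing tool does it, only an
illustration of centralizing the data.

```python
from html import escape

# Hypothetical central list: the one place where page order is recorded.
PAGES = ["index.html", "intro.html", "details.html"]


def nav_links(pages, current):
    """Build the <link> elements for one page from the central list."""
    i = pages.index(current)
    links = [("start", pages[0])]
    if i > 0:
        links.append(("prev", pages[i - 1]))
    if i < len(pages) - 1:
        links.append(("next", pages[i + 1]))
    return ['<link rel="%s" href="%s">' % (rel, escape(href, quote=True))
            for rel, href in links]


if __name__ == "__main__":
    for page in PAGES:
        print(page)
        for element in nav_links(PAGES, page):
            print("  " + element)
```

Reordering or adding a page then means editing the list once and
regenerating, rather than touching every document.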

>This "site document" could also contain other information that applies to
>the whole site, for example a title, a short description, a heading or
>other content to be included in each page, etc.

I think this sounds like a perfect application for RDF sitemaps. They
could contain a wealth of information in excess of navigational structure
in a standardized form. I'm really at a loss to explain why the Mozilla
project dropped sitemaps so early on.
-- 
Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Friday, 28 February 2003 19:23:00 UTC