Name namespace, namespaces and names in general from Jonny Axelsson on 2000-02-09 (www-html@w3.org from February 2000)

From: Jonny Axelsson <jonny@metastasis.net>
Date: Wed, 09 Feb 2000 11:25:37 +0100
To: www-html@w3.org
Message-Id: <3.0.6.32.20000209112537.007d69a0@mail.linpro.no>
____________________________________________
1. FORM CONTROLS, NAME: SCOPE AND UNIQUENESS
 
I am a little confused about the uniqueness of the name attribute across
different forms on the same HTML page. NAME for form controls has a
grouping function, a bit of ID and a bit of CLASS. All controls inside the
same FORM with the same NAME belong to the same group. Two controls with
the same NAME inside two different FORMs do not:

HTML 4.0 standard section 17.2, Controls 
A control's "control name" is given by its name attribute. The scope of the
name attribute for a control within a FORM element is the FORM element.

I presume from this, and from the lack of contrary evidence, that having
the same NAME in two different FORMs is not only legal, but that NAME
essentially has a separate namespace for each FORM. This is unlike every
other namespace, which is unique for the entire document (URI). It is also
problematic, since references like [HTML401, sect 12.2.3] imply an
equality between the ID namespace, the A NAME namespace and the
<formcontrol> NAME namespace.
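
To make the per-FORM scope concrete, here is a minimal sketch in Python
(standard library only). The class name ControlNameCollector, the sample
page and its control names are all made up for illustration; the sketch
only assumes that each FORM element opens a fresh scope for control names,
as the quoted passage from section 17.2 suggests.

```python
from collections import defaultdict
from html.parser import HTMLParser


class ControlNameCollector(HTMLParser):
    """Group control names by the FORM element that contains them."""

    CONTROLS = {"input", "select", "textarea", "button"}

    def __init__(self):
        super().__init__()
        self.form_index = -1            # index of the open FORM, -1 if none
        self.forms_seen = 0
        self.names = defaultdict(list)  # form index -> control names

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.form_index = self.forms_seen
            self.forms_seen += 1
        elif tag in self.CONTROLS and self.form_index >= 0:
            name = dict(attrs).get("name")
            if name is not None:
                self.names[self.form_index].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self.form_index = -1


# Hypothetical page: "q" appears in both forms, "mode" twice in one form.
page = """
<form action="/search"><input name="q">
<input type="radio" name="mode" value="a">
<input type="radio" name="mode" value="b"></form>
<form action="/login"><input name="q"><input name="user"></form>
"""

collector = ControlNameCollector()
collector.feed(page)
scopes = dict(collector.names)
print(scopes)
```

The two "mode" radio buttons land in the same scope and so form one group,
while the two "q" controls land in different scopes and never meet.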

(cite: originally aired:
<http://buzz.builder.com/cgi-bin/WebX?14@179.wMviaFO9iQX^27@.ee7e11d/24>)


________________________________
2. NAMESPACE INSIDE A SINGLE URI

Speaking of namespaces and scope, the ID namespace inside a single URI is
flat: no two elements anywhere may have the same ID [HTML401, sect 7.5.2].
In XML, unique ID is a validity constraint [XML10, sect 3.3.1]. I am not
proposing a change to this, but what would happen if the mapping URI# - ID
were indeed hierarchical? For instance like this:

<body>
<div id="intro">
<h2 id="first">Intro.first</h2>
<a id="pointer1" href="#first">Go first</a>
<a id="pointer2" href="#second">Go second</a>
</div>
<div id="partI">
<h2 id="first">partI.first</h2>
<h2 id="second">partI.second</h2>
<a id="pointer3" href="#first">Go first</a>
<a id="pointer4" href="#second">Go second</a>
</div>
<a id="pointer5" href="#first">Go first</a>
</body>

pointer1 would point to Intro.first, pointer3 to partI.first, pointer4 to
partI.second, and pointer5 to one of:
A: NONE (there are no #first in scope)
B: UNDEFINED (ambiguous, two equal candidates for #first)
C: Intro.first (the first #first)
D: partI.first (the last #first)

Alternative A would be "OOHTML", but it would break all current HTML pages
and generally be a pain for everyone involved.
Alternative B would be like the current namespace, except that pointer1 and
pointer3 would be defined (pointer2 and pointer4 would always be defined).
Alternative C or D would mean that every href would be defined if the
corresponding ID exists at all in the URI.

Alternatives B-D would give an "ID event hierarchy":
1. Is the ID inside my content?
2. Is the ID inside my containing element?
...
N. Is the ID inside the root element (HTML)?

The difference between alternatives B, C and D is what "the ID" is when the
same ID occurs more than once. Under each alternative, the HTML document in
the first column is equivalent to the body in that alternative's column:
  HTML document    Alt. B           Alt. C           Alt. D
  <body>           <body>           <body>           <body>
    <p id="id1">     <p>              <p id="id1">     <p>
    <p id="id1">     <p>              <p>              <p id="id1">
  </body>          </body>          </body>          </body>
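
The lookup order and the four alternatives can be sketched in Python. This
is only one reading of the proposal, not a definitive algorithm: the
Element class and the find, resolve and reach helpers are names invented
here, alternative A is assumed to see only IDs declared directly in an
enclosing element, and alternatives B-D are assumed to search the whole
subtree of each enclosing element before moving outwards.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Element:
    id: str
    label: str = ""
    children: List["Element"] = field(default_factory=list)
    parent: Optional["Element"] = field(default=None, repr=False)

    def add(self, *kids):
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self

    def descendants(self):
        """Yield all descendants in document order."""
        for child in self.children:
            yield child
            yield from child.descendants()


def find(root, element_id):
    """Return the first descendant of root carrying element_id."""
    return next(e for e in root.descendants() if e.id == element_id)


def resolve(link, target, alternative):
    """Return the element that href="#target" on link reaches, or None."""
    scope = link.parent
    while scope is not None:
        if alternative == "A":
            # Assumed strict scoping: only IDs declared directly here count.
            matches = [e for e in scope.children if e.id == target]
        else:
            matches = [e for e in scope.descendants() if e.id == target]
        if len(matches) == 1:
            return matches[0]
        if len(matches) > 1:
            # Ambiguity: undefined for A and B, first for C, last for D.
            return {"A": None, "B": None,
                    "C": matches[0], "D": matches[-1]}[alternative]
        scope = scope.parent  # nothing here, try the containing element
    return None


# The example document from above (BODY itself has no id).
body = Element("", "body").add(
    Element("intro").add(Element("first", "Intro.first"),
                         Element("pointer1"), Element("pointer2")),
    Element("partI").add(Element("first", "partI.first"),
                         Element("second", "partI.second"),
                         Element("pointer3"), Element("pointer4")),
    Element("pointer5"),
)


def reach(pointer_id, target, alternative):
    hit = resolve(find(body, pointer_id), target, alternative)
    return hit.label if hit is not None else None


for alt in "ABCD":
    print(alt, reach("pointer5", "first", alt))
```

Under these assumptions pointer1, pointer3 and pointer4 resolve the same way
in every alternative, and only the links whose target is not unique within
their innermost container (pointer5, and pointer2 under alternative A)
depend on which alternative is chosen.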


This will have a cost at href resolution, as every containing element has
its own namespace. That increases the maximum number of lookups from one to
the number of containing elements (rarely more than five). There is no way
to dynamically change an ID (is there??), so an ID of "here" will always
remain id="here", and once the lookup tables are made they will never
change. Neither do I think it should have an adverse effect on the DOM (but
the element reached with any id handle could be different from what it
would have been using a flat namespace).

There are some benefits to a hierarchical namespace. Here are two
scenarios:


Case 1: "Semantic" IDs (With a view to a database)

Often the source of an HTML document would be structured tables, like from
a relational database. These tables would usually have their own keys,
unique IDs that are later easy to hook up to for other systems and often
ideal IDs. It is not the ultimate infosystem with RDF and all that, and you
might not always want to expose your internal db keys to the world, but in
general a simple mapping like this gets the job done, easily:

COL = dbFIELD, COL.CLASS = dbFIELDNAME
TR =  dbRECORD, TR.ID = dbRECORDID

The equivalent in XML would be:
<person id="Employee-ID">
 <field1>Content</field1>
 ...
 <fieldN>Content</fieldN>
</person>

The problem arises when two or more db tables (or "XML records") are on the
same page. It is possible that two TRs (or <person>s) would have the same
ID. Indeed, given the nature of RDBMSs, it is highly likely.
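
A small Python sketch of the TR.ID = dbRECORDID mapping shows how quickly
this happens. The two tables, their keys ("r1", "r2", ...) and their
contents are invented for the example, and table_rows is a hypothetical
helper, not part of any library.

```python
from collections import Counter
from html import escape


def table_rows(records):
    """Render each record as a TR whose ID is the record's primary key."""
    rows = []
    for key, fields in records.items():
        cells = "".join(f"<td>{escape(str(value))}</td>" for value in fields)
        rows.append(f'<tr id="{escape(str(key))}">{cells}</tr>')
    return rows


# Two hypothetical db tables; each key is unique within its own table only.
employees = {"r1": ("Alice", "Sales"), "r2": ("Bob", "Support")}
projects = {"r1": ("Website", "2000"), "r3": ("Intranet", "1999")}

page_rows = table_rows(employees) + table_rows(projects)

# Count how often each key would appear as an ID on the merged page.
id_counts = Counter(list(employees) + list(projects))
collisions = sorted(key for key, count in id_counts.items() if count > 1)
print(collisions)
```

Each table is perfectly consistent on its own; the duplicate "r1" only
appears once both are rendered into the same flat ID namespace.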


Case 2: Generated pages on demand

The web designer often has no direct control over the IDs used on a single
page, as the HTML can come from several sources. There are two cases in
particular:
a) Server side includes (SSI)
b) "merge" pages for print version

Often a page consists of several HTML parts merged with SSI (it could be
header, footer, navigation bar...). While each part can have a clean
namespace, it is hard to guarantee that there will not be a namespace
collision when these namespaces merge.

It is often relatively easy to have several HTML versions of the same
document. It can be a report split up into sections for convenient
on-screen reading and merged into a single document for printout. Again,
when HTML documents merge, namespaces may collide.
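
As a rough illustration, a merge step could at least detect such collisions
before the page is served. The Python sketch below uses only the standard
library; the fragment names and their contents are hypothetical, and
IdCollector, ids_in and colliding_ids are names made up for this example.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute value seen in a fragment."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)


def ids_in(fragment):
    collector = IdCollector()
    collector.feed(fragment)
    collector.close()
    return collector.ids


def colliding_ids(fragments):
    """Map each ID used more than once to the fragments that use it."""
    owners = {}
    for name, fragment in fragments.items():
        for element_id in ids_in(fragment):
            owners.setdefault(element_id, []).append(name)
    return {i: names for i, names in owners.items() if len(names) > 1}


# Hypothetical parts of one page, each with a clean namespace of its own.
fragments = {
    "header": '<div id="nav"><a id="top" href="/">Home</a></div>',
    "body": '<h1 id="top">Report</h1><p id="summary">...</p>',
    "footer": '<div id="contact">...</div>',
}

print(colliding_ids(fragments))
```

Detection is all this does; with a flat namespace the only fix is to rename
one of the IDs in a part the designer may not control.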


__________________________________
3. BAD ID: THE NAME GENERATION GAP

There is a highly restrictive subset of characters allowed in an ID, "ID
and NAME tokens must begin with a letter ([A-Za-z]) and may be followed by
any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"),
colons (":"), and periods (".")." [HTML401, sect 6.2] The corresponding XML
rule is "A Name is a token beginning with a letter or one of a few
punctuation characters, and continuing with letters, digits, hyphens,
underscores, colons, or full stops, together known as name characters."
[XML10, sect 2.3] That is, XML allows any letter, not just A-Z, and a
leading underscore or colon; otherwise the definitions are nearly
identical. A NAME by comparison is "cdata".

You cannot have an ID with the value "Here I am" (spaces), or for that
matter "Here%20I%20am" (% is not a name character) nor "Here&#32;I&#32;am"
("&", "#" and ";" are not allowed either, and the ID value isn't parsed
anyway). Nor can you have an ID beginning with a digit, so <tag id="1"> is
not valid. Is this really desired behaviour? What advantages are there to
this?
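
For reference, the HTML 4.01 token rule quoted above translates directly
into a regular expression. This Python sketch covers only that rule (not
the wider XML Name production); ID_TOKEN and is_valid_id are names chosen
for the example.

```python
import re

# A letter, then any number of letters, digits, "-", "_", ":" and ".".
ID_TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")


def is_valid_id(value):
    """True if value is a legal ID/NAME token per HTML 4.01 section 6.2."""
    return ID_TOKEN.fullmatch(value) is not None


for candidate in ["here", "sect-1.2:a_b", "Here I am", "Here%20I%20am", "1"]:
    print(repr(candidate), is_valid_id(candidate))
```

Only the first two candidates pass; the three values discussed above are
all rejected.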



Jonny Axelsson,
Net asset,
Metastasis design
Received on Wednesday, 9 February 2000 05:26:52 UTC