Re: Permitting non-indirect links from David G. Durand on 1997-01-16 (w3c-sgml-wg@w3.org from January 1997)

From: David G. Durand <dgd@cs.bu.edu>
Date: Thu, 16 Jan 1997 12:16:09 -0500
To: w3c-sgml-wg@www10.w3.org
Message-Id: <v02130506af040b65fdf2@[205.181.197.81]>
At 5:48 AM 1/16/97, Martin Bryan wrote:
>David Durand wrote:
>>>There are two problems with this approach. One is the need to define entities
>>>for each part of each URL in the BOS or document, and the second is that
>>>each link has to look different so my second link has to read:
>>><a href="http://&base-server2;/&rootpath2;/&that-subject-path;/&that-doc">
>>
>>They only have to look this different if it uses a different server,
>>different root path, different subject path, and different document path.
>
>I was just making it clear to other readers that the entity names you gave,
>which looked very nice for the first example, are not reusable for every URL
>as entities cannot be redeclared within docs. The result is that by the
>100th source of files you get some pretty weird entity names, and a chance
>of clashing with other entity names in the DTD.
>
>>Declaring entities is no harder than the extra elements that the HyTime
>>approach would require.
>
>I'm not saying it would be - remember that I have said before that entities
>in a BOS are a good way of assigning referencable names to files. The same
>applies to naming reusable fragments of file identifiers, if you are willing
>to live with parsing the entity declarations. The problem is that our
>DTD-less browser would be required to parse the entity declarations to get
>the URLs. I thought we wanted to avoid this.

Well, I want to avoid it because of the extra file-transfers, but if we are
indirecting anyway, there is no problem there. And in any case, all of this
can be done in the document instance, by putting the entity declarations
into the internal subset.

And if the internal subset declares and references a parameter entity with
the redirected entity definitions, you can have the indirection, too!
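
As a minimal sketch of the internal-subset approach (it does not show the
parameter-entity indirection), the Python below parses a document whose
entity declarations live entirely in the instance. The <doc> wrapper, the
entity values, and beta.html are hypothetical, chosen only to echo the
examples quoted in this thread; the point is that a parser that reads the
internal subset hands back fully composed URLs with no external DTD fetch.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: entity names and values are illustrative only.
# All declarations sit in the internal subset, so no external DTD is fetched.
DOCUMENT = """<!DOCTYPE doc [
  <!ENTITY base-server "www.echo.lu">
  <!ENTITY rootpath "oii">
  <!ENTITY subject-path "en">
  <!ENTITY this-doc "alpha.html">
]>
<doc>
  <a href="http://&base-server;/&rootpath;/&subject-path;/&this-doc;#X">first</a>
  <a href="http://&base-server;/&rootpath;/&subject-path;/beta.html">second</a>
</doc>"""


def resolved_hrefs(xml_text):
    """Parse the document and return each link's href after entity expansion."""
    root = ET.fromstring(xml_text)
    return [a.get("href") for a in root.iter("a")]


if __name__ == "__main__":
    for href in resolved_hrefs(DOCUMENT):
        print(href)
```

Both links share the server, root path, and subject path entities, so a
move to another server would mean editing one declaration.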

>>I don't understand this sentence. The above example has a series of
>>hard-wired values, and seems utterly equivalent to the following:
>><a URL="http://www.echo.lu/oii/en/alpha.html#X">
>
>The difference is that I can change one part of the hardwired string without
>changing the whole string: therefore changing all files that have moved
>directory or moved server becomes a lot easier.

I still don't understand. The global change that is going to update all the
occurrences of /oii/en/, could update a URL or an attribute value just as
easily. (Or the editor function could do it, doesn't matter).

One way you're updating a substring in a URL everywhere, the other way,
you're updating an attribute value everywhere. The difference in convenience
seems marginal.

>>I see the breakup of the URL into attribute values, but I don't see that
>>this does anything but save a programmer from writing a 50-line URL parser.
Link management doesn't seem to change one whit to me, just the work of the
parser writer. And most people (not you) recycle URLs via cut-and-paste
from a browser -- they would be _dis_-advantaged by this approach.

>> example of indirection using entities deleted. What's the real difference
>> between the two?
>
>Only in that it points to id name space rather than entity name space. Id
>name space is created in the document instance, which must be parsed by the
>browser, whereas entity name space is created in the DTD, which is not
>necessarily parsed by the browser.

Internal subsets can be. If you are willing to do this indirection, the
internal subset is a small price to pay. I'm also now convinced that the
internal subset should always be parsed for entity and attribute
declarations, anyway, as I'm arguing in a separate thread.
>><a dest="url-chain">
>><location id="url-chain" locsrc="chain1" url="#x">
>><location id="chain1" locsrc="chain2" url="alpha.html">
>><location id="chain2" locsrc="chain3" url="/oii/en/">
>
>The problem is that the last location address is wrong as far as HyTime is
>concerned as it does not point to a referencable entity. We would need DNS
>and pathnames to be allowed as valid "entity-source-identifiers" for this to
>work. There seems to be no reason why it should not, it just hasn't been
>envisaged to date.

These are all HyTime Queries, as any URL reference must be. HyTime does
_not_ require that all links use entities, as long as you are willing to
use some queries; Eliot and I are arguing that XML links _cannot_ make that
requirement.

>>We would need to define that relative URL semantics apply when a URL is the
>>locsrc for a URL. This makes sense.
>>
>>However, I don't like the HyTime solution that much, as locsrc has all the
>>decoupling problems of ilink, but offers less obvious value.

>The value of locsrc is its reusability. You only need to define it once and
>then reference it via the ID. This means you only have one entry to manage
>for each location source. This is a vast saving in management for heavily
>reused address components. It is also infinitely quicker to update, and
>generally a safer management approach.

Since we agree that the same reusability is also possible with entities
(although less-elegantly), do we now agree that we don't need locsrc?

>>As far as I know, the attribute value syntax that Martin is proposing could
>>not be a valid HyTime location ladder (though it could of course be a
>>query, since any markup at all can be defined as a query).
>>
>>So Martin's proposed markup is only HyTime compatible in the weakest way.

Martin agreed with me, but I must say that this was a cheap shot. The URL
links I am proposing are also queries, and the real advantage is that they
use the syntax already popularized by HTML, and the semantics defined by the
URL standard (relative URLs are well-defined, and we don't need to explain
how they work).
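
To illustrate how existing relative-URL semantics already cover the
chained example quoted above, here is a minimal sketch in Python. It
resolves the fragments outermost first, each against the result so far.
The base http://www.echo.lu/ is an assumption borrowed from the earlier
example, standing in for the elided end of the chain, and urljoin follows
the current generic URI rules rather than any locsrc-specific definition.

```python
from functools import reduce
from urllib.parse import urljoin


def resolve_chain(parts):
    """Resolve URL pieces outermost first, each relative to the result so far."""
    return reduce(urljoin, parts)


# Assumed base (not part of the quoted chain), followed by the url values
# of chain2, chain1, and url-chain from the quoted example.
CHAIN = ["http://www.echo.lu/", "/oii/en/", "alpha.html", "#x"]

if __name__ == "__main__":
    print(resolve_chain(CHAIN))
```

Changing the "/oii/en/" entry alone redirects every link resolved through
it, which is the same single point of update that an entity declaration
gives.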

> .... but the problem of name space management in a DTD-less environment
>requires that we rely on names in instances rather than names in DTDs. My
>concern in all this is not whether it is technically feasible, but whether
>it is realistically manageable over a long time period. As far as this goes
>I will continue to play devil's advocate and suggest alternate ways of doing
>things that might have better management potential.

Is it feasible to manage if we make applications parse the internal subset?
This can be done as an option today via the RMD, though I think it should
simply be made a requirement. I really think that locsrc is not worth
implementing: the only thing that does argue for its feasibility is that
most of the code to handle "raisins in a pudding" markup will be there for
ilinks, so it could be re-used. But I'm strongly convinced that ilinks
offer amazing power we just don't get any other way, and I'm not so
convinced with respect to locsrc.

>But they cannot be used in catalogs: how do I move my URL locators to
>catalogs if they have entity declarations in them. Do I have to resolve them
>and then update them later? Bang goes my management idea of leaving
>resolution of the URL until the point the file is actually called.

I think you can get equivalent behavior with external entities. I feel like
the ground keeps shifting under me, with new requirements added in every
time.

>>Now I know that entities are (over) used in SGML to make up for language
>>deficiencies, but I think that URL composition is actually a pretty natural
>>use of entities: certainly it would not take long to explain to authors.
>
>It would take no longer to explain how my four/five attributes map to the
>URL syntax: they are a direct mapping of the constructs in the URL RFC!

Yes, but your explanation will be in addition to the explanation already
required for the simple URL link, since we have agreed that we can't do
without that. So you are adding a bunch of new attributes that duplicate
the function of one attribute we already have, to add syntactic convenience
that is not essential because entities and the single attribute can produce
the same effects.

> And
>they do not change name for each URL locator, while the entity names would
>need to.

This is a red herring. Entities only change name when things are not shared
(as do the attribute values). The number of names you have to make up is
_exactly_ the same in the two proposals. Only the syntax of how they are
used changes: entities get concatenated in a single attribute value, while
multi-locsrc's (or whatever) get spread over several attributes.

>If you consider the HoTMetaL URL creation menu verbose then how come
>HoTMetaL is a leader in the field? All I am doing is say store the info
>HoTMetaL creates in its separate boxes as separate attributes until such
>time as you call the file, instead of resolving the complete URL the minute
>you close the attribute menu.

I'm not sure how HoTMetaL is doing against Front Page and its ilk, but
that's not really the question. Currently we have a single href attribute.
We must continue to allow that simplicity. URL syntax may not be much good,
but it is _not_ going away any time soon. Any multiple attribute solution
will be _additional syntax_. If we are going to add syntax, it must enable
something that we _could not_ do without that syntax. All the arguments
we have had about convenience have not yet shown such an advantage.

>The fact that URL definitions using ID locators are more verbose than doing
>an entity declaration is irrelevant to users. The IDs and associated
>declarations will be generated automatically by the editor without user
>intervention. The question is whether browser recognition of entity
>declarations takes longer than browser resolution of IDs. If the entity
>declarations have to be read as part of the DTD I would contend that at
>browser level HyTime location addressing may be faster than entity resolution!

The code to support the two is _very_ different in size and complexity, and
the incremental benefit of adding that complexity is an arguable
improvement in the elegance of the markup, and no improvement in expressive
power. I'm still completely unconvinced.

  -- David

I am not a number. I am an undefined character.
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Thursday, 16 January 1997 12:09:09 UTC