Re: Straw-man XML Support for PUBLIC Identifiers and XML Catalogs

I am CCing my response to a query from Michael about 9070 identifiers in
case it is of interest to the rest of the list.

   Of course the only reference of record is the standard: I do have a
copy, though. I've attached some very old URN ramblings about FPIs below.
In brief, and by example, the basic syntax for ISO 9070 Object Identifiers
is a::b::c//d::e::f::g
The field to the left of the // is the name-issuing authority. This is
confusingly called the "Object owner", but it identifies the owner of the
_name_ (not the object). Any "owner" of a prefix can delegate a subspace of
their namespace to others, so if I own "a::b" I could assign "a::b::c" to
you for your Object Identifiers. The "Object name" (the part following the
//) has similar hierarchical fields, but no semantics or administrative
procedures attach to them.
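
A minimal Python sketch of that split, for illustration only. The function
names are hypothetical, and it follows just the a::b::c//d::e::f::g shape
given above, not the full text of the standard:

```python
def split_9070(identifier):
    """Split 'a::b::c//d::e' into (owner_fields, object_fields)."""
    owner, sep, obj = identifier.partition("//")
    if not sep:
        raise ValueError("missing '//' between owner and object name")
    return owner.split("::"), obj.split("::")


def is_delegated_from(owner_fields, prefix_fields):
    """True if owner_fields lies in the namespace rooted at prefix_fields."""
    return owner_fields[:len(prefix_fields)] == prefix_fields


owner, obj = split_9070("a::b::c//d::e::f::g")
print(owner, obj)
print(is_delegated_from(owner, ["a", "b"]))
```

Only the owner side is checked for delegation, since no administrative
procedures attach to the fields of the object name.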

The bug in the 9070 standard is that its mapping of SGML FPIs puts all the
fields other than the authority into successive positions of the Object ID.
So if I use an FPI like:
    -//SUN::SUNSOFT//DTD My weird DTD//EN
It becomes:
    SUN::SUNSOFT//DTD::-::My weird DTD::EN
This is fine, but if I use :: separated items in organizing the object IDs
of an SGML FPI, things get messed up:
    -//SUN::SUNSOFT//DTD dgd::dtds::weird-1//EN
And I can't extract the language and version specifications dependably:
they may be the last item, or the last 2 items, with no marker as to which.
And, since the count of fields is no longer fixed, I can't do it by counting.
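
A rough Python sketch of the problem. The field order used here (class,
registration indicator, description, language) is inferred from the single
example above, not taken from the annex itself, and the function names are
made up for illustration:

```python
def fpi_to_9070(fpi):
    """Map an SGML FPI onto a 9070-style identifier, following the example.

    Assumes a four-part FPI (indicator//owner//text//language) with no
    version field; the output field order is a guess from one example.
    """
    reg, owner, text, lang = fpi.split("//")
    public_class, _, description = text.partition(" ")
    return f"{owner}//{public_class}::{reg}::{description}::{lang}"


def object_fields(identifier):
    """Return the '::'-separated fields of the object name (after '//')."""
    return identifier.partition("//")[2].split("::")


plain = fpi_to_9070("-//SUN::SUNSOFT//DTD My weird DTD//EN")
nested = fpi_to_9070("-//SUN::SUNSOFT//DTD dgd::dtds::weird-1//EN")

print(plain, len(object_fields(plain)))    # 4 object fields
print(nested, len(object_fields(nested)))  # 6 object fields
```

Once the description can itself contribute "::" fields, the field count
varies, so nothing in the mapped form says where the description ends and
the trailing language (or language plus version) begins.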

   Now this conversion is defined in an informative annex, so we need not
be bothered by compatibility with this for XML, should we decide that this
has any relevance to XML.

    One thing I really like about the 9070 syntax is that it is simple and
general, lacking many of the odd mandatory fields in 8879 FPIs. On the
other hand it is a different syntax, not in wide use.

I've attached some old notes on 9070 I made for the URN list, years ago.
They duplicate some of this, but touch on a few points not given above.

  -- David

Attached archival material:

   As an "SGML guy" (though not the sort of SGML-zealot one sometimes sees,
I hope), I have also been wondering about this. Particularly given the
limited use of SGML on the WWW, the ISO naming stuff which is integrated
with SGML seems a natural fit (at least for that application). Here's some
possibly relevant info:

   The ISO FPIs are defined in ISO 9070. They are based on a two part structure:
   + naming authority
   + object identifier
   Each of these can be split into multiple hierarchical parts (allowing
for complex object names and delegation of naming authorities). Root
authorities can be assigned based on ISBN publisher numbers at the moment.
The character set that can be used is case-insensitive and also restricted
to be highly portable across national character sets. The syntax is
character-based, rather than simply describing a sequence of octets.

    There are only two objections to the FPI standard as far as I can tell,
based on the URN requirements doc. The first is a somewhat ugly syntax:
"//" to delimit the two major parts, and "::" to delimit fields within each
part. The second is an arbitrary length restriction: 100 chars for owner
name, 100 chars for object name, seemingly chosen so that the
corresponding SGML identifiers come to less than 250 characters.

    This last restriction is something that could be changed pretty easily
through the ISO, I think, especially as the harmonization with the internet
would appeal to the part of ISO that developed the FPI standards. (i.e. it
was _not_ developed as part of OSI).

It seems that the syntactic flexibility of the FPIs is sufficient to handle
any naming needs I've seen proposed here, and the hierarchical authority
assignment should be pretty scalable and decentralizable.

9070 has a provision for the use of ISBNs themselves as a root authority.
This was a revision to the standard in the second edition (ISO/IEC
9070:1991(E)). This means that there is a non-ISO based authority to assign
names. This is important since ISO 9070 naming authority (to be
administered by ANSI) is not yet in operation. I have been told that the
ISBN people are prepared to issue ISBN publisher numbers to anyone wishing
to pursue electronic publication.

The ISO 9070 character set is Upper/lower case, digits and "'()+,-.:=?/".
The standard is defined in terms of a "character repertoire" not a
particular encoding, so that national issues are a protocol rather than a
naming issue. This lowest common denominator may make the "name" aspect of
formal public identifiers less meaningful to Europeans and users of
non-roman scripts.
Comparison rules (to determine sameness of named objects) are defined by
9070. I am not sure if they are case-sensitive or not, since I'm lacking
access to some reference materials at the moment.

>2) Public identifiers have a separation between owner-name components and
>object-name components which has no equivalent in object identifiers.
>(This separation may well prove artificial and lead to errors.)

Rather than a drawback, this separation may prove critical to enabling easy
support for different methods of encoding object names, distinct from the
issuing authorities.

I am not a number. I am an undefined character.
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________