Re: URL better than FPI

On Mon, 21 Feb 2000, David Carlisle wrote:

> 
> Arjun Ray wrote
> 
> > No.  They're exactly the same.  The real problem is that, under
> > the current rules, a URI can't be the minimum data following the
> > PUBLIC keyword.
> 
> Under whose rules? I can see _nothing_ in the XML spec that puts
> any constraints on the public identifier except that it consists
> of PubidChar.

Yes, you're right!  (I guess I'm too much of a dyed-in-the-wool
SGML-er;))

The real point, here, however, is that the XML spec doesn't include an
SGML declaration, even as a sample.  The WebSGML TC has one, and there
we have, lo!, FORMAL NO.

> So as long as the URI is encoded in (say) UTF-8 and any disallowed
> characters are then %HH-encoded, it would appear that that would be
> usable as the public identifier (although it would break any sgml
> based system expecting an FPI)

Well, under the -ahem- rules, that would be the SGML system's fault.
(One place where *expecting* an FPI could matter is a catalog mechanism
allowing partial matches based on internal syntax and not checking
whether the F in FPI actually applies.)
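
To make the encoding David describes concrete, here is a minimal
Python sketch. It assumes the PubidChar set from the XML 1.0 spec
(ASCII letters, digits, and the punctuation -'()+,./:=?;!*#@$_%); the
function name is my own invention, and I have chosen to escape
whitespace as well, since it is legal in a public identifier but not
in a URI. Because "%" is itself a PubidChar, escapes already present
in the URI pass through unchanged.

```python
import string

# PubidChar per XML 1.0, minus whitespace (assumption: a URI should not
# contain raw whitespace, so we escape it rather than pass it through).
PUBID_SAFE = set(string.ascii_letters + string.digits + "-'()+,./:=?;!*#@$_%")


def uri_to_public_id(uri: str) -> str:
    """Encode a URI as UTF-8 and %HH-escape every byte outside PUBID_SAFE.

    Hypothetical helper for illustration; not part of any spec or library.
    """
    out = []
    for byte in uri.encode("utf-8"):
        ch = chr(byte)
        # Bytes >= 0x80 are parts of multi-byte UTF-8 sequences: always escape.
        if byte < 0x80 and ch in PUBID_SAFE:
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


if __name__ == "__main__":
    # "~" and "&" are not PubidChars, so they come out as %7E and %26.
    print(uri_to_public_id("http://example.org/~user/a&b"))
```

Most ordinary http: URLs already consist only of PubidChars, so for
them the function is an identity.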

> > That is, there should never be a need for a PUBLIC *and* a SYSTEM
> > identifier.
> 
> I agree, in an ideal world this would be true. But in XML as
> currently defined, the main point is that you _do_ need two: a
> canonical name and a system address

Yes, but there is no need to put the system address in a *document
instance* if the public identifier is there already.  When we're
talking about XML and the Web, I can't imagine that it wouldn't or
couldn't be normal to assume that the canonical name *will* have a
system address *necessarily* associated with it.  Including a system
identifier can thus be at best advisory.  The normative resolution
should be fixed in the spec - i.e. the authoritative document which
promulgates the canonical name.  That, IMHO, is what we should want,
but...

The XML spec (on external entities, Sec. 4.2.2) has this:

: An XML processor attempting to retrieve the entity's content may
: use the public identifier to try to generate an alternative URI. If
: the processor is unable to do so, it must use the URI specified in
: the system literal.

I believe this is the core of Dan's case, but, as I've argued, it
rests on the assumption that a SYSTEM identifier has the *function*
of a PUBLIC identifier. 
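
For what it's worth, the fallback rule quoted from Sec. 4.2.2 can be
sketched in a few lines of Python. This is only an illustration under
my own assumptions: the spec does not say *how* a processor generates
an alternative URI, so I stand in a plain dictionary for whatever
catalog a real processor might consult, and the function name and the
local file path are hypothetical. The whitespace normalization before
matching is what the same section of the spec requires.

```python
from typing import Mapping, Optional


def resolve_external_entity(
    public_id: Optional[str],
    system_id: str,
    catalog: Mapping[str, str],
) -> str:
    """Return the URI to fetch for an external entity.

    Try the public identifier first; if it yields nothing, fall back to
    the system literal, which the spec says the processor must then use.
    The catalog mapping is an assumed stand-in for any lookup mechanism.
    """
    if public_id is not None:
        # Collapse runs of whitespace to single spaces and trim the ends
        # before attempting a match.
        normalized = " ".join(public_id.split())
        alternative = catalog.get(normalized)
        if alternative is not None:
            return alternative
    return system_id


if __name__ == "__main__":
    # Hypothetical catalog entry pointing at a local copy of a DTD.
    catalog = {"-//W3C//DTD XHTML 1.0 Strict//EN": "file:///dtd/xhtml1-strict.dtd"}
    print(resolve_external_entity(
        "-//W3C//DTD  XHTML 1.0 Strict//EN",
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
        catalog,
    ))
```

Note that in this reading the system literal is only the fallback; a
processor with no catalog at all always ends up dereferencing it.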

> XML does not mandate support for any particular catalog syntax or
> support for http. Thus if as Dan Connolly suggested XHTML mandated
> that all conforming XHTML documents start
> SYSTEM "http://www.w3.org/....."
> or
> PUBLIC "xxx"  "http://www.w3.org/....."
> 
> Then the end result would be that many (perhaps the majority) of
> validating XML parsers would not be able to even parse a conforming
> XHTML document.

How does this follow?  Sorry, I must be missing something.  If you're
talking about a contention to the effect that a doctype declaration
*must* use the minimized form to refer to an external subset, I'd tend
to agree.  But I don't see why a validating parser would necessarily
fail just because an http: URL had to be dereferenced.  Could you
clarify?

> In a section on conformance you should restrict yourself to
> features that you know are available in a conforming XML system.
> Unfortunately that means for XML the _only_ thing you have
> available is to suggest editing the document so that the system
> identifier points to a copy of the dtd usable on your system. That
> means, if you want to also have a canonical name in the doctype
> declaration, XHTML has to use the only other available slot, which
> is the public identifier.
> 
> This is the main reason why I think XHTML has to use the public
> identifier, it is nothing to do with the merits of FPI versus
> URI, it is just to do with the lack of a mandated standard
> resolution mechanism for external identifiers.

Excellent summary.


Arjun

Received on Monday, 21 February 2000 23:31:17 UTC