- From: Arjun Ray <aray@q2.net>
- Date: Mon, 21 Feb 2000 23:57:52 -0500 (EST)
- To: www-html@w3.org
On Mon, 21 Feb 2000, David Carlisle wrote:

> Arjun Ray wrote
>
> > No. They're exactly the same. The real problem is that, under
> > the current rules, a URI can't be the minimum data following the
> > PUBLIC keyword.
>
> Under whose rules? I can see _nothing_ in the XML spec that puts
> any constraints on the public identifier except that it consists
> of PubidChar.

Yes, you're right! (I guess I'm too much of a dyed-in-the-wool
SGML-er ;)) The real point here, however, is that the XML spec doesn't
include an SGML declaration, even as a sample. The WebSGML TC has one,
and there we have, lo!, FORMAL NO.

> So as long as the URI is encoded in (say) utf8 and then %HH encode
> any disallowed characters, it would appear that that would be
> usable as the public identifier (although it would break any sgml
> based system expecting an FPI)

Well, under the -ahem- rules, that would be the SGML system's fault.
(One place where *expecting* an FPI could matter is a catalog mechanism
allowing partial matches based on internal syntax and not checking
whether the F in FPI actually applies.)

> > That is, there should never be a need for a PUBLIC *and* a SYSTEM
> > identifier.
>
> I agree, in an ideal world this would be true. But in XML as
> currently defined the main point is that you _do_ need two: a
> canonical name and a system address

Yes, but there is no need to put the system address in a *document
instance* if the public identifier is there already. When we're
talking about XML and the Web, I can't imagine that it wouldn't or
couldn't be normal to assume that the canonical name *will* have a
system address *necessarily* associated with it. Including a system
identifier can thus be at best advisory. The normative resolution
should be fixed in the spec - i.e. the authoritative document which
promulgates the canonical name. That, IMHO, is what we should want,
but...
The XML spec (on external entities, Sec. 4.2.2) has this:

: An XML processor attempting to retrieve the entity's content may
: use the public identifier to try to generate an alternative URI. If
: the processor is unable to do so, it must use the URI specified in
: the system literal.

I believe this is the core of Dan's case, but, as I've argued, it
rests on the assumption that a SYSTEM identifier has the *function* of
a PUBLIC identifier.

> XML does not mandate support for any particular catalog syntax or
> support for http. Thus if as Dan Connolly suggested XHTML mandated
> that all conforming XHTML documents start
> SYSTEM "http://www.w3.org/....."
> or
> PUBLIC "xxx" "http://www.w3.org/....."
>
> Then the end result would be that many (perhaps the majority) of
> validating XML parsers would not be able to even parse a conforming
> XHTML document.

How does this follow? Sorry, I must be missing something. If you're
talking about a contention to the effect that a doctype declaration
*must* use the minimized form to refer to an external subset, I'd tend
to agree. But I don't see why a validating parser would necessarily
fail just because an http: URL had to be dereferenced. Could you
clarify?

> In a section on conformance you should restrict yourself to
> features that you know are available in a conforming XML system.
> Unfortunately that means for XML the _only_ thing you have
> available is to suggest editing the document so that the system
> identifier points to a copy of the dtd usable on your system. That
> means, if you want to also have a canonical name in the doctype
> declaration, XHTML has to use the only other available slot, which
> is the public identifier.
>
> This is the main reason why I think XHTML has to use the public
> identifier, it is nothing to do with the merits of FPI versus
> URI, it is just to do with the lack of a mandated standard
> resolution mechanism for external identifiers.

Excellent summary.

Arjun
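
P.S. For anyone who wants the Sec. 4.2.2 wording in concrete terms,
here is a minimal sketch of the lookup order it describes: try the
public identifier first, and fall back to the system literal only if
no alternative URI can be generated. Everything named here is an
assumption made for illustration: the function `resolve_external_id`,
the dict-based `catalog`, and the example identifiers and paths are
hypothetical. The XML spec mandates no particular catalog syntax, so a
plain mapping stands in for whatever mechanism a real processor uses.

```python
from typing import Mapping, Optional

# Hypothetical catalog: public identifier -> locally usable URI.
EXAMPLE_CATALOG = {
    "-//EXAMPLE//DTD Sample//EN": "file:///opt/dtd/sample.dtd",
}


def resolve_external_id(
    public_id: Optional[str],
    system_id: str,
    catalog: Mapping[str, str],
) -> str:
    """Choose the URI to fetch for an external entity.

    Follows the order in the quoted passage: the public identifier may
    be used to generate an alternative URI; if that fails, the URI in
    the system literal must be used.
    """
    if public_id is not None:
        # Sec. 4.2.2 also asks that runs of white space in the public
        # identifier be collapsed to single spaces, and leading and
        # trailing white space removed, before a match is attempted.
        key = " ".join(public_id.split())
        alternative = catalog.get(key)
        if alternative is not None:
            return alternative
    # No public identifier, or no catalog entry for it.
    return system_id
```

With `EXAMPLE_CATALOG`, the public identifier
"-//EXAMPLE//DTD Sample//EN" resolves to the local file, while an
identifier the catalog does not know falls through to whatever system
literal was supplied. That fallback is the only behaviour a conforming
processor is guaranteed to offer, which is David's point about the
system identifier being the one slot you can count on.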
Received on Monday, 21 February 2000 23:31:17 UTC