Re: unparsed entity

Rich Himes wrote:

>>   - if a document has a DTD that defines "unparsed entities" (links to
>>     images and such; yeah, I know people should use Xlink for that,
>>     but XML still allows it) then the internal name of the entity is
>>     arbitrary. Should it be renamed in the canonical form?
>
>Yes, and I believe DOMHASH handles this.

No, it does not rename unparsed entities.  Suppose we have the following
DTD.

<!NOTATION GIF SYSTEM "view.exe">
<!ENTITY foo SYSTEM "myimg.gif" NDATA GIF>
<!ELEMENT start EMPTY>
<!ATTLIST start src ENTITY #IMPLIED>

Now, let us consider the following document.

<?xml version="1.0"?>
<!DOCTYPE start SYSTEM "mydoc.dtd">
<start src="foo"/>

I can think of three different strategies for hashing.

1. Do not expand the unparsed entity.
   In this case, the hash value of the root element is
   exactly the same as the hash of the root element of
      <?xml version="1.0"?>
      <start src="foo"/>

2. Expand the unparsed entity to URL+Notation Name+helper
   The hash value would be the same as in the case of
      <?xml version="1.0"?>
      <start src="myimg.gif+GIF+view.exe"/>
   (or whatever separator character is used).

3. Expand to the contents of the image file.
   In this case, the hash value would be as if it were that of
      <?xml version="1.0"?>
      <start src="...(binary data)...+GIF+view.exe"/>
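To make the difference between the three alternatives concrete, here is a minimal Python sketch of what each one feeds into the hash for <start src="foo"/>. It assumes a simplified stand-in digest (SHA-1 over the tag name and sorted name=value pairs), NOT the actual DOMHASH byte encoding. The dictionaries NOTATIONS and UNPARSED_ENTITIES and the helper fake_read_file are hypothetical and just transcribe the example DTD and pretend to read the image file.

```python
import hashlib

# Hypothetical in-memory view of the declarations in mydoc.dtd; a real
# implementation would have to obtain these from the DTD itself.
NOTATIONS = {"GIF": "view.exe"}                    # <!NOTATION GIF SYSTEM "view.exe">
UNPARSED_ENTITIES = {"foo": ("myimg.gif", "GIF")}  # <!ENTITY foo ... NDATA GIF>


def element_digest(tag, attrs):
    """Simplified stand-in for an element digest (NOT the DOMHASH encoding):
    SHA-1 over the tag name and the name=value pairs sorted by name."""
    h = hashlib.sha1()
    h.update(tag.encode("utf-8") + b"\0")
    for name, value in sorted(attrs.items()):
        h.update(name.encode("utf-8") + b"=" + value + b"\0")
    return h.hexdigest()


def expand(value, strategy, read_file=None):
    """Return the bytes hashed for an ENTITY-typed attribute value."""
    if strategy == 1:  # alternative 1: keep the entity name as-is
        return value.encode("utf-8")
    system_id, notation = UNPARSED_ENTITIES[value]
    tail = f"+{notation}+{NOTATIONS[notation]}".encode("utf-8")
    if strategy == 2:  # alternative 2: URL + notation name + helper
        return system_id.encode("utf-8") + tail
    if strategy == 3:  # alternative 3: file contents + notation name + helper
        return read_file(system_id) + tail
    raise ValueError(f"unknown strategy {strategy}")


def digest_start(strategy, read_file=None):
    """Digest of <start src="foo"/> under the given strategy."""
    return element_digest("start", {"src": expand("foo", strategy, read_file)})


def fake_read_file(path):
    # Hypothetical file reader standing in for fetching myimg.gif.
    return b"GIF89a\x01\x00"


if __name__ == "__main__":
    for s in (1, 2, 3):
        print(s, digest_start(s, fake_read_file))
```

In this sketch only alternative 1 can be computed without consulting the DTD; alternative 2 needs the entity and notation declarations, and alternative 3 additionally needs to fetch the external file.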

We believe alternative 1 is the most desirable, because:

a) It is very difficult, if not impossible, to identify
   unparsed entities using the DOM API.  DOM 1.0 does not provide
   a straightforward way to tell whether an attribute value names
   an unparsed entity or is just a text string.

b) http://www.w3.org/TR/NOTE-xml-canonical-req states that
   "The specification shall not consider the canonicalization
   of unparsed entities (although a canonical document may
   still reference them)"
   (although I do not know the rationale behind this)


c) With alternative 1, the hash value is unchanged even if
   the DOCTYPE declaration is missing.
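Points (a) and (c) can be checked quickly with an off-the-shelf parser. The sketch below uses Python's ElementTree rather than a DOM implementation, for brevity, together with a simplified recursive digest that is an assumption for illustration, not the DOMHASH encoding. The parser does not fetch mydoc.dtd, hands back src as the plain string "foo" with nothing marking it as an entity name, and the digest of the root element is the same with or without the DOCTYPE line.

```python
import hashlib
import xml.etree.ElementTree as ET

WITH_DTD = """<?xml version="1.0"?>
<!DOCTYPE start SYSTEM "mydoc.dtd">
<start src="foo"/>"""

WITHOUT_DTD = """<?xml version="1.0"?>
<start src="foo"/>"""


def element_digest(elem):
    """Simplified recursive digest (not the DOMHASH byte encoding):
    tag, sorted attributes, text, then each child's digest and tail text."""
    h = hashlib.sha1()
    h.update(elem.tag.encode("utf-8") + b"\0")
    for name, value in sorted(elem.attrib.items()):
        h.update(f"{name}={value}\0".encode("utf-8"))
    h.update((elem.text or "").encode("utf-8") + b"\0")
    for child in elem:
        h.update(element_digest(child).encode("ascii"))
        h.update((child.tail or "").encode("utf-8") + b"\0")
    return h.hexdigest()


if __name__ == "__main__":
    with_dtd = ET.fromstring(WITH_DTD)
    without_dtd = ET.fromstring(WITHOUT_DTD)
    print(repr(with_dtd.get("src")))  # just a string; no entity information
    print(element_digest(with_dtd) == element_digest(without_dtd))
```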


--
Hiroshi Maruyama
Manager, Network Applications, Tokyo Research Laboratory
+81-462-73-4576, maruyama@jp.ibm.com
Also Associate Professor, Dept. of Computer Science, Tokyo Institute of
Technology
+81-3-5734-3953, maruyama@cs.titech.ac.jp

Received on Thursday, 1 April 1999 03:48:14 UTC