Re: Internal DTD Examples Invalidate the RDF/XML Documents from Frank Manola on 2003-10-03 (www-rdf-comments@w3.org from October to December 2003)

From: Frank Manola <fmanola@acm.org>
Date: Fri, 03 Oct 2003 10:37:30 -0400
To: dennis.hamilton@acm.org
Cc: www-rdf-comments@w3.org
Message-ID: <3F7D89AA.4090805@acm.org>

Hi Dennis--

Comments interspersed below.

--Frank

Dennis E. Hamilton wrote:

> Hi Frank,
> 
> When I saw that the OWL RDF Schema uses exactly the same technique in a published RDF, I dug into this more deeply and asked a related question at 
> 
> 	<http://lists.w3.org/Archives/Public/public-webont-comments/2003Oct/0001.html>.
> 
> Here's my understanding:
> 
> 1.	RDF/XML requires that the XML be well-formed.  Yes.  And the presence of the XML declaration seems to be an assertion that what follows will indeed be well-formed XML (including having a single root element).
> 
> 2.	I did not know that a Document Type Declaration was required for RDF/XML.  I haven't checked the latest specification.  That is a surprising requirement, since it is not a requirement for XML 1.0 and in particular for XML 1.0 documents that are only intended to be well-formed.
> 
> It is also a surprising requirement since a Document Type Declaration can be taken as an assertion of validity.  A non-validating processor is not required to confirm it, but I had always taken it as a form of promise.   
> 


1.  A document type declaration *isn't* required for RDF/XML, unless you 
want to declare entities.  You're always free not to use entities, in 
which case you don't need to provide a document type declaration (there 
are examples in the Primer that don't use entities, and don't have 
document type declarations).  If that's somehow not clear, I'll be happy 
to explicitly say so in the Primer.

2.  People (or software) may take the presence of the document type 
declaration as an assertion of validity, but the XML specs don't say 
that.  There's nothing that requires a document type declaration to be 
used for validation;  that's one reason why there are non-validating 
processors.  As I've said, document type declarations can be used simply 
to define abbreviations, in XML that is not intended to be validated (or 
at least not validated by validating XML processors).  You may feel this 
is inappropriate, but this is part of the flexibility of XML;  it's not 
specific to RDF/XML.

3. To be valid XML, the XML must have a document type declaration, and 
the document must comply with the constraints expressed in that 
declaration.  If you submit XML without a document type declaration 
(like RDF/XML without entity declarations), or XML with a document type 
declaration that's not enough to fully define all the XML that's there 
(like the RDF/XML with entity declarations in the Primer) to a 
*validating* XML processor, and tell it to validate, you're going to get 
complaints.   But that's because you've told it to validate, in a 
situation where the XML language involved is such that it can't be 
validated by a validating XML processor.  You need to use some other 
validation mechanism for general RDF/XML (although as you note below, 
special cases can be defined that do work with DTDs).
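
As a minimal sketch of points 1-3, assuming Python's standard-library xml.etree.ElementTree (which sits on the non-validating expat parser): the document below is hypothetical (the example.org names are made up for illustration), but it follows the Primer pattern of an internal DTD subset that declares only an entity. A non-validating parse accepts it and expands &xsd; in the attribute value.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/terms#"  # hypothetical vocabulary, for illustration only

# Internal DTD subset used purely as an abbreviation mechanism:
# it declares one entity and no elements or attributes.
RDF_XML = """<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/item1">
    <ex:size rdf:datatype="&xsd;integer">10</ex:size>
  </rdf:Description>
</rdf:RDF>
"""

# The parse only checks well-formedness; no validation is attempted.
root = ET.fromstring(RDF_XML)

# Locate the (hypothetical) ex:size property element and read rdf:datatype.
size = next(root.iter(f"{{{EX_NS}}}size"))
datatype = size.get(f"{{{RDF_NS}}}datatype")

print(datatype)  # http://www.w3.org/2001/XMLSchema#integer
```

Handing the same text to a validating processor with validation switched on would be expected to produce complaints, since the DTD declares none of the elements or attributes that appear.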


> 3.	I think that is the disconnect for me. 
> 
> It is simply very peculiar to have a practice that involves using a Document Type Declaration that establishes a DTD for which there are no XML valid documents.
> 
> SOMEHOW, THE PRACTICE NEEDS TO BE MADE EXPLICIT.  It is weird to think that someone won't try the technique in the examples (I did), and it is even more startling to have an XML editor complain when fed the OWL RDF Schema. 
> 


You needn't yell.  I'm certainly willing to make the Primer clearer 
about what's going on here (that's why I asked if you thought further 
discussion was needed).  However, as I noted above I don't think this is 
as peculiar as you think it is, and I think it is something explicitly 
provided for in XML.  That is, XML explicitly provides for using 
document type declarations in documents that aren't intended to be 
validated.


> 4.	There are XML editors and other tools that will validate when a Document Type Declaration is present.  And there are other processors, such as IE6.0, that will not even display the XML if it specifies a Document Type Declaration and it is not valid.  There is something in the implementation of IE 6.0 that has it not fail with an RDF though.  (It will display the OWL RDF Schema at <http://www.w3.org/2002/07/owl> with no problem.  I'm afraid to ask what the XSL instruction is for, though.)
> 
> So I have to deal with all of the error messages if I don't make the Document Type Declaration one for which the RDF is valid XML.  Since validation comes before content, it is a hack to notice that the XML is for an RDF and programmatically operate differently with regard to validation.  I haven't checked to see what the various DOM processors do, but this has to be a stumbling block for some of them.
> 


I understand.  But part of this has to do with trying to validate XML 
that isn't intended to be XML-valid (and/or assuming that if it has a 
document type declaration, it is intended to be XML-valid).  This is 
part of the reason why the issue was labeled as having to do with 
"validating embedded rdf" (which is often a problem even if the RDF/XML 
doesn't have a document type declaration when a validating XML-processor 
is used, and the DTD isn't designed to be sufficiently "open" to other 
embedded XML).


> (By the way, I successfully did external DTDs for two RDFs I just wrote for a class. Once I got the hang of it, it wasn't too difficult.  But each external DTD is clearly specific to the particular RDF that I built.  I could do a generic DTD that would work across multiple RDFs on the pattern that I am willing to use, but it doesn't work for arbitrary RDFs and it is a pretty "loose" DTD since I used <!ELEMENT rdf:Description ANY> as a way to get around a lot of customization fuss.)
> 


Right.
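
A loose, document-specific DTD of the kind described above might look like the following sketch. The DTD, the example.org vocabulary, and the helper undeclared_names are assumptions for illustration, not the DTD Dennis actually wrote. The helper uses the expat bindings in Python's standard library to check only one slice of XML validity (that every element and attribute used has a declaration); it does not check content models or attribute values.

```python
import xml.parsers.expat

# Hypothetical document with a "loose" internal DTD tailored to this one graph.
LOOSE_RDF = """<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
  <!ELEMENT rdf:RDF (rdf:Description)*>
  <!ATTLIST rdf:RDF
      xmlns:rdf CDATA #FIXED "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:ex  CDATA #FIXED "http://example.org/terms#">
  <!ELEMENT rdf:Description ANY>
  <!ATTLIST rdf:Description rdf:about CDATA #IMPLIED>
  <!ELEMENT ex:size (#PCDATA)>
  <!ATTLIST ex:size rdf:datatype CDATA #IMPLIED>
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/item1">
    <ex:size rdf:datatype="&xsd;integer">10</ex:size>
  </rdf:Description>
</rdf:RDF>
"""

def undeclared_names(xml_text):
    """Return the elements and attributes used in the document but not declared in its DTD."""
    declared_elems, declared_attrs, missing = set(), set(), set()
    parser = xml.parsers.expat.ParserCreate()  # no namespace processing: names stay as QNames

    def element_decl(name, model):
        declared_elems.add(name)

    def attlist_decl(elname, attname, type_, default, required):
        declared_attrs.add((elname, attname))

    def start_element(name, attrs):
        # The internal subset is read before the root element, so the sets are complete here.
        if name not in declared_elems:
            missing.add(name)
        for attname in attrs:
            if (name, attname) not in declared_attrs:
                missing.add(f"{name}/@{attname}")

    parser.ElementDeclHandler = element_decl
    parser.AttlistDeclHandler = attlist_decl
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return sorted(missing)

# Same DTD, but the graph now uses a property the DTD never anticipated.
OTHER_RDF = LOOSE_RDF.replace("<ex:size", "<ex:weight").replace("</ex:size>", "</ex:weight>")

print(undeclared_names(LOOSE_RDF))  # []
print(undeclared_names(OTHER_RDF))  # ['ex:weight', 'ex:weight/@rdf:datatype']
```

The second call shows why such a DTD stays tied to one vocabulary: any property name the DTD did not list in advance is undeclared, however loose the rdf:Description content model is.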


> -- Dennis
> 
> -----Original Message-----
> at <http://lists.w3.org/Archives/Public/www-rdf-comments/2003OctDec/0006.html>
> From: Frank Manola [mailto:fmanola@acm.org]
> Sent: Thursday, October 02, 2003 07:28
> To: dennis.hamilton@acm.org
> Cc: www-rdf-comments@w3.org
> Subject: Re: Internal DTD Examples Invalidate the RDF/XML Documents
> 
> 
> Hi Dennis--
> 
> The use of internal DTD subsets in these Primer examples was merely to 
> illustrate the use of entities as an abbreviation mechanism. [ ... ]
>  (the Primer examples are valid RDF/XML according to the W3C RDF 
> validator).  Moreover, it's perfectly OK to use internal DTD subsets to 
> define entities in XML you don't intend to validate (there's a 
> well-formedness constraint in the XML spec that covers this situation) 
> 
> <dhnote>Granted</dhnote>
> 
> and, technically, RDF/XML *without* a document type declaration isn't 
> valid, so it isn't exactly the introduction of an internal DTD subset 
> that causes the RDF/XML to be invalid.
> 
> <dhnote>
> Why?  Where did that come from?  
> XML doesn't require there to be a document type declaration.
> In light of the following, it is screwy to require it, and screwier not
> to visibly declare a convention that a non-validating XML processor must 
> be used for RDF.
> </dhnote>


I didn't say RDF/XML required a document type declaration.  In your 
original message, you seemed to have said that the introduction of these 
entity definitions suddenly made the RDF/XML invalid.  My point was that 
RDF/XML without a document type declaration isn't valid either.  What 
you're really observing, it seems to me, is that the presence of the 
entity declarations triggers validity checking in processors that would 
otherwise have let the RDF/XML go.

As I've said already, I'll be happy to clarify what's going on here. 
However, it seems to me it should have been clear all along that a 
non-validating XML processor (or a validating processor that you can 
instruct to only check for well-formedness) must be used for RDF.  After 
all, if you attempt to validate RDF/XML that has *no* document type 
declaration, a validating processor should complain;  part of the 
definition of valid is that there be a document type declaration.  Part 
of the problem here, it seems to me, is this notion that if there is no 
document type declaration, the intent is not to validate, and if there 
is a document type declaration, the intent is to validate.  It's one 
thing to express this intent implicitly by choice of what processor (a 
validating or non-validating one) you submit the XML to.  It's another 
thing when a normally-validating processor implicitly implements this 
assumption.
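
The point that neither form can be XML-valid can be sketched with the expat bindings in Python's standard library. This is only a precondition check under stated assumptions, not a validator: the helper name validity_preconditions and the example.org document are hypothetical. It reports whether a document is well-formed, whether it has a document type declaration, and how many element declarations that declaration contains.

```python
import xml.parsers.expat

# Hypothetical RDF/XML body; {xsd} is filled in differently for each variant.
BODY = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/item1">
    <ex:size rdf:datatype="{xsd}integer">10</ex:size>
  </rdf:Description>
</rdf:RDF>
"""

# Variant 1: no document type declaration, full URI written out.
NO_DOCTYPE = '<?xml version="1.0"?>\n' + BODY.format(
    xsd="http://www.w3.org/2001/XMLSchema#"
)

# Variant 2: internal DTD subset that only declares the xsd entity.
ENTITY_ONLY = (
    '<?xml version="1.0"?>\n'
    "<!DOCTYPE rdf:RDF [\n"
    '  <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">\n'
    "]>\n" + BODY.format(xsd="&xsd;")
)

def validity_preconditions(xml_text):
    """Report well-formedness plus two things XML validity depends on."""
    result = {"well_formed": True, "has_doctype": False, "element_declarations": 0}
    parser = xml.parsers.expat.ParserCreate()

    def start_doctype(name, system_id, public_id, has_internal_subset):
        result["has_doctype"] = True

    def element_decl(name, model):
        result["element_declarations"] += 1

    parser.StartDoctypeDeclHandler = start_doctype
    parser.ElementDeclHandler = element_decl
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError:
        result["well_formed"] = False
    return result

print(validity_preconditions(NO_DOCTYPE))
# {'well_formed': True, 'has_doctype': False, 'element_declarations': 0}
print(validity_preconditions(ENTITY_ONLY))
# {'well_formed': True, 'has_doctype': True, 'element_declarations': 0}
```

Both variants are well-formed, and neither could satisfy a validating processor: the first has no document type declaration at all, and the second has one that declares no elements.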


> 
> It's been known for some time that the current grammar of RDF/XML isn't 
> amenable to description in a DTD (or an XML Schema).  Producing such a 
> grammar was considered by the RDF Core WG, but it was decided that the 
> changes would be so extensive as to be outside the current WG's charter. 
> This has been added to the RDF Core postponed issues list at:
> 
> http://www.w3.org/2000/03/rdf-tracking/#rdfms-validating-embedded-rdf
> 
> <dhnote>
> Right.  You will never get a generic DTD to handle the case where a tag QName
> is understood to be a shorthand for a URI based on a concatenation assumption 
> concerning namespaces.  I don't expect that XML Schema will get far with it
> either.  [This observation is not strictly true, in working with the basic,
> common vocabularies, but it certainly holds as a practical matter.]
> </dhnote>
> 
> We also received related Last Call comments on this subject labeled as 
> xmlsch-10 and xmlsch-12, which can be found at
> http://www.w3.org/2001/sw/RDFCore/20030123-issues/
> 
> <dhnote>
> Although I have great sympathy for xmlsch-10, that is not the concern that I raise.  Satisfying the concerns expressed in xmlsch-10 would obviate my concern, but I was assuming that the RDF/XML notation for abbreviating mappings to URIs and triples is a foregone conclusion.  What I am saying is that if you are not going to operate in the XML stack according to what other users of XML processors might expect when they see an XML declaration, you need to say so in a very clear way.
> 	Another way to put it in xmlsch-12 terms is, if for some reason RDF/XML is not going to be consistent with Colloquial XML, then you must say so.  My sense is that it is not, and the examples and practices are not that innocent.  It gives pause that XML 1.0 validators will claim deficiencies in XML documents that RDF processors will assert are [RDF] valid.
> 	I think the interoperability issues around the "stack" up through Web Services to the Semantic Web require a clear statement.  I would treat that as separate from the xmlsch-10 and xmlsch-12 comments.
> </dhnote>



I think it's always been reasonably clear that RDF/XML wasn't intended 
to satisfy validating XML processors (after all, there's never been a 
DTD, so how could it be validated?).  But I agree this is something that 
could be made more explicit.


> 
> Do you feel this issue needs additional discussion (in the Primer or 
> some other RDF spec)?
> <dhnote>Yes</dhnote>
> 
> --Frank
> 
> 
> Dennis E. Hamilton wrote:
> in <http://lists.w3.org/Archives/Public/www-rdf-comments/2003OctDec/0005.html>
> 
>>I have been reading over the current RDF Primer Working Document, 
>>
>>	<http://www.w3.org/TR/2003/WD-rdf-primer-20030905/>
>>
>>And I notice that the introduction of internal DTD subsets to provide entity definitions (e.g., for &xsd;) results in the XML document being [DTD] invalid.
>>
>>
> [ ... ]
> 
>>Dennis E. Hamilton
>>------------------
>>AIIM DMware Technical Coordinator
>>mailto:Dennis.Hamilton@acm.org | gsm:+1-206.779.9430
>>http://DMware.info
>>   ODMA Support: http://ODMA.info
>>OpenPGP public key fingerprint BFE5 EFB8 CB51 8781 5274  C056 D80D 0C3F A393 27EC
>>
Received on Friday, 3 October 2003 10:14:19 UTC