Re: possible compromise on rdf:XMLLiteral from Jeremy Carroll on 2003-12-19 (www-webont-wg@w3.org from December 2003)

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Fri, 19 Dec 2003 18:42:46 +0100
To: www-webont-wg@w3.org
Message-Id: <200312191842.46677.jjc@hpl.hp.com>
Sandro:
>I worry about the implementation burden here.  Jos, and other
>implementors of OWL Full consistency checkers: do you plan to
>implement support for XML Literal?  I'm not clear anymore what work is
>really entailed here.  I heard on the call that c14n equivalence was
>no longer needed, but that well-formedness-checking was.

>> b) Modify test miscellaneous-205 by deletion of the word "Full" from its
>> levels box.
>> Corresponding modifications to the manifest file for the test, and the
>> master manifest file.

Sandro:
>I'd think another test should be added which is "Full" only and has
>the opposite conclusion.
 
That's misc-204, the related test, except that it also applies to Lite for 
systems that support rdf:XMLLiteral. misc-205 is inapplicable to Full not 
because the test result is incorrect, but because the test metadata violates 
RDF semantics (by suggesting a datatype map which does not include 
rdf:XMLLiteral).

With the changes made by RDF Core, rdf:XMLLiteral acts just like a subdatatype 
of xsd:string with a rather bizarre characterization of its lexical space - I 
doubt you could write a regular expression for it ... All the canonicalization 
stuff is done by the parser. The lexical work that has to be done by a 
reasoner is to confirm whether or not the lexical form is exclusive canonical 
XML. Here is some of the relevant Jena code:

   /**
     * Test whether the given string is a legal lexical form
     * of this datatype.
     */
    public boolean isValid(final String lexicalForm) {
        /*
         * To check the lexical form we construct
         * a dummy RDF/XML document and parse it with
         * ARP. ARP performs an exclusive canonicalization,
         * the dummy document has exactly one triple.
         * If the lexicalForm is valid then the resulting
         * literal found by ARP is unchanged.
         * All other scenarios are either impossible
         * or occur because the lexical form is invalid.
         */

 .... set up code ... including 
       final boolean status[] = new boolean[]{false,false,false};
        // status[0] true on error or other reason to know that this is
        //           not well-formed
        // status[1] true once first triple found
        // status[2] the result (good if status[1] and not status[0]).
       
  ... and ...
      public void statement(AResource a, AResource b, ALiteral l){
                /* this method is invoked exactly once
                 * while parsing the dummy document.
                 * The l argument is in exclusive canonical XML and
                 * corresponds to where the lexical form has been 
                 * in the dummy document. The lexical form is valid
                 * iff it is unchanged.
                 */
                if (status[1] || !l.isWellFormedXML()) {
                    status[0] = true;
                }
                status[1] = true;
                status[2] = l.toString().equals(lexicalForm);
      }
 ... more set up code ...
       arp.load(new StringReader(
        "<rdf:RDF  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>\n"
        +"<rdf:Description><rdf:value rdf:parseType='Literal'>"
        +lexicalForm+"</rdf:value>\n"
        +"</rdf:Description></rdf:RDF>"
        ));

http://cvs.sourceforge.net/viewcvs.py/jena/jena2/src/com/hp/hpl/jena/datatypes/xsd/impl/XMLLiteralType.java?view=markup


For systems that only accept RDF/XML as input this check needs to be done only 
for constructs such as
  <eg:prop rdf:datatype="&rdf;XMLLiteral">foobar</eg:prop>

For rdf:parseType="Literal" the parser has to do the work.
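
As a rough illustration of the dummy-document idea, independent of ARP and of 
Jena, here is a minimal sketch using only the JDK's XML parser. The class name 
XmlLiteralSketch and the method isWellFormedFragment are made up for this 
example and are not part of Jena. Note that it covers only the 
well-formedness half of the check: it does NOT confirm that the lexical form 
is already in exclusive canonical form, which is what the ARP round trip 
above establishes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not Jena code: wraps the candidate lexical form in a
// dummy root element and asks the JDK parser whether the result is a
// well-formed, namespace-well-formed XML document.
public class XmlLiteralSketch {

    public static boolean isWellFormedFragment(String lexicalForm) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Replace the default handler (which prints to stderr) with one
            // that turns every error into an exception.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            // A lexical form that closes the dummy element early leaves a
            // stray end tag or a second root, so the parse still fails.
            String dummy = "<dummy>" + lexicalForm + "</dummy>";
            builder.parse(new InputSource(new StringReader(dummy)));
            return true;
        } catch (Exception e) {
            // Parse errors, I/O errors and configuration errors all mean
            // we could not confirm well-formedness.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedFragment("<b>bold</b> text"));
        System.out.println(isWellFormedFragment("<b>unclosed"));
    }
}
```

A reasoner that needs the full lexical-space test would still have to 
compare the input against its exclusive canonicalization, as the Jena code 
does.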

(Of course, the Jena approach of creating a dummy document is somewhat 
inefficient, except in programmer time - I was the programmer.)

Jeremy
Received on Friday, 19 December 2003 12:43:24 UTC