Re: notation for literals from Dave Beckett on 2002-04-25 (w3c-rdfcore-wg@w3.org from April 2002)

From: Dave Beckett <dave.beckett@bristol.ac.uk>
Date: Thu, 25 Apr 2002 11:39:17 +0100
To: Brian McBride <bwm@hplb.hpl.hp.com>
cc: Pat Hayes <phayes@ai.uwf.edu>, w3c-rdfcore-wg@w3.org
Message-ID: <12242.1019731157@tatooine.ilrt.bris.ac.uk>
>>>Brian McBride said:
> At 09:26 24/04/2002 -0500, Pat Hayes wrote:
> 
> [...]
> 
> 
> >"An RDF literal has three parts (a bit, a character string, and a language 
> >tag [@@reference@@]), but we will treat them simply as character strings, 
> >since the other parts of the literal play no role in the model theory."
> 
> 
> Where do we define literal equality?

Nowhere at present; we additionally don't define what a literal is,
according to our decisions.  The existing MT talks about string
literals (since we postponed XML stuff till later) and uses string
equality.  The answer is captured in the issues list from various minutes.

I'll try to find the bits we decided.

Issue http://www.w3.org/2000/03/rdf-tracking/#rdfms-xmllang says:
  [[
    a literal consists of three components:

    * A representation of the parseType, which is a single bit
    * A language indicator which is a string as defined in XML
    * A fully normalized UNICODE string.
  ]]
  -- decided in http://www.w3.org/2001/sw/RDFCore/20020225-f2f/#d-2002-02-26-1

(Aside: although I see no response from the issue raiser, timbl)


Issue http://www.w3.org/2000/03/rdf-tracking/#rdfms-xml-literal-namespaces
  [[
    * the exact form of the string value corresponding to any given
      XML Literal within RDF/XML is implementation dependent.

    * the string value is well-balanced XML

    * taking the exclusive canonicalization of both the original XML
      Literal in its containing document, and the string value of the
      literal produce the same character string. (this will be used
      as the basis for test cases)

    * the canonicalization above is without comments i.e. CONFORMANCE
      should be tested by canonicalizing without comments; comments
      may be included in the string representation of a literal

   ]]
   -- decided http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Mar/0235.html


http://www.w3.org/2001/sw/RDFCore/20020225-f2f/
"Notes from I18N/RDFCore Meeting"

 * I18N agree that RDFCore requires a transitive string comparison
   algorithm and request that the specs do not mislead application
   developers into thinking they are not permitted to implement a more
   flexible string matching algorithm, e.g. on queries.

 * I18N found the proposed solution of literals being a pair of a
   string and a language tag acceptable.


Some discussion of the literal equality stuff proposed by John Cowan:

  [[
  Literals are equal iff:

  1) the strings are equal, and
  2a) at least one string does not have a tag, or
  2b) one tag is a prefix of the other within the meaning of RFC 3066
       (i.e. "fr"/French is not a prefix of "fry"/Frisian but is a prefix
       of "FR-CA"/Canadian French).
  ]]

  -- http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Feb/0653.html

but we backed off from the prefix stuff, allowing applications to be
more permissive on matching language tags while letting RDF just use
simple (case-independent) equality.
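To make the difference concrete, here is a minimal Python sketch of the two language-tag rules: John Cowan's proposal (a missing tag matches anything; otherwise one tag may be an RFC 3066 prefix of the other) and the simple case-independent equality we kept. The function names are hypothetical, and the prefix test is a simplification for illustration, not a full implementation of RFC 3066 matching.

```python
from typing import Optional


def cowan_tags_match(a: Optional[str], b: Optional[str]) -> bool:
    """Proposed (and dropped) rule: 2a) or 2b) from the quote above."""
    # 2a) if at least one literal has no tag, the tags are compatible
    if a is None or b is None:
        return True
    # 2b) one tag equals the other or is a prefix ending at a '-' subtag
    # boundary; tags are compared case-insensitively
    a, b = a.lower(), b.lower()
    shorter, longer = sorted((a, b), key=len)
    return longer == shorter or longer.startswith(shorter + "-")


def adopted_tags_match(a: Optional[str], b: Optional[str]) -> bool:
    """Kept rule: if either has a tag, both must, equal ignoring case."""
    if a is None or b is None:
        return a is None and b is None
    return a.lower() == b.lower()


print(cowan_tags_match("fr", "FR-CA"))    # True
print(cowan_tags_match("fr", "fry"))      # False
print(adopted_tags_match("fr", "FR-CA"))  # False
print(adopted_tags_match("fr", "FR"))     # True
```

Applications remain free to layer the more permissive prefix matching on top (e.g. in queries), as I18N asked; RDF itself only needs the second function.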

======================================================================

Trying to summarise

  An RDF literal consists of three components:

  * [literal-value]
    A fully normalized Unicode string.

  * [literal-is-XML]
    A representation of the parseType, which is a single bit.
    If the bit is set, the [literal-value] must be interpreted
    as serialized XML rather than as a sequence of Unicode characters.

  * [literal-language]
    A language indicator string

  Constraints

    On [literal-language]

      * Any allowed xml:lang content as defined in
        http://www.w3.org/TR/REC-xml#sec-lang-tag

    For XML literals, that is, when [literal-is-XML] is set

      * the exact form of [literal-value] corresponding to any given
        XML Literal within RDF/XML is implementation dependent.

      * [literal-value] is well-balanced XML.

      * The exclusive canonicalizations of the original XML Literal in
        its containing document and of [literal-value] produce the same
        character string.

      * The canonicalization above is without comments, i.e. CONFORMANCE
        should be tested by canonicalizing without comments; comments
        may be included in [literal-value].
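The canonicalization conformance test could be checked mechanically along these lines. This is a minimal Python sketch under stated assumptions: the standard library's xml.etree.ElementTree.canonicalize implements Canonical XML 2.0, not Exclusive XML Canonicalization, so it only stands in for the real algorithm; namespace declarations inherited from the containing document are ignored; and the names c14n_fragment, conforms and the wrapper element are hypothetical.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical wrapper element: an XML literal is well-balanced XML and may
# have several top-level nodes, so it is wrapped to form a parseable document.
_OPEN, _CLOSE = "<wrapper>", "</wrapper>"


def c14n_fragment(fragment: str) -> str:
    """Canonicalize a well-balanced XML fragment without comments.

    Assumption: stdlib Canonical XML 2.0 stands in for exclusive c14n.
    """
    doc = canonicalize(_OPEN + fragment + _CLOSE, with_comments=False)
    # Strip the wrapper's canonical start and end tags again.
    return doc[len(_OPEN):len(doc) - len(_CLOSE)]


def conforms(original: str, literal_value: str) -> bool:
    """True if both forms canonicalize to the same character string."""
    return c14n_fragment(original) == c14n_fragment(literal_value)


# The stored [literal-value] may differ from the source (quote style,
# empty-element syntax, comments) and still conform.
print(conforms("<em class='x'>abc<!-- note --></em><br/>",
               '<em class="x">abc</em><br></br>'))  # True
print(conforms("<em>abc</em>", "<em>abcd</em>"))    # False
```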

   Equality

     If [literal-is-XML] is set

       Two RDF literals are equal if and only if

          The exclusive canonicalizations of both [literal-value]s are
          exactly the same sequence of Unicode characters
       AND
          If either literal has a [literal-language], both must have one
          and they must be identical strings (case-independent comparison)

     otherwise

       Two RDF literals are equal if and only if

          Both [literal-value]s are identical
       AND
          If either literal has a [literal-language], both must have one
          and they must be identical strings (case-independent comparison)
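The two equality branches can be sketched in Python as follows. Assumptions, all hypothetical: the class and function names; [literal-value] is taken to be already fully normalized; stdlib Canonical XML 2.0 without comments stands in for exclusive canonicalization; and an XML literal is assumed never to equal a non-XML one, since the rules only compare literals within the same branch.

```python
from dataclasses import dataclass
from typing import Optional
from xml.etree.ElementTree import canonicalize


@dataclass(frozen=True)
class RDFLiteral:
    value: str                      # [literal-value], assumed normalized
    language: Optional[str] = None  # [literal-language]; None means absent
    is_xml: bool = False            # [literal-is-XML]


def _c14n(fragment: str) -> str:
    # Stand-in for exclusive c14n: stdlib Canonical XML 2.0, comments
    # dropped, fragment wrapped in a hypothetical <w> element and unwrapped.
    doc = canonicalize("<w>" + fragment + "</w>", with_comments=False)
    return doc[len("<w>"):len(doc) - len("</w>")]


def _languages_match(a: Optional[str], b: Optional[str]) -> bool:
    # If either has a language, both must, compared case-independently.
    if a is None or b is None:
        return a is None and b is None
    return a.lower() == b.lower()


def literals_equal(a: RDFLiteral, b: RDFLiteral) -> bool:
    # Assumption: XML and non-XML literals are never equal.
    if a.is_xml != b.is_xml:
        return False
    if a.is_xml:
        values_match = _c14n(a.value) == _c14n(b.value)
    else:
        values_match = a.value == b.value
    return values_match and _languages_match(a.language, b.language)


print(literals_equal(RDFLiteral("abc", "fr"), RDFLiteral("abc", "FR")))  # True
print(literals_equal(RDFLiteral("<em>abc</em>", "en", True),
                     RDFLiteral("<em>abc<!-- c --></em>", "EN", True)))  # True
```

Keeping literals_equal separate from the dataclass's generated == makes it explicit that field-by-field equality is not the RDF rule.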

    Implementors Note: It is recommended but not required that the
      case of [literal-language] is normalized to lowercase so that
      comparison is simple string equality.
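The implementors' note can be sketched in Python by lowercasing the tag once, when a literal is built, so that the generated equality and hash become plain field comparisons. This is a hypothetical sketch (the class name is an assumption) covering only non-XML literals; XML literal values would still need canonicalizing before a plain comparison means anything.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NormalizedLiteral:
    value: str
    language: Optional[str] = None
    is_xml: bool = False

    def __post_init__(self) -> None:
        # Lowercase the language tag at construction time; a frozen
        # dataclass needs object.__setattr__ to assign in __post_init__.
        if self.language is not None:
            object.__setattr__(self, "language", self.language.lower())


# Case-independent tag comparison now falls out of ordinary ==, and the
# literals can be used as set members or dictionary keys.
print(NormalizedLiteral("abc", "FR") == NormalizedLiteral("abc", "fr"))  # True
print(len({NormalizedLiteral("abc", "EN"), NormalizedLiteral("abc", "en")}))  # 1
```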


At least that is a start to be improved on


In terms of N-Triples

   "abc" equals "abc"

   "abc" does not equal "abcd"

   "abc"-fr equals "abc"-FR

   "abc"-fr does not equal "abc"-en

   xml"<em>abc</em>" equals xml"<em>abc</em>"

   xml"<em>abc</em>" does not equal xml"<em>abcd</em>"

   xml"<em>abc</em>"-en equals xml"<em>abc</em>"-en

   xml"<em>abc</em>"-en equals xml"<em>abc</em>"-EN

   xml"<em>abc</em>"-en does not equal xml"<em>abc</em>"-fr


I did wonder about a canonical form for Pat to use, but I don't think
anyone supported it:

   literal("abc", "fr", true)

which is a sequence of
   [literal-value]
   (optional, default empty) [literal-language]
   (optional, default false) [literal-is-XML]
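If that canonical form were used, it maps naturally onto a tuple with defaults. A hypothetical Python sketch (the class name Literal is an assumption, not anything the WG defined):

```python
from typing import NamedTuple


class Literal(NamedTuple):
    # Field order follows literal(value, language, is_xml).
    value: str            # [literal-value]
    language: str = ""    # [literal-language], default empty
    is_xml: bool = False  # [literal-is-XML], default false


print(Literal("abc", "fr", True))  # Literal(value='abc', language='fr', is_xml=True)
print(Literal("abc") == Literal("abc", "", False))  # True
```

Tuple equality here is exact, so Literal("abc", "FR") and Literal("abc", "fr") still differ; the equality rules in the summary would have to be applied on top.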

but as Pat said, the MT can consider literals opaque once the
equality rules are clear.

Dave

PS The N-Triples literal stuff is designed so that if you do a dumb
US-ASCII compare of the characters, you get equality (as long
as the language tags are lowercased).
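A minimal Python sketch of such a serializer, assuming the literal notation used in the examples above (an xml prefix for XML literals, a -lang suffix) and N-Triples-style escaping of characters outside printable US-ASCII. The function name is hypothetical, and XML literal values are assumed to be in canonical form already, since a character compare does no canonicalization.

```python
from typing import Optional


def ntriples_literal(value: str, language: Optional[str] = None,
                     is_xml: bool = False) -> str:
    """Render a literal as a pure US-ASCII string for byte comparison."""
    out = []
    for ch in value:
        code = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\t":
            out.append("\\t")
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif 0x20 <= code <= 0x7E:
            out.append(ch)                     # printable ASCII as-is
        elif code <= 0xFFFF:
            out.append(f"\\u{code:04X}")       # BMP escape
        else:
            out.append(f"\\U{code:08X}")       # astral-plane escape
    text = '"' + "".join(out) + '"'
    if is_xml:
        text = "xml" + text
    if language is not None:
        text += "-" + language.lower()  # lowercased so a dumb compare works
    return text


print(ntriples_literal("caf\u00e9", "FR"))                   # "caf\u00E9"-fr
print(ntriples_literal("<em>abc</em>", "EN", is_xml=True))  # xml"<em>abc</em>"-en
```

With the tag lowercased on output, ntriples_literal("abc", "FR") and ntriples_literal("abc", "fr") yield the same string, so plain string equality agrees with the equality rules for these cases.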
Received on Thursday, 25 April 2002 06:42:34 UTC