- From: Dave Beckett <dave.beckett@bristol.ac.uk>
- Date: Thu, 25 Apr 2002 11:39:17 +0100
- To: Brian McBride <bwm@hplb.hpl.hp.com>
- cc: Pat Hayes <phayes@ai.uwf.edu>, w3c-rdfcore-wg@w3.org
>>>Brian McBride said:
> At 09:26 24/04/2002 -0500, Pat Hayes wrote:
> > [...]
> >
> >"An RDF literal has three parts (a bit, a character string, and a language
> >tag [@@reference@@]), but we will treat them simply as character strings,
> >since the other parts of the literal play no role in the model theory."
>
> Where do we define literal equality?

Nowhere at present; we also don't define what a literal is, according
to our decisions.  The existing MT talks about string literals (since
we postponed XML stuff till later) and uses string equality.

The answer is captured in the issues list from various minutes.  I'll
try to find the bits we decided.

Issue http://www.w3.org/2000/03/rdf-tracking/#rdfms-xmllang
(and Issue http://www.w3.org/2000/03/rdf-tracking/#rdfms-xmllang)
say:
[[
  a literal consists of three components:

  * A representation of the parseType, which is a single bit
  * A language indicator which is a string as defined in XML
  * A fully normalized UNICODE string.
]]
-- decided in
  http://www.w3.org/2001/sw/RDFCore/20020225-f2f/#d-2002-02-26-1

(Aside: although I see no response from the issue raiser, timbl)

Issue http://www.w3.org/2000/03/rdf-tracking/#rdfms-xml-literal-namespaces
[[
  * the exact form of the string value corresponding to any given
    XML Literal within RDF/XML is implementation dependent.
  * the string value is well-balanced XML
  * taking the exclusive canonicalization of both the original
    XML Literal in its containing document, and the string value
    of the literal produce the same character string.
    (this will be used as the basis for test cases)
  * the canonicalization above is without comments i.e. CONFORMANCE
    should be tested by canonicalizing without comments; comments
    may be included in the string representation of a literal
]]
-- decided
  http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Mar/0235.html

http://www.w3.org/2001/sw/RDFCore/20020225-f2f/
"Notes from I18N/RDFCore Meeting"

  * I18N agree that RDFCore requires a transitive string comparison
    algorithm and requests that the specs do not mislead application
    developers into thinking they are not permitted to implement a
    more flexible string matching algorithm, e.g. on queries.
  * I18N found the proposed solution of literals being a pair of a
    string and a language tag acceptable.

Some discussion of the literal equality stuff proposed by John Cowan:
[[
  Literals are equal iff:
  1) the strings are equal, and
  2a) at least one string does not have a tag, or
  2b) one tag is a prefix of the other within the meaning of RFC 3066
      (i.e. "fr"/French is not a prefix of "fry"/Frisian but is a
      prefix of "FR-CA"/Canadian French).
]]
-- http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Feb/0653.html

but we backed off from the prefix stuff, allowing applications to be
more permissive on matching language tags while letting RDF just use
simple (case-independent) equality.

======================================================================

Trying to summarise

An RDF literal consists of three components:

  * [literal-value]
    A fully normalized Unicode string.

  * [literal-is-XML]
    A representation of the parseType, which is a single bit.
    If the bit is set, the [literal-value] must be interpreted
    as serialized XML rather than a sequence of Unicode characters.

  * [literal-language]
    A language indicator string

Constraints

On [literal-language]
  * Any allowed xml:lang content as defined in
    http://www.w3.org/TR/REC-xml#sec-lang-tag

For XML literals; that is, when [literal-is-XML] is set.
  * The exact form of [literal-value] corresponding to any given
    XML Literal within RDF/XML is implementation dependent.
  * [literal-value] is well-balanced XML
  * Taking the exclusive canonicalization of both the original
    XML Literal in its containing document and [literal-value]
    produces the same character string.
  * The canonicalization above is without comments i.e. CONFORMANCE
    should be tested by canonicalizing without comments; comments
    may be included in [literal-value]

Equality

If [literal-is-XML] is set
  Two RDF literals are equal if and only if
    Taking the exclusive canonicalization of both [literal-value]s
    produces exactly the same sequence of Unicode characters.
   AND
    If either literal has a [literal-language], it must be present
    in both, and the two must be identical strings (case-independent
    comparison)
otherwise
  Two RDF literals are equal if and only if
    Both [literal-value]s are identical
   AND
    If either literal has a [literal-language], it must be present
    in both, and the two must be identical strings (case-independent
    comparison)

Implementors Note: It is recommended but not required that the case
of [literal-language] is normalized to lowercase so that comparison
is simple string equality.

At least that is a start to be improved on.

In terms of N-Triples

  "abc" equals "abc"
  "abc" does not equal "abc"
  "abc"-fr equals "abc"-FR
  "abc"-fr does not equal "abc"-en
  xml"<em>abc</em>" equals "<em>abc</em>"
  xml"<em>abc</em>" does not equal "<em>abcd</em>"
  xml"<em>abc</em>"-en equals "<em>abc</em>"-en
  xml"<em>abc</em>"-en equals "<em>abcd</em>"-EN
  xml"<em>abc</em>"-en does not equal "<em>abcd</em>"-fr

I did wonder about a canonical form for Pat to use, but I don't think
anyone supported it:

  literal("abc", "fr", true)

which is a sequence of
  [literal-value]
  (optional, default empty) [literal-language]
  (optional, default false) [literal-is-XML]

but as Pat said, the MT can consider literals opaque once the equality
rules are clear.

Dave

PS The N-Triples literal stuff is designed so that if you do a dumb
US-ASCII compare of the characters, you get equality (as long as the
language tags are lowercased)
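
A minimal sketch of the equality rules summarised above, in Python. The names Literal, literals_equal, _canonical_xml and _same_language are hypothetical and exist only for illustration. Three assumptions are worth flagging: the standard library's xml.etree.ElementTree.canonicalize (Python 3.8+) implements C14N 2.0 and stands in here for exclusive canonicalization; the XML fragment is wrapped in a dummy element so that well-balanced content parses as a document, which ignores any namespace context from the containing document; and a literal with [literal-is-XML] set is treated as never equal to one without it, a case the rules above do not spell out. Values are assumed to be fully normalized Unicode already.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Literal:
    """Hypothetical container for the three components of an RDF literal."""
    value: str                      # [literal-value]
    language: Optional[str] = None  # [literal-language]; None or "" means absent
    is_xml: bool = False            # [literal-is-XML]


def _canonical_xml(fragment: str) -> str:
    # Assumption: C14N 2.0 (comments stripped by default) stands in for
    # exclusive canonicalization. The dummy <w> wrapper lets a well-balanced
    # fragment (several elements, or bare text) parse as one document.
    return ET.canonicalize("<w>" + fragment + "</w>")


def _same_language(a: Optional[str], b: Optional[str]) -> bool:
    # Absent in both, or present in both and equal ignoring case.
    return (a or "").lower() == (b or "").lower()


def literals_equal(a: Literal, b: Literal) -> bool:
    if a.is_xml != b.is_xml:
        # Assumption: the mixed case is not covered by the rules above.
        return False
    if not _same_language(a.language, b.language):
        return False
    if a.is_xml:
        return _canonical_xml(a.value) == _canonical_xml(b.value)
    return a.value == b.value
```

For example, literals_equal(Literal("abc", "fr"), Literal("abc", "FR")) holds, while Literal("abc", "fr") and Literal("abc", "en") compare unequal, as do Literal("abc", "fr") and Literal("abc"). Lowercasing the tag when a Literal is created, as the implementors note recommends, would reduce _same_language to plain string equality.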
Received on Thursday, 25 April 2002 06:42:34 UTC