- From: Dave Beckett <dave.beckett@bristol.ac.uk>
- Date: Thu, 25 Apr 2002 11:39:17 +0100
- To: Brian McBride <bwm@hplb.hpl.hp.com>
- cc: Pat Hayes <phayes@ai.uwf.edu>, w3c-rdfcore-wg@w3.org
>>>Brian McBride said:
> At 09:26 24/04/2002 -0500, Pat Hayes wrote:
>
> [...]
>
>
> >"An RDF literal has three parts (a bit, a character string, and a language
> >tag [@@reference@@]), but we will treat them simply as character strings,
> >since the other parts of the literal play no role in the model theory."
>
>
> Where do we define literal equality?
Nowhere at present; we additionally don't define what a literal is,
according to our decisiions. The existing MT talks about string
literals (since we postponed XML stuff till later) and uses string
equality. The answer is captured in the issues list from various minutes.
I'll try to find the bits we decided.
Issue http://www.w3.org/2000/03/rdf-tracking/#rdfms-xmllang
(and Issue http://www.w3.org/2000/03/rdf-tracking/#rdfms-xmllang) say:
[[
a literal consists of three components:
* A representation of the parseType, which is a single bit
* A language indicator which is a string as defined in XML
* A fully normalized UNICODE string.
]]
-- decided in http://www.w3.org/2001/sw/RDFCore/20020225-f2f/#d-2002-02-26-1
(Aside: although I see no response from the issue raiser, timbl)
Issue http://www.w3.org/2000/03/rdf-tracking/#rdfms-xml-literal-namespaces
[[
* the exact form of the string value corresponding to any given
XML Literal within RDF/XML is implementation dependent.
* the string value is well-balanced XML
* taking the exclusive canonicalization of both the original XML
Literal in its containing document, and the string value of the
literal produce the same character string. (this will be used
as the basis for test cases)
* the canonicalization above is without comments i.e. CONFORMANCE
should be tested by canonicalizing without comments; comments
may be included in the string representation of a literal
]]
-- decided http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Mar/0235.html
http://www.w3.org/2001/sw/RDFCore/20020225-f2f/
"Notes from I18N/RDFCore Meeting"
* 18N agree that RDFCore requires a transitive string comparison
algorithm and requests that the specs do not mislead application
developers into thinking they are not permitted to implement a more
flexible string matching algorithm, e.g. on queries.
* I18N found the proposed solution of literals being a pair of a
string and a language tag acceptable.
Some discussion of the literal equality stuff proposed by John Cowan:
[[
Literals are equal iff:
1) the strings are equal, and
2a) at least one string does not have a tag, or
2b) one tag is a prefix of the other within the meaning of RFC 3066
(i.e. "fr"/French is not a prefix of "fry"/Frisian but is a prefix
of "FR-CA"/Canadian French).
]]
-- http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Feb/0653.html
but we backed off from the prefix stuff, allowing applications to be
more permissive on matching language tags while letting RDF just use
simple (case-independent) equality.
======================================================================
Trying to summarise
An RDF literal consists of three components:
* [literal-value]
A fully normalized Unicode string.
* [literal-is-XML]
A representation of the parseType, which is a single bit.
If the bit is set, the [literal-value] must be interpreted
as serialized XML rather a sequence of Unicode characters.
* [literal-language]
A language indicator string
Constraints
On [literal-language]
* Any allowed xml:lang content as defined in
http://www.w3.org/TR/REC-xml#sec-lang-tag
For XML literals; that is, when [literal-is-XML] is set.
* the exact form of [literal-value] corresponding to any given
XML Literal within RDF/XML is implementation dependent.
* [literal-value] is well-balanced XML
* Taking the exclusive canonicalization of both the original XML
Literal in its containing document, and [literal-value]
produce the same character string.
* The canonicalization above is without comments i.e. CONFORMANCE
should be tested by canonicalizing without comments; comments
may be included in [literal-value]
Equality
If [literal-is-XML] is set
Two RDF literals are equal if and only if
Taking the exclusive canonicalization of both [literal-value]s
produce exactly the same sequence of Unicode characters.
AND
If either literal has an [literal-language] they must be present
in both and identical strings (case independent comparison)
otherwise
Two RDF literals are equal if and only if
Both [literal-value]s are identical
AND
If either literal has an [literal-language] they must be present
in both and identical strings (case independent comparison)
Implementors Note: It is recommended but not required that the
case of [literal-language] is normalized to lowercase so that
comparison is simple string equality.
At least that is a start to be improved on
In terms of N-Triples
"abc" equals "abc"
"abc" does not equal "abc"
"abc"-fr equals "abc"-FR
"abc"-fr does not equal "abc"-en
xml"<em>abc</em>" equals "<em>abc</em>"
xml"<em>abc</em>" does not equal "<em>abcd</em>"
xml"<em>abc</em>"-en equals "<em>abc</em>"-en
xml"<em>abc</em>"-en equals "<em>abcd</em>"-EN
xml"<em>abc</em>"-en does not equals "<em>abcd</em>"-fr
I did wonder about a canonical form for Pat to use, but I don't think
anyone supported it:
literal("abc", "fr", true)
which is a sequence of
[literal-value]
(optional, default empty) [literal-language]
(optional, default false) [literal-is-XML]
but as Pat said, the MT can consider literals opaque once the
equality rules are clear.
Dave
PS The N-Triples literal stuff is designed that if you do a dumb
US-ASCII compare of the characters, you get equality (as long
as the language tags are lowercased)
Received on Thursday, 25 April 2002 06:42:34 UTC