An RDF literal is one of:
Two RDF literals are equal if and only if one of the following:
A string literal label in an RDF graph is composed of a Unicode string [UNICODE] that is in Normal Form C [NFC], and a language identifier (possibly empty) as specified below.
Two string literals are equal if both components are equal. The Unicode string components are compared on a character by character basis. The language tag components are compared in a case insensitive fashion.
Allowable language identifiers are the legal values for xml:lang as specified by section 2.12, Language Identification, in [XML], or the empty string "".
Equality of language identifiers (as specified in [RFC-3066]) is defined by case-insensitive, character-by-character comparison.
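The rules above for string literals can be sketched as follows. This is a minimal illustration, not a normative definition: the class name `StringLiteral` and its fields `text` and `lang` are hypothetical, and `str.lower()` is assumed to be an adequate case fold because language identifiers consist of ASCII characters.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class StringLiteral:
    """Hypothetical model: a Unicode string plus a language identifier."""

    text: str
    lang: str = ""  # the empty string means no language information is available

    def __post_init__(self):
        # The Unicode string component is required to be in Normal Form C.
        if unicodedata.normalize("NFC", self.text) != self.text:
            raise ValueError("string literal text must be in Normal Form C")

    def __eq__(self, other):
        if not isinstance(other, StringLiteral):
            return NotImplemented
        # Strings compare character by character (case sensitive);
        # language identifiers compare case insensitively.
        return (self.text == other.text
                and self.lang.lower() == other.lang.lower())

    def __hash__(self):
        # Keep hashing consistent with the equality defined above.
        return hash((self.text, self.lang.lower()))
```

Under this sketch, `StringLiteral("chat", "en-US")` equals `StringLiteral("chat", "EN-us")` but not `StringLiteral("Chat", "en-US")`, since only the language identifier is compared without regard to case.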
Note: This direct comparison between language identifiers is appropriate for the purpose of defining equality between RDF graphs, but is linguistically naive. [RFC-3066] suggests more advanced comparison techniques.
Note: The empty language tag is used for literals for which no language information is available.
Note: Literals beginning with a composing character (as defined by [CHARMOD]) are allowed; however, they may cause interoperability problems, particularly with XML version 1.1 [XML 1.1].
See the following test cases, per [RDF-TESTS]:
[[[Subject to WG disposition of test cases]]]
Within an RDF graph, an XML literal is a Unicode [UNICODE] string paired with a language identifier. The string is well-balanced, self-contained XML element content [XML].
An XML literal, with non-empty language identifier, can be used to form an XML document by concatenating the five strings:
The resulting Unicode string is then encoded in UTF-8.
When the language identifier is the empty string, the corresponding XML document is formed by enclosing the Unicode string of the XML literal with "<tag>" and "</tag>" and encoding the resulting string in UTF-8.
No escaping is applied in either process. The choice of tag is arbitrary.
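The empty-language case can be sketched as follows. This is an illustration only: the function name `xml_document_without_language` is hypothetical, and the non-empty-language case is deliberately not shown. Parsing the result with the standard-library `xml.dom.minidom` is used here merely as a well-formedness check.

```python
import xml.dom.minidom


def xml_document_without_language(literal_text: str) -> bytes:
    """Form the XML document for an XML literal whose language identifier is empty.

    The literal text is enclosed in "<tag>" and "</tag>" with no escaping,
    and the resulting string is encoded in UTF-8. The element name "tag"
    is an arbitrary choice.
    """
    return ("<tag>" + literal_text + "</tag>").encode("utf-8")


doc = xml_document_without_language("Hello <b>world</b> caf\u00e9")

# parseString raises an exception if the document is not well-formed.
parsed = xml.dom.minidom.parseString(doc)
```

Because no escaping is applied, the literal text itself must already be well-balanced element content; text such as `<b>unclosed` would yield a document that fails to parse.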
The resulting XML document corresponding to the XML literal is a well-formed XML document [XML] that also conforms to XML Namespaces [XML-NS].
Note: If compatibility with XML version 1.1 is desired, then XML literals in RDF graphs must be restricted to those that are fully normalized according to [XML 1.1].
The exclusive canonicalization of an XML literal is formed by:
If two XML literals are equal then:
The conditions given above are necessary for the equality of XML literals. The RDF Test Cases [RDF-TESTS] treat these necessary conditions as also sufficient.
Implementations are free to add additional sufficient conditions for equality. If two XML literals compare equal according to an implementation then they must compare equal according to this definition, but not conversely. In particular, XML comments may be treated as significant, and namespaces that are in scope but not visibly utilized (as defined by [XC14N]) may be treated as significant.
Note to Graham: I deleted a "per RFC3066", which I think you wrote, because it introduced a normative dependency on RFC3066. I replaced it with "(case insensitive comparison)".
[[[Is there a need for a longer non-normative appendix on implementation issues for XML literals? This could discuss (a) minimal implementations, for which equality is not needed, and where the set of namespaces and namespace prefixes can be fixed in advance; (b) the correct and incorrect use of character by character equality for XML literals. Should there be test cases for issue rdfms-xml-literal-namespaces?]]]
See the following test cases, per [RDF-TESTS]:
[[[Subject to WG disposition of test cases]]]
Within an RDF graph, a typed literal is a triple:
Note to WG: as we decided on Sept 13 we have lexical values here. My understanding was that the WG wanted, at the abstract syntax level, *no* expectation that an RDF processor could do any datatype specific processing. Hence I am not including conditions like:
I include a tentative reference to the model theory in the hope that Pat is braver than I am. Otherwise, I end up not discussing datatyping in this section.
An alternative way to go, which I think might be preferable, may be for me to define a function from a typed literal to its value.
Pat could then simply invoke this function to get to the denotation of the literal.
Such a function need not impact on equality, which is defined here purely lexically.
I understood that WG consensus would form more easily around including a language tag in the typed literals - I remain a little unhappy with this.
According to the RDF Model Theory [RDF-SEMANTICS], a typed literal denotes a value from the value space of the datatype named by the datatype URI.
In contrast, the abstract syntax presupposes no datatype-specific processing.
Two typed literals are equal if and only if all of the following hold:
Note: If compatibility with XML version 1.1 is desired, then lexical forms must be restricted to those that are fully normalized according to [XML 1.1].
See the following test cases, per [RDF-TESTS]:
[[[Subject to WG disposition of test cases]]]