Resource Description Framework (RDF): Concepts and Abstract Data Model

3.2 RDF Literals

An RDF literal is one of:

a string literal;
an XML literal;
a typed literal.

Two RDF literals are equal if and only if one of the following holds:

They are both string literals and equal.
They are both XML literals and equal.
They are both typed literals and equal.

3.2.1 String Literals

A string literal label in an RDF graph is composed of a Unicode string [UNICODE] that is in Normal Form C [NFC], and a language identifier (possibly empty) as specified below.

Two string literals are equal if and only if both components are equal. The Unicode string components are compared on a character by character basis. The language identifier components are compared in a case insensitive fashion.

Allowable language identifiers are the legal values for xml:lang as specified by section 2.12, Language Identification, in [XML], or the empty string "". Equality of language identifiers (as specified in [RFC-3066]) is defined by case insensitive character by character comparison.
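The equality rule for string literals can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name StringLiteral and the function string_literals_equal are hypothetical and not defined by this specification, and the sketch does not validate that the language identifier is a legal xml:lang value.

```python
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class StringLiteral:
    """Hypothetical container: a Unicode string plus a language identifier."""
    text: str
    lang: str = ""  # the empty string means no language information

    def __post_init__(self):
        # The Unicode string component is required to be in Normal Form C.
        if unicodedata.normalize("NFC", self.text) != self.text:
            raise ValueError("string literal text is not in Normal Form C")


def string_literals_equal(a: StringLiteral, b: StringLiteral) -> bool:
    # Unicode strings: character by character, with no case folding.
    # Language identifiers: case insensitive, character by character.
    return a.text == b.text and a.lang.lower() == b.lang.lower()
```

Under this sketch, StringLiteral("chat", "en-US") and StringLiteral("chat", "EN-us") compare equal, while StringLiteral("Chat", "en") and StringLiteral("chat", "en") do not, because only the language identifier is compared without regard to case.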

Note: This direct comparison between language identifiers is appropriate for the purpose of defining equality between RDF graphs, but is linguistically naive. [RFC-3066] suggests more advanced comparison techniques.

Note: The empty language identifier is used for literals for which no language information is available.

Note: Literals beginning with a composing character (as defined by [CHARMOD]) are allowed; however, they may cause interoperability problems, particularly with XML version 1.1 [XML 1.1].

See the following test cases, per [RDF-TESTS]:

[[[Subject to WG disposition of test cases]]]

3.2.2 XML Literals

Within an RDF graph, an XML literal is a Unicode [UNICODE] string paired with a language identifier. The string is well-balanced, self-contained XML element content [XML].

An XML literal, with non-empty language identifier, can be used to form an XML document by concatenating the five strings:

"<tag xml:lang='"
the language identifier of the XML literal
"'>"
the Unicode string of the XML literal
"</tag>"

The resulting Unicode string is then encoded in UTF-8.

When the language identifier is the empty string, the corresponding XML document is formed by enclosing the Unicode string of the XML literal with "<tag>" and "</tag>" and encoding the resulting string in UTF-8.

No escaping is applied in either process. The choice of tag is arbitrary.

This resulting XML document corresponding to the XML literal is a well-formed XML document [XML] that also conforms to XML Namespaces [XML-NS].
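The construction of the XML document corresponding to an XML literal can be sketched in Python as follows. The function name xml_literal_document and its parameters are hypothetical, chosen only for illustration; the tag name defaults to "tag" as in the text above, although the choice is arbitrary.

```python
def xml_literal_document(content: str, lang: str = "", tag: str = "tag") -> bytes:
    """Form the XML document corresponding to an XML literal (illustrative sketch).

    `content` is the Unicode string of the XML literal and `lang` is its
    language identifier. No escaping is applied to either component.
    """
    if lang:
        # Non-empty language identifier: concatenate the five strings.
        text = "<" + tag + " xml:lang='" + lang + "'>" + content + "</" + tag + ">"
    else:
        # Empty language identifier: enclose the content in the bare tag.
        text = "<" + tag + ">" + content + "</" + tag + ">"
    # The resulting Unicode string is then encoded in UTF-8.
    return text.encode("utf-8")
```

For example, xml_literal_document("<b>hi</b>", "en") yields the octets of <tag xml:lang='en'><b>hi</b></tag>. Because no escaping is applied, the result is well-formed only if the content is itself well-balanced XML; one possible check, not required by the text above, is to pass the result to a parser such as xml.etree.ElementTree.fromstring.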

Note: If compatibility with XML version 1.1 is desired, then XML literals in RDF graphs must be restricted to those that are fully normalized according to [XML 1.1].

The exclusive canonicalization of an XML literal is formed by:

Forming the XML document corresponding to the XML literal as above.
Taking the exclusive canonicalization without comments [XC14N] of the element content of the root element of the document.

If two XML literals are equal then both of the following hold:

The language identifiers are equal as language identifiers (case insensitive comparison).
The exclusive canonicalizations of the two XML literals are equal UTF-8 strings, octet by octet.

This specification, above, gives necessary conditions for the equality of XML literals. The RDF Test Cases [RDF-TESTS] treat these necessary conditions as also sufficient.

Implementations are free to add additional sufficient conditions for equality. If two XML literals compare equal according to an implementation then they must compare equal according to this definition, but not conversely. In particular, XML comments may be treated as significant, and namespaces that are in scope but not visibly utilized (as defined by [XC14N]) may be treated as significant.

Note to Graham: I deleted a "per RFC3066", which I think you wrote, because it introduced a normative dependency on RFC3066. I replaced it with "(case insensitive comparison)".

[[[Is there a need for a longer non-normative appendix on implementation issues for XML literals? This could discuss (a) minimal implementations, for which equality is not needed, and where the set of namespaces and namespace prefixes can be fixed in advance; (b) the correct and incorrect use of character by character equality for XML literals. Should there be test cases for issue rdfms-xml-literal-namespaces?]]]

See the following test cases, per [RDF-TESTS]:

[[[Subject to WG disposition of test cases]]]

3.2.3 Typed Literals

Within an RDF graph, a typed literal has three components:

An RDF URI reference (the datatype URI).
A Unicode [UNICODE] string (the lexical form).
A language identifier.

The datatype URI refers to an XML Schema Datatype, either a built-in type or a user-derived type. [[How do we get from the URI to the qname?]].

The lexical form must be a string in the lexical space of the datatype associated with the datatype URI. Moreover, if the datatype has an associated canonical lexical representation then the lexical form must be a member of that canonical representation.

The typed value associated with the typed literal is found by applying the datatype mapping associated with the datatype URI to the lexical form.

Two typed literals are equal if and only if both of the following hold:

The two datatype URIs compare equal, character by character.
The two lexical forms compare equal, character by character.

Note: If two typed literals are equal then the corresponding typed values are also equal. The converse can be false; for example, <&xsd;int>"10" is not equal to <&xsd;integer>"10" as typed literals, while the corresponding typed values are equal.
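The difference between equality of typed literals and equality of typed values can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names TypedLiteral, DATATYPE_MAPPINGS, typed_literals_equal and typed_value are hypothetical, the language identifier is omitted because the equality rule above does not consult it, and Python's int() merely stands in for the real datatype mappings (it accepts some strings outside the XML Schema lexical spaces).

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class TypedLiteral:
    """Hypothetical container: a datatype URI plus a lexical form."""
    datatype_uri: str
    lexical_form: str


# Hypothetical, deliberately tiny table of datatype mappings.
# int() is a stand-in, not a faithful XML Schema implementation.
DATATYPE_MAPPINGS = {
    XSD + "int": int,
    XSD + "integer": int,
}


def typed_literals_equal(a: TypedLiteral, b: TypedLiteral) -> bool:
    # Both components are compared character by character.
    return a.datatype_uri == b.datatype_uri and a.lexical_form == b.lexical_form


def typed_value(literal: TypedLiteral):
    # Apply the datatype mapping for the datatype URI to the lexical form.
    return DATATYPE_MAPPINGS[literal.datatype_uri](literal.lexical_form)
```

With a = TypedLiteral(XSD + "int", "10") and b = TypedLiteral(XSD + "integer", "10"), typed_literals_equal(a, b) is false because the datatype URIs differ, while typed_value(a) and typed_value(b) are both the integer 10.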

Note: Document authors and system users should be aware that implementations may fail to canonicalize lexical forms on input if they lack knowledge of an XSD built-in type, or if they fail to retrieve a user-derived type; the fallback behaviour in such cases may differ from the correct behaviour.

Note: If compatibility with XML version 1.1 is desired, then lexical forms must be restricted to those that are fully normalized according to [XML 1.1].

See the following test cases, per [RDF-TESTS]:

[[[Subject to WG disposition of test cases]]]

test cases tbd