[1]
An RDF Literal is a Unicode string, optionally paired with a
language tag (as defined in RFC3066).


[2]
Future versions of RDF may migrate to a more general mechanism for
literal representation in which the current representation would be
embedded. One candidate is that an RDF literal would be a pair
of a Unicode string and a URI reference. The current literals would
be embedded within this new representation using a well-known URI
as a base for all language tag URIs.

[3]
   NOTE: The RDF Core Working Group has yet to consider whether
   such an approach would be useful for integrating XML schema
   datatyping with RDF.

[4]
When comparing two RDF Literals, their Unicode strings MUST be
equal for the RDF Literals to compare as equal. If both Literals
have language tags, these tags MUST be equal for the Literals to
be considered equal. If two Literals are found with equal Unicode
strings but only one has a language tag, the Literals SHOULD NOT 
be considered equal.

[5]
   NOTE: The purpose of 'SHOULD NOT' is to allow
   applications some flexibility in dealing with
   language tags. That is, when two literals have equal
   Unicode strings but only one has a language tag, they
   can be considered equivalent, which might be sufficient
   for some applications to make a match.

[6]
The truth table for equality is as follows.
Each pair (s,t) consists of a Unicode string s and an RFC3066 tag t.
'_' means no tag is given. 'f*' means 'SHOULD NOT'
be true. 'f' means 'MUST' be false. 't' means 'MUST' be true.
s1 and s are unequal strings, and t1 and t are unequal tags,
according to the equality rules of the respective specifications.

[7]

         (s,_)  (s,t)  (s1,_)  (s1,t1)  (s,t1)  (s1,t)
-------------------------------------------------------
(s,_)      t      f*     f       f        f*      f
(s,t)      f*     t      f       f        f       f
(s1,_)     f      f      t       f*       f       f*
(s1,t1)    f      f      f*      t        f       f
(s,t1)     f*     f      f       f        t       f
(s1,t)     f      f      f*      f        f       t
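
The table can be reproduced by a short comparison function. The
following Python sketch is illustrative only: the names Literal and
compare are assumptions of this example, not part of any RDF API, and
language tags are compared case-insensitively as a simplification of
RFC3066 matching.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical literal type: a Unicode string plus an optional tag."""
    string: str
    lang: Optional[str] = None  # None plays the role of '_' in the table


def compare(a: Literal, b: Literal) -> str:
    """Return 't', 'f' or 'f*' as used in the truth table."""
    if a.string != b.string:
        return "f"   # unequal strings: MUST be false
    if a.lang is None and b.lang is None:
        return "t"   # equal strings, no tags: MUST be true
    if a.lang is None or b.lang is None:
        return "f*"  # only one tag present: SHOULD NOT be true
    # Both tags present: they must match (case-insensitively).
    return "t" if a.lang.lower() == b.lang.lower() else "f"
```

An application that wants strict behaviour can treat 'f*' as false;
one that wants the flexibility described in [5] can treat it as a
candidate match.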


[8]
RDF makes a distinction between equality and equivalence for
Literals. RDF Literals are equal in accordance with [4].
Equivalence represents a broader notion of how Literals might 
be considered to match each other or be interchangeable in some 
way. Applications MAY determine that RDF Literals
are equivalent while not being equal. For example, the literals
'+353(0)12968607' and '0035312968607' are not equal, but they
might reasonably be interpreted as equivalent in some context.
Inferring such equivalences typically requires extra metadata
or assumptions about the literals in question (such as might be
available about the predicate of which the literal is the value, or
knowledge that the literal is of a well-known type). RDF
processors SHOULD deal with equality and normalisation
of Literals only, and SHOULD NOT be expected to make or find such
equivalences. Future work in RDF may define ways in which extra
information about RDF Literals can be modelled in the light of
implementation experience.

   ASIDE to rdfcore-wg: Does the example work for US & Asian 
   readers?
   I note that "some context" is approx. on a European telephone.
   (I'm guessing that + does not expand as 00 in Asia!)
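
To illustrate the distinction in [8], the following Python sketch shows
an application-level equivalence test layered on top of plain string
equality. The dialling rules it encodes ('+' expands to '00', and a
parenthesised trunk '0' is dropped) are assumptions made for this
example only; the function names are hypothetical.

```python
import re


def normalize_phone(number: str, intl_prefix: str = "00") -> str:
    """Reduce a phone number to digits under the assumed dialling rules."""
    number = number.replace("(0)", "")        # drop the optional trunk digit
    if number.startswith("+"):
        number = intl_prefix + number[1:]     # assumed expansion of '+'
    return re.sub(r"\D", "", number)          # keep digits only


def equivalent_phone(a: str, b: str) -> bool:
    """Application-defined equivalence; this is not RDF Literal equality."""
    return normalize_phone(a) == normalize_phone(b)
```

Under these assumptions the two literals of the example are equivalent
although they are not equal; an RDF processor itself would report only
the inequality.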
  

[9]
RDF assumes that the Unicode strings in Literals are normalized
according to Unicode Normalization Form C [NFC, NFC-Corrigendum]. 
An early uniform normalization framework is used.

[10]
When parsing RDF/XML, the XML processor, if necessary, converts
the XML document to the UCS character domain. When converting
from any encoding that is not UCS-based, it SHOULD
use Unicode Normalization Form C [NFC, NFC-Corrigendum].

[11]
RDF/XML processors MUST NOT normalize an input XML document that
is encoded in a UCS-based encoding. See [CHARMOD] for the rationale.

[12]
RDF/XML documents SHOULD be W3C-normalized as specified in
[CHARMOD]. Moreover, after the stripping of comments and
processing instructions an RDF/XML document SHOULD still be
W3C-normalized. It is the responsibility of the document
creator to fulfil this requirement. RDF/XML processors MUST NOT
correct input that is not W3C-normalized.

[13]
RDF/XML processors MAY detect lack of W3C-normalization in
an input document, and issue a diagnostic.
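
A diagnostic of the kind permitted by [13] might look like the
following Python sketch. It checks only for Normalization Form C, which
is one part of W3C-normalization as defined in [CHARMOD], so it is a
partial check. The function name is hypothetical. Note that it reports
problems and never alters the input.

```python
import unicodedata


def nfc_diagnostics(text: str):
    """Return (line_number, line) pairs for lines that are not in NFC."""
    problems = []
    for number, line in enumerate(text.splitlines(), 1):
        # A line is in NFC exactly when normalizing it changes nothing.
        if unicodedata.normalize("NFC", line) != line:
            problems.append((number, line))
    return problems
```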

[14]
Summary of text normalization for RDF/XML processors.
RDF/XML processors MUST use a normalizing transcoder
from non-UCS-based encodings.
RDF/XML processors MUST NOT do any other text normalization.
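
The summary can be sketched in Python as follows. This is an
illustration, not a definitive implementation: the function name is
hypothetical, and treating every codec whose canonical Python name
begins with "utf-" as UCS-based is an assumption of this sketch.

```python
import codecs
import unicodedata


def decode_rdfxml_bytes(raw: bytes, encoding: str) -> str:
    """Decode input bytes, normalizing only when transcoding from a
    non-UCS-based encoding."""
    text = raw.decode(encoding)
    # Assumption: Python codec names starting with 'utf-' are UCS-based.
    if codecs.lookup(encoding).name.startswith("utf-"):
        return text  # UCS-based input is passed through untouched
    return unicodedata.normalize("NFC", text)
```

With this sketch, UTF-8 input containing a decomposed sequence such as
'e' followed by U+0301 is left as it is, while input in a legacy
encoding always comes out in Normalization Form C.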

[15]
Unicode string equality within Literals is given by binary 
equality.
(cf. http://www.w3.org/TR/charmod/#sec-IdentityMatching )
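
As an illustration of binary equality, the following Python sketch
compares a precomposed and a decomposed spelling of the same text.
Python compares strings code point by code point, which is used here
as a stand-in for binary equality; the function name is hypothetical.

```python
import unicodedata

composed = "caf\u00e9"     # 'e with acute' as a single code point
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent


def strings_equal(a: str, b: str) -> bool:
    """Code point by code point comparison, with no normalization."""
    return a == b


# The two spellings differ unless one of them is first brought into NFC.
same_as_is = strings_equal(composed, decomposed)
same_after_nfc = strings_equal(
    composed, unicodedata.normalize("NFC", decomposed))
```

Here same_as_is is False and same_after_nfc is True, which is why [9]
assumes that the strings are already normalized.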


[16]
Language tag equality is defined by RFC3066 and is case 
insensitive.


[17]
RDF Literals arising from general attribute values (using the
production
      [6.10] propAttr  ::=  propName '="' string '"',
i.e. Production 4.13 propAttr in [Refactoring]):

[18]
+ have their language component defined by the value of xml:lang 
(if any) in the enclosing element, as specified in [XML].

[19]
+ have their Unicode string given by the attribute value after XML
  attribute value normalization.

[20]
    NOTE: document authors are warned that XML attribute
    value normalization differs slightly if an attribute is
    declared in a DTD with an attribute type other than CDATA.
    In such cases, validating and non-validating XML parsers
    will normalize some values differently, and hence RDF/XML
    processors will produce different RDF Literals.
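
The following Python sketch shows how literals might be read from
attribute values. It relies on the XML parser to perform attribute
value normalization, and it looks for xml:lang only on the element
itself rather than on its ancestors, which is a simplification. The
function name and the example namespace are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def literals_from_attributes(xml_text: str) -> dict:
    """Map each attribute name to a (string, language tag) pair."""
    element = ET.fromstring(xml_text)
    lang = element.get(XML_LANG)  # None when no xml:lang is present
    return {
        name: (value, lang)
        for name, value in element.attrib.items()
        if name != XML_LANG
    }
```

For a CDATA attribute, a literal tab in the attribute value is
normalized to a space by the parser, whereas a tab written as the
character reference &#9; survives as a tab in the resulting literal.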

[21]
RDF Literals arising from the propertyElt production with
a non-empty string value as element content, and no rdf:parseType 
attribute (using the first production of 6.12):

[22]
+ have their language component defined by the value of xml:lang 
(if any) in the property element, as specified in [XML].

[23]
+ have their Unicode string formed by concatenating the
  element content, subject to the usual XML processing
  rules:

[24]
     - character references are expanded.

[25]
     - entity references are illegal; a different RDF production
       MAY be matched in this case.

[26]
     - CDATA sections are expanded.

[27]
     - XML comments are discarded.

[28]
     - processing instructions are discarded.

[29]
  NOTE: When converting the document from any encoding 
  that is not UCS-based, Unicode Normalization Form C
  is produced before concatenation of the various parts 
  of the literal. Hence, it is possible (if unwise) to 
  use various XML escaping mechanisms to produce non-normalized
  RDF Literals. Such a document is not W3C-normalized.
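
The concatenation rules in [23] to [28] can be sketched with Python's
expat bindings as follows. The function name is hypothetical, xml:lang
is read from the property element only, and child elements are
rejected because a different production would apply to them.

```python
import xml.parsers.expat


def literal_from_property_element(xml_text: str):
    """Return (string, language tag) for a property element with text
    content only."""
    parts = []
    state = {"depth": 0, "lang": None}

    def start(name, attrs):
        state["depth"] += 1
        if state["depth"] == 1:
            state["lang"] = attrs.get("xml:lang")
        else:
            raise ValueError("child element: a different production applies")

    def end(name):
        state["depth"] -= 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    # Character data arrives with character references and CDATA sections
    # already expanded; comments and processing instructions are never
    # passed to this handler, so they are discarded.
    parser.CharacterDataHandler = parts.append
    parser.Parse(xml_text, True)
    return "".join(parts), state["lang"]
```

An empty property element yields the string of length 0 with this
sketch.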

[30]
RDF Literals arising from the propertyElt production with
an empty string value as element content, and no rdf:parseType 
attribute (using the first production of 6.12):

[31]
+ have their language component defined by the value of xml:lang 
(if any) in the property element, as specified in [XML].

[32]
+ have the string of length 0 as their Unicode string.

[33]
RDF Literals arising from the propertyElt production with 
rdf:parseType="Literal" attribute (using the [n]th production 
of 6.12):

[34]
+ have their language component defined by the value of xml:lang 
(if any) in the property element, as specified in [XML].

[35]
+ MAY have their Unicode string component as
  the string of the element content as given
  in the input document, after converting the character encoding.

[36]
+ MAY have their Unicode string component as given by the 
  Unicode string of the XML Canonicalization of the document 
  subset consisting of the element content. See XML 
  Canonicalization section 2.4. 
  http://www.w3.org/TR/2001/REC-xml-c14n-20010315#DocSubsets
  XML Canonicalization specifies a UTF-8 string; the
  RDF Literal is the Unicode string that it encodes.
  Such a canonicalization MAY include or omit comments.

[37]
+ MAY have their Unicode string component as
  the string of the element content as given
  in the input document,  after converting the character encoding,
  and then processed in any of the following ways, in any order:

[38]
  - all comments MAY be stripped.

[39]
  - all processing instructions MAY be stripped.

[40]
  - all character references MAY be expanded.

[41]
  - all entity references MAY be expanded.

[42]
  - all CDATA sections MAY be replaced by their content.

[43]
  - all attribute values MAY be normalized as in XML
    canonicalization, viz. by replacing:
    . all ampersands (&) with &amp;
    . all open angle brackets (<) with &lt;
    . all quotation mark characters (") with &quot;
    . all whitespace characters #x9, #xA, and #xD with character
      references.

[44]
  - all text nodes MAY be normalized as in XML
    canonicalization, viz. by replacing:
    . all ampersands (&) with &amp;
    . all open angle brackets (<) with &lt;
    . all closing angle brackets (>) with &gt;
    . all #xD characters with &#xD;.

[45]
  - any additional XML namespace declaration that is
    in scope in the surrounding property element MAY
    be added to any start element tag in the XML literal.

[46]
  - any XML namespace declaration MAY be deleted from
    any start element tag.

[47]
  - any change that does not change the document information
    set MAY be made. A non-exhaustive list can be found
    in Infoset Appendix D.

[48]
   NOTE: The meaning of 'all' in the above paragraphs is that
   an RDF processing environment that makes such a change
   in one instance in one literal MUST make the corresponding
   change in every instance in every literal.

[49]
   NOTE: Namespace prefix rewriting is prohibited.
   Paragraphs [45] and [46] permit changing the namespace
   declarations but not the prefixes in
   the QNames. This is to allow rdf:parseType="Literal"
   values to include namespace prefixes in attribute values
   and elsewhere.
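
The character replacements listed in [43] and [44] can be sketched in
Python as follows. The function names are hypothetical. The character
references use the uppercase hexadecimal form that XML Canonicalization
writes, for example &#xD;. Ampersands are replaced first so that the
references introduced by later replacements are not escaped again.

```python
def escape_attribute_value(value: str) -> str:
    """Apply the attribute value replacements of [43]."""
    replacements = (
        ("&", "&amp;"),   # must come first
        ("<", "&lt;"),
        ('"', "&quot;"),
        ("\t", "&#x9;"),
        ("\n", "&#xA;"),
        ("\r", "&#xD;"),
    )
    for raw, escaped in replacements:
        value = value.replace(raw, escaped)
    return value


def escape_text_node(text: str) -> str:
    """Apply the text node replacements of [44]."""
    replacements = (
        ("&", "&amp;"),   # must come first
        ("<", "&lt;"),
        (">", "&gt;"),
        ("\r", "&#xD;"),
    )
    for raw, escaped in replacements:
        text = text.replace(raw, escaped)
    return text
```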

[50]
For maximum interoperability RDF processors are RECOMMENDED
to use XML canonicalization without comments as the string
in the RDF Literal formed by the rdf:parseType="Literal"
property element production.
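
As a rough illustration of the recommendation, the following Python
sketch canonicalizes an XML literal without comments. It makes several
assumptions that should be noted: it uses the standard library function
xml.etree.ElementTree.canonicalize (Python 3.8 or later), which
implements the later C14N 2.0 rather than the 2001 Recommendation cited
above; it handles only a literal consisting of a single element; and it
ignores namespace declarations inherited from the surrounding property
element. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def canonical_literal(xml_fragment: str) -> str:
    """Canonicalize a single-element XML literal, dropping comments."""
    # with_comments=False is the default; it is spelled out for clarity.
    return ET.canonicalize(xml_fragment, with_comments=False)
```

Two processors that both apply the same canonicalization produce
literals that compare equal under binary equality even when the source
documents differ in comments, attribute order or empty-element syntax.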

[51]
   NOTE: The Working Group will review this recommendation
   in light of implementation experience.