2001-09-07#5 - literal problem

Here is a heads-up on where Bill and I have got to.
ACTION: 2001-09-07#5: Jeremy Carroll
   Collaborate with Bill dehOra to produce analysis of the literal
   problem, options, pros/cons for WG consideration.

We are expecting to continue this action into next week.



The issues we are addressing are:

1: The Representation of Literals
      e.g. a pair of a Unicode String and an RFC 3066 language tag.

2: Equality
      a complete equality rule for literals, correcting para 217-219 and
      para 220, which currently leave parts of equality undefined.

3: Preferred Mappings under rdf:parseType="Literal"
  (of XML document fragments into the representation of [1]).

  - Currently para 203 and para 220 are explicitly vague about this mapping.
  - We wish to specify one or more preferred mappings.
  - We may wish to specify more than one conformance level
      + motivation
         there are two significantly different use cases
            + quoting simple XHTML
            + quoting arbitrary XML fragments

4: Cases in which applications may expect problems.
  - Older RDF parsers will not be using the preferred mappings, so
    we should indicate to document writers which XML fragments are
    likely to be problematic.


A less ambitious approach would be not to specify a preferred mapping, and
simply to specify a minimal requirement about some markup that should work
(e.g. embedded XHTML without namespaces and without references). This would
be consistent with M&S and would put off the work that M&S defers until
RDF 2.0.


Questions for WG for tomorrow:
+ Does the WG agree that a Literal is a <Unicode String,RFC 3066> pair?
+ Does the WG agree that Literal equality should be defined?
+ Does the WG agree that the new specs should describe a specific Unicode
string to be delivered by rdf:parseType="Literal"?

Our current working text for the literal representation is:

==========

An RDF Literal is* a Unicode string, optionally** paired with a
language tag (as defined in RFC3066).

When comparing two RDF Literals, their Unicode strings must be
equal for the RDF Literals to compare as equal. If both Literals
have language tags, these tags must be equal for the Literals to
be considered equal. If the Unicode strings are equal but only
one Literal has a language tag, the Literals should not*** be
considered equal.

The equality of Unicode strings is specified by the W3C I18N WG;
see [fixme:url]. Language tag equality is defined by RFC3066
and is case insensitive.

===========
* is represented as, or is a? Do we pass by reference or value :)

** equivalently we could delete 'optionally' and allow the language tag to
be null, or default to "und", the ISO 639-2 code for an undetermined
language. Note that the following, from RFC 3066, suggests 'Omitting':

   5. You SHOULD NOT use the UND (Undetermined) code unless the protocol
      in use forces you to give a value for the language tag, even if
      the language is unknown.  Omitting the tag is preferred.
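
As a rough illustration of the 'omit the tag' reading, here is a sketch in
Python. The helper name normalise_tag is hypothetical, and folding both the
empty string and "und" into "no tag" is an assumption made for the example,
not something the WG has decided.

```python
from typing import Optional


def normalise_tag(tag: Optional[str]) -> Optional[str]:
    """Return a lower-cased language tag, or None when no language is given.

    Assumption (not agreed text): an empty tag and "und" are both treated
    as an omitted tag, following the RFC 3066 preference for omission.
    """
    if tag is None:
        return None
    tag = tag.strip().lower()  # RFC 3066 tags are case insensitive
    if tag in ("", "und"):
        return None
    return tag
```

With this reading a literal written with "UND" and one written with no tag
at all end up with the same representation.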

*** the purpose of 'should not' is to allow applications some flexibility
in dealing with language tags. That is, when the strings of two literals
are equal but only one literal has a language tag, an application may
consider them equivalent, which might be sufficient for some applications
to make a match.


The truth table corresponding to that notion of equality is:

Truth table for equality. (s1,t1) denotes the pair string1, tag1; _ denotes
an absent tag; f* means should not be true. Assume s1 != s and t1 != t
according to the specs in question.

          (s,_)   (s,t)   (s1,_)  (s1,t1) (s,t1)  (s1,t)
--------------------------------------------------------
(s,_)     t       f*      f       f       f*      f
(s,t)     f*      t       f       f       f       f
(s1,_)    f       f       t       f*      f       f*
(s1,t1)   f       f       f*      t       f       f
(s,t1)    f*      f       f       f       t       f
(s1,t)    f       f       f*      f       f       t
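
The same rule can be sketched in Python and used to regenerate the table.
This is only an illustration of the working text: the names Literal, Eq and
compare are made up for the example, Python's built-in string comparison
stands in for the I18N WG definition of Unicode string equality (still a
fixme above), and the sample tags "en" and "fr" are arbitrary.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    string: str                 # the Unicode string
    tag: Optional[str] = None   # RFC 3066 language tag, None if omitted


class Eq(Enum):
    TRUE = "t"
    FALSE = "f"
    SHOULD_NOT = "f*"           # 'should not' be considered equal


def compare(a: Literal, b: Literal) -> Eq:
    # Assumption: code point comparison stands in for the I18N WG rule.
    if a.string != b.string:
        return Eq.FALSE
    if a.tag is None and b.tag is None:
        return Eq.TRUE
    if a.tag is None or b.tag is None:
        # Strings match but only one literal carries a tag.
        return Eq.SHOULD_NOT
    # Language tag equality is case insensitive.
    return Eq.TRUE if a.tag.lower() == b.tag.lower() else Eq.FALSE


cases = {
    "(s,_)": Literal("s"),
    "(s,t)": Literal("s", "en"),
    "(s1,_)": Literal("s1"),
    "(s1,t1)": Literal("s1", "fr"),
    "(s,t1)": Literal("s", "fr"),
    "(s1,t)": Literal("s1", "en"),
}

# Print one row per literal, one column per literal, in the order above.
for name, row in cases.items():
    cells = "".join(compare(row, col).value.ljust(8) for col in cases.values())
    print(name.ljust(10) + cells.rstrip())
```

An application that wants the more lenient behaviour described in the ***
note could treat Eq.SHOULD_NOT as a match; a strict one would treat it as
not equal.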





I am about to summarise some discussion between Bill and me on the
rdf:parseType="Literal" question.



Feedback welcome


Jeremy

Received on Thursday, 13 September 2001 09:40:50 UTC