Last call comments from the I18N WG on RDF WDs from Martin Duerst on 2003-11-07 (www-rdf-comments@w3.org from October to December 2003)

From: Martin Duerst <duerst@w3.org>
Date: Fri, 07 Nov 2003 04:06:16 -0500
To: www-rdf-comments@w3.org
Cc: w3c-i18n-ig@w3.org
Message-Id: <4.2.0.58.J.20031107032037.0596e790@localhost>
Dear RDF WG,

Here are the last call comments from the I18N WG on
the RDF drafts. They are organized by feature rather
than by draft.

- Treatment of language information for XML Literals:
   We have already commented on this extensively. We think that the
   removal of language information from XML literals is a serious
   problem for internationalization. Details of our comments can
   be found at http://www.w3.org/2003/09/ri434.html.

   One example of how language information could easily be added to
   XML literals, with minimal impact on the overall design,
   (just a proposal, not intended to preclude any other solution)
   is to use a very short wrapper element:
   The lexical form of the XML literal in
      <ex:prop rdf:parseType='Literal'>foo</ex:prop>
   (minimal example on purpose) becomes
      <w>foo</w>
   i.e. a <w> element is added as a wrapper to all XML literals.
   <w> stands for 'wrapper'. A single-character element name is
   chosen to keep potential overhead as low as possible. For
      <ex:prop rdf:parseType='Literal' xml:lang='fr'>foo</ex:prop>
   the lexical form becomes
      <w xml:lang="fr">foo</w>
   <w> does not need a namespace because it is only used for
   abstraction/descriptive purposes within RDF, and potentially
   within implementations that choose to use this form of representation.
   It will usually not be used for interchange. It may occasionally be
   used for interchange if instead of rdf:parseType='Literal', rdf:datatype
   is used, i.e. in
      <ex:prop rdf:datatype='&rdf;XMLLiteral'>&lt;w&gt;foo&lt;/w&gt;</ex:prop>
   but in that case, it is also well identified.
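
   As a minimal sketch of this proposal (in Python; the function names
   are hypothetical and not part of any RDF specification), the wrapped
   lexical form, and its escaped variant for the case where the literal
   is written as character data, could be produced like this:

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_xml_literal(content, lang=None):
    """Wrap XML literal content in the proposed <w> element.

    `content` is assumed to be well-formed XML content already;
    it is inserted as-is, not escaped.
    """
    if lang is None:
        return f"<w>{content}</w>"
    # quoteattr() escapes the value and adds the surrounding quotes.
    return f"<w xml:lang={quoteattr(lang)}>{content}</w>"


def escape_for_character_data(lexical_form):
    """Escape the lexical form so it can be written as element text."""
    return escape(lexical_form)


print(wrap_xml_literal("foo"))              # <w>foo</w>
print(wrap_xml_literal("foo", lang="fr"))   # <w xml:lang="fr">foo</w>
print(escape_for_character_data(wrap_xml_literal("foo")))  # &lt;w&gt;foo&lt;/w&gt;
```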


- XML Literals as typed literals: We have not objected to the treatment
   of XML Literals as typed literals in the previous last call, because
   there, it was possible to understand this treatment just as a technical
   convenience. However, with the change to the treatment of language
   information for XML literals, treating XML literals as datatyped literals
   becomes highly questionable. Unless the language information issue can
   be solved, we have to disagree with this treatment.

- XML Literals containing only text should be equivalent to the
   corresponding plain literals and to the corresponding string type
   literals. (Solving the language information issue for XML Literals
   seems to be a precondition for this, but once this is done, it does
   not seem to be too difficult, in the same way that it was possible
   to make strings and plain literals equivalent.)
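
   As a rough illustration only (Python; it assumes the <w> wrapper form
   proposed above, and the function name is hypothetical), an
   implementation could detect a text-only XML literal and map it to the
   corresponding plain literal, i.e. a text/language pair. This sketch
   does not treat comments or processing instructions specially.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def as_plain_literal(wrapped):
    """Return (text, lang) if the wrapped XML literal holds only text, else None."""
    root = ET.fromstring(wrapped)
    if root.tag != "w" or len(root) > 0:
        # Not the assumed wrapper, or it contains child elements.
        return None
    return (root.text or "", root.get(XML_LANG))
```

   With this, '<w xml:lang="fr">foo</w>' maps to the pair ("foo", "fr"),
   while a literal with embedded markup such as '<w>foo <b>bar</b></w>'
   has no plain-literal equivalent.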

- Examples/Primer: There is one important facility of RDF that is almost
   completely ignored in the primer and in the general discussion in the
   concepts document. This is the ability to use not only ASCII characters,
   but Unicode, in literal values, URIrefs, and (with some restrictions)
   XML element names, and therefore property names and names of other nodes.
   While this may be of somewhat secondary importance to English readers
   (but still more important than the treatment it is given), it is crucial
   for translations of the specification. We suggest the following:
   - Mention this possibility very early on, at the point where
     literals and URIrefs are first treated in detail (most probably
     section 2.2 (or even 2.1)).
   - Add a simple example at this point, or change an already existing
     example slightly.
   - For the extensive examples in section 6, replace some of the
     current examples with equivalent examples with more international
     flavor. Most of the applications in section 6 are used in a
     world-wide context, and finding some examples should not be
     difficult. Overall, changing or adding two to three examples
     in section 6 should be sufficient. They should not be limited to
     examples like example 32, which contains the copyright sign
     as a single non-ASCII character, although having an example
     that shows how non-ASCII characters can be convenient in a
     purely English context may also be a good idea.
   - The explanations to these examples should mention the fact
     that RDF and XML allow Unicode characters. This does not
     have to be extensive; a few short sentences, with pointers
     to the relevant parts of the normative specs, should be sufficient.
     Readers of the primer should not be bothered/confused with
     issues such as normalization.
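
   To illustrate the kind of example we have in mind (a sketch only; the
   example.org namespace, resource, and property name are made up for
   illustration), the following Python snippet parses a small RDF/XML
   document that uses non-ASCII characters in a URIref, in a property
   element name, and in a literal value:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/用語#"  # hypothetical vocabulary namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

doc = f"""<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <rdf:Description rdf:about="http://example.org/都市/東京">
    <ex:名前 xml:lang="ja">東京</ex:名前>
  </rdf:Description>
</rdf:RDF>"""

# Parse from UTF-8 bytes, matching the encoding declaration.
root = ET.fromstring(doc.encode("utf-8"))
description = root[0]
prop = description[0]

subject = description.get(f"{{{RDF_NS}}}about")  # URIref with non-ASCII characters
predicate = prop.tag                             # "{namespace}local-name" form
literal = (prop.text, prop.get(XML_LANG))        # literal value and language tag
```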

- Alt container: Because of the special rule that the first element is
   the default or preferred value, this is a fake alternative. This should
   be changed, or a real alternative, without any preferences, should be
   provided. This is in particular important if there is no preferred
   version among different language versions (which is often needed for
   political reasons), but we are sure there are many other cases where
   it is not desirable to have a preferred alternative, or where there
   just simply is no preferred alternative. (Other such examples include
   voting ballots. Even for the ftp example given, a true alternative
   may be desirable, to allow load balancing.)
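
   The difference can be sketched as follows (Python; all function names
   are hypothetical, and this is only one way a consumer might treat a
   set of alternatives):

```python
import random


def pick_default(members):
    """Alt as currently specified: the first member is the default or preferred value."""
    return members[0]


def pick_any(members, rng=random):
    """True alternative: every member is equally acceptable (e.g. mirror sites)."""
    return rng.choice(members)


def pick_by_language(versions, accepted_languages):
    """Choose by the reader's language preferences; no version is a built-in default.

    `versions` maps a language tag to a value; returns None if nothing matches.
    """
    for lang in accepted_languages:
        if lang in versions:
            return versions[lang]
    return None
```

   Only the first function depends on member order; the other two give
   the same treatment to every member, which is what load balancing and
   politically neutral language versions need.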

- Measures/weights: The primer in a very small number of instances
   uses 'weightInKg', and explains why, but for the rest, it always
   uses just 'weight', even when there is no reason for such an
   underspecified property. For world-wide data interchangeability,
   such details are crucial. Unless there is a specific point to make
   (e.g. when explaining rdf:value), 'weightInKg' should always be
   preferred. The same applies to other properties such as
   rearSeatLegRoom. Language such as (primer 4.4:)
   >>>>
   because frequently the value
   would be recorded simply as the typed literal (as in the triple above),
   relying on an understanding of the context to fill in the unstated
   units information.
   >>>>
   should be avoided. The primer should not recommend
   practices that have made Mars missions go astray, among other things.

- The motivation for using xsd datatypes should clearly say that these
   are well established and should be used where appropriate. Having
   the possibility to use different types is good if needed, but in
   general, interoperability is increased when using a well-defined
   common set. Also, to a large extent, the value spaces, and to
   a somewhat smaller extent, the lexical forms, are locale-independent,
   which makes it possible to exchange data independent of a particular
   human-oriented representation. This should also be mentioned.
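
   For instance, a number that is displayed as "1,234.5" in one locale
   and "1.234,5" in another has a single xsd:decimal lexical form, and
   a date has a single xsd:date form. A minimal sketch in Python (the
   function name is hypothetical, and only two datatypes are covered):

```python
from datetime import date
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"


def to_typed_literal(value):
    """Return (lexical form, datatype URI) for a few Python types."""
    if isinstance(value, Decimal):
        # Fixed-point notation with "." as the decimal separator, whatever the locale.
        return format(value, "f"), XSD + "decimal"
    if type(value) is date:
        # ISO 8601 calendar date: YYYY-MM-DD.
        return value.isoformat(), XSD + "date"
    raise TypeError(f"no mapping for {type(value).__name__}")
```

   Any locale-specific presentation is then left to the application that
   displays the data, not to the data being exchanged.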


Regards,    Martin.
Received on Friday, 7 November 2003 07:54:27 UTC