Re: Last call comments from the I18N WG on RDF WDs from Frank Manola on 2003-11-14 (www-rdf-comments@w3.org from October to December 2003)

From: Frank Manola <fmanola@acm.org>
Date: Fri, 14 Nov 2003 11:24:19 -0500
To: Martin Duerst <duerst@w3.org>
Cc: www-rdf-comments@w3.org, w3c-i18n-ig@w3.org
Message-ID: <3FB501B3.1080700@acm.org>

Martin Duerst wrote:

> 
> Dear RDF WG,
> 
> Here are the last call comments from the I18N WG on
> the RDF drafts. This is not necessarily by draft, but
> by feature.
>

Martin--

Thanks for your comments on the RDF drafts.  I'm responding regarding 
your suggestions concerning the Primer (I'm responding in a separate 
message for each of your two messages).  In general, I'd like to address 
your points explicitly, but without making extensive modifications to 
existing text, or introducing numerous new or changed examples.  The 
Primer is already rather long, and most of the material you've commented 
on was in the original Last Call version.

Regarding your first message 
(http://lists.w3.org/Archives/Public/www-rdf-comments/2003OctDec/0120.html):

 >
 > - Examples/Primer: There is one important facility of RDF that is almost
 >   completely ignored in the primer and in the general discussion in the
 >   concepts document. This is the ability to use not only ASCII characters,
 >   but Unicode, in literal values, URIrefs, and (with some restrictions)
 >   XML element names, and therefore property names and names of other nodes.
 >   While this may be of somewhat secondary importance to English readers
 >   (but still more important than the treatment it is given), it is crucial
 >   for translations of the specification. We suggest the following:
 >   - Mention this possibility very early on, at the first point where
 >     literals and URIrefs are first treated in detail (most probably
 >     section 2.2 (or even 2.1)).
 >   - Add a simple example at this point, or change an already existing
 >     example slightly.
 >   - For the extensive examples in section 6, replace some of the
 >     current examples with equivalent examples with more international
 >     flavor. Most of the applications in section 6 are used in a
 >     world-wide context, and finding some examples should not be
 >     difficult. Overall, changing or adding two to three examples
 >     in section 6 should be sufficient. They should not be limited to
 >     examples like example 32, which contains the copyright sign
 >     as a single non-ASCII character, although having an example
 >     that shows how non-ASCII characters can be convenient in a
 >     purely English context may also be a good idea.
 >   - The explanations to these examples should mention the fact
 >     that RDF and XML allow Unicode characters. This does not
 >     have to be extensive; a few short sentences, with pointers
 >     to the relevant parts of the normative specs, should be sufficient.
 >     Readers of the primer should not be bothered/confused with
 >     issues such as normalization.

I think these points can be adequately addressed by following your first 
suggestion above.  What I'd propose is:

*  Note in the initial discussion of URIs in Section 2.1 (and 
correspondingly in Appendix A) that URIs can contain Unicode characters 
(citing [RDF-CONCEPTS]), so they are even more general as subjects, 
predicates, or objects in statements.

*  Similarly, note in the initial discussion of XML in Section 2.1 (and 
correspondingly in Appendix B) that XML content and (with some 
exceptions) tags can contain Unicode characters.

*  Note when discussing plain literals early in Section 2.2 that the 
character strings can contain Unicode characters (this is the place 
where character strings initially come up).

*  Note in Section 2.4 that the lexical spaces of typed literals are 
defined as Unicode strings (citing [RDF-CONCEPTS]), and hence 
non-English content can be represented.

*  Note in Section 3.1 (I think the discussion of typed literals is the 
best place to do this) that XML strings (for use in both plain and typed 
literals) can contain Unicode characters (citing [XML] and 
[RDF-SYNTAX]), and hence non-English content can be represented.
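
As a rough illustration of what the notes proposed above would tell readers, the following Python sketch models triples as plain tuples. It is not taken from the Primer and does not use any real RDF library; the `Literal` type, the `has_non_ascii` helper, and all example.org URIrefs and values are hypothetical and exist only for this sketch.

```python
# Illustrative only: triples as plain Python tuples, not a real RDF API.
# Every URIref and value below is made up for this sketch.
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    lexical: str                    # a Unicode character string
    lang: Optional[str] = None      # optional language tag (plain literals)
    datatype: Optional[str] = None  # datatype URIref (typed literals)


def has_non_ascii(text: str) -> bool:
    """Return True if the string contains any character outside ASCII."""
    return any(ord(ch) > 127 for ch in text)


triples = [
    # Plain literal with a language tag and a non-ASCII character.
    ("http://example.org/staff/85740",
     "http://example.org/terms/name",
     Literal("Dürst", lang="de")),
    # Typed literal whose lexical form is a Unicode string.
    ("http://example.org/city/tokyo",
     "http://example.org/terms/label",
     Literal("東京", datatype=XSD_STRING)),
    # URIref that itself contains non-ASCII characters.
    ("http://example.org/都市/東京",
     "http://example.org/terms/population",
     Literal("12000000",
             datatype="http://www.w3.org/2001/XMLSchema#integer")),
]

for subject, predicate, obj in triples:
    print(subject, predicate, obj.lexical)
```

The point of the sketch is only that non-ASCII characters can appear in all three positions discussed: plain literals, typed literals, and URIrefs.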

I'm *very* reluctant to make changes in examples (or do any additional 
research to find new ones) at this late date.  In particular, the 
examples in Section 6 are taken from the indicated sources, not made up 
for the Primer.  It may be unfortunate that they don't have a more 
international flavor, but the idea isn't necessarily to indicate all 
possible applications.  Changing or adding two to three examples in 
section 6 may not sound like much, but it's quite a chore at this stage.  
Also, the main point of Section 6 is to illustrate a range of examples, 
not so much to illustrate the use of all of the RDF facilities 
introduced in earlier sections.

 >
 > - Alt container: Because of the special rule that the first element is
 >   the default or preferred value, this is a fake alternative. This should
 >   be changed, or a real alternative, without any preferences, should be
 >   provided. This is in particular important if there is no preferred
 >   version among different language versions (which is often needed for
 >   political reasons), but we are sure there are many other cases where
 >   it is not desirable to have a preferred alternative, or where there
 >   just simply is no preferred alternative. (Other such examples include
 >   voting ballots. Even for the ftp example given, a true alternative
 >   may be desirable, to allow load balancing.)

I believe this issue is being responded to separately, but I don't think 
this is a "fake alternative".  There's no rule that says a given app 
can't ignore the first alternative and choose one of the others for its 
own reasons (and, as the Primer tries hard to explain, RDF doesn't 
itself enforce either the "preferred" or "alternative" semantics 
anyway).  After all, the statement of the alternatives may reflect the 
opinion of whoever created the original information as to what the 
preferred alternative is, but it's certainly possible for someone else 
to have a different opinion, and select a corresponding alternative.
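
To make the argument above concrete, here is a small Python sketch, not from the Primer, of an application reading an Alt container. The dictionary layout, the function names, and the FTP URLs are assumptions made for illustration; only the `rdf:_1`, `rdf:_2`, ... membership property naming comes from RDF itself.

```python
# Illustrative only: an Alt container modeled as a dict from membership
# property to value. The FTP URLs are placeholders for this sketch.
import random

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

alt_members = {
    RDF + "_1": "ftp://ftp.example.org",   # listed first, i.e. the "default"
    RDF + "_2": "ftp://ftp1.example.org",
    RDF + "_3": "ftp://ftp2.example.org",
}


def default_choice(members):
    """Honor the convention that the first member is the preferred one."""
    return members[RDF + "_1"]


def load_balanced_choice(members, rng):
    """Ignore the stated preference and pick any member, e.g. to spread load."""
    return rng.choice(sorted(members.values()))


print(default_choice(alt_members))
print(load_balanced_choice(alt_members, random.Random()))
```

Both functions read the same data. Nothing in the data forces an application to use `default_choice`, so which alternative is selected is a decision made by the application.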

 >
 > - Measures/weights: The primer in a very small number of instances
 >   uses 'weightInKg', and explains why, but for the rest, it always
 >   uses just 'weight', even when there is no reason for such an
 >   underspecified property. For world-wide data interchangeability,
 >   such details are crucial. Unless there is a specific point to make
 >   (e.g. when explaining rdf:value), 'weightInKg' should always be
 >   preferred. The same applies to other properties such as
 >   rearSeatLegRoom. Language such as (primer 4.4:)
 >   >>>>
 >   because frequently the value
 >   would be recorded simply as the typed literal (as in the triple above),
 >   relying on an understanding of the context to fill in the unstated
 >   units information.
 >   >>>>
 >   should be avoided. The primer should not recommend
 >   practices that have made Mars missions go astray, among else.

Changing all instances of simple properties like "weight" and 
"rearSeatLegRoom" to include units in their names would require changes 
all over the Primer, and this, it seems to me, would be a rather 
indirect way of making the point you're really raising.  What I suggest 
is to add an explicit comment at the end of Section 4.4 that, while you 
don't need to use rdf:value as described here, this illustrates the 
issue that global interoperability requires that sufficient metadata 
(such as units information) be explicitly recorded to eliminate 
ambiguity.  This might be done using rdf:value, or using additional 
properties (or possibly using different datatypes).
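
The difference between relying on context and recording units explicitly can be sketched as follows. This is a hypothetical Python model, not Primer text: the example.org URIrefs, the blank node label `_:w`, and the `weight_with_units` helper are all invented for illustration, and triples are again plain tuples rather than objects from an RDF library.

```python
# Illustrative only: two ways of recording a weight, as plain tuples.
from decimal import Decimal

RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
EX = "http://example.org/terms/"
ITEM = "http://example.org/item10245"

# Units left to context: the reader has to guess kilograms vs. pounds.
bare = [(ITEM, EX + "weight", Decimal("2.4"))]

# Units stated explicitly via a structured value using rdf:value.
structured = [
    (ITEM, EX + "weight", "_:w"),
    ("_:w", RDF_VALUE, Decimal("2.4")),
    ("_:w", EX + "units", "http://example.org/units/kilograms"),
]


def weight_with_units(triples, item):
    """Return (value, units); units is None when the data does not state them."""
    for s, p, o in triples:
        if s == item and p == EX + "weight":
            if isinstance(o, Decimal):
                return o, None  # bare literal: units unstated
            # Otherwise o names a node; collect that node's properties.
            props = {p2: o2 for s2, p2, o2 in triples if s2 == o}
            return props.get(RDF_VALUE), props.get(EX + "units")
    return None


print(weight_with_units(bare, ITEM))
print(weight_with_units(structured, ITEM))
```

With the bare form, a consumer gets the number but no units. With the structured form, the units travel with the data. A property such as "weightInKg" or a units-specific datatype would be other ways of achieving the same thing.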

One of the reasons for using examples like "weight" is familiarity (the 
tent example is based on actual Web pages).  Another is simplicity 
which, it seems to me, ought to be a main consideration in a Primer. The 
Primer, in giving examples (either real or made-up ones), is not 
"recommending" anything.  Moreover, the language you object to is simply 
stating a fact:  frequently such literals are recorded in exactly the 
way I describe.  This may be (I agree) an unfortunate practice, but it 
is used all over the world (in both databases and web pages).  Also, I 
think the language you cite, far from needing to be avoided, is 
helpful in explicitly pointing out that you are relying on implicit 
context information in interpreting such values (although amplifying on 
this a bit as I've just suggested would, I agree, make this even clearer).

I'm also hesitant to do anything to suggest that using properties like 
"weightInKg" is necessarily good practice either, since it involves 
encoding the units information in property names, which won't 
necessarily be interoperable with other ways of recording units 
information.  This whole issue is certainly significant, but it raises a 
lot of design issues that I'm reluctant to see elaborated on too much in 
a "Primer".  For example, units of measure are just one of the many 
pieces of metadata necessary to fully contextualize a piece of 
information.

 >
 > - The motivation for using xsd datatypes should clearly say that these
 >   are well established and should be used where appropriate. Having
 >   the possibility to use different types is good if needed, but in
 >   general, interoperability is increased when using a well-defined
 >   common set. Also, to a large extent, the value spaces, and to
 >   a somewhat smaller extent, the lexical forms, are locale-independent,
 >   which make it possible to exchange data independent of a particular
 >   human-oriented representation. This should also be mentioned.

It seems to me that Section 2.4 on typed literals already sufficiently 
encourages the use of xsd datatypes.  It says that xsd datatypes are 
"first among equals", that they are expected to be the most 
interoperable datatypes, and that they will probably be the most 
generally available.  The current wording was discussed to some extent 
by the WG, and I'm reluctant at this point to add additional 
justification that we haven't explicitly discussed.

Thanks again for your comments.

--Frank
Received on Friday, 14 November 2003 10:59:23 UTC