Re: Last call comments from the I18N WG on RDF WDs from Frank Manola on 2003-12-03 (www-rdf-comments@w3.org from October to December 2003)

From: Frank Manola <fmanola@acm.org>
Date: Wed, 03 Dec 2003 11:28:35 -0500
To: Martin Duerst <duerst@w3.org>
Cc: w3c-i18n-ig@w3.org, www-rdf-comments@w3.org
Message-ID: <3FCE0F33.8030407@acm.org>
Martin--

Sorry for the late response.  First the Thanksgiving holidays 
intervened, and then I had to figure out what to do about your latest 
comments (see below).

--Frank


Martin Duerst wrote:

> Hello Frank,
> 
> Many thanks for your replies on the I18N WG comments.
> Some more comments below. Sorry this is late due to
> various meetings last week.
> 
> 
> At 21:07 03/11/14 -0500, Frank Manola wrote:
> 
>> Martin Duerst wrote:
>>
>>> Dear RDF WG,
>>> Here are the last call comments from the I18N WG on
>>> the RDF drafts. This is not necessarily by draft, but
>>> by feature.
>>
>>
>> Martin--
>>
>> Thanks for your comments on the RDF drafts.  I'm responding regarding 
>> your suggestions concerning the Primer (I'm responding in a separate 
>> message for each of your two messages).  In general, I'd like to 
>> address your points explicitly, but without making extensive 
>> modifications to existing text, or introducing numerous new or changed 
>> examples.  The Primer is already rather long, and most of the material 
>> you've commented on was in the original Last Call version.
>>
>> Regarding your first message 
>> (http://lists.w3.org/Archives/Public/www-rdf-comments/2003OctDec/0120.html): 
>>
>>
>> >
>> > - Examples/Primer: There is one important facility of RDF that is almost
>> >   completely ignored in the primer and in the general discussion in the
>> >   concepts document. This is the ability to use not only ASCII characters,
>> >   but Unicode, in literal values, URIrefs, and (with some restrictions)
>> >   XML element names, and therefore property names and names of other nodes.
>> >   While this may be of somewhat secondary importance to English readers
>> >   (but still more important than the treatment it is given), it is crucial
>> >   for translations of the specification. We suggest the following:
>> >   - Mention this possibility very early on, at the point where
>> >     literals and URIrefs are first treated in detail (most probably
>> >     section 2.2, or even 2.1).
>> >   - Add a simple example at this point, or change an already existing
>> >     example slightly.
>> >   - For the extensive examples in section 6, replace some of the
>> >     current examples with equivalent examples with more international
>> >     flavor. Most of the applications in section 6 are used in a
>> >     world-wide context, and finding some examples should not be
>> >     difficult. Overall, changing or adding two to three examples
>> >     in section 6 should be sufficient. They should not be limited to
>> >     examples like example 32, which contains the copyright sign
>> >     as a single non-ASCII character, although having an example
>> >     that shows how non-ASCII characters can be convenient in a
>> >     purely English context may also be a good idea.
>> >   - The explanations to these examples should mention the fact
>> >     that RDF and XML allow Unicode characters. This does not
>> >     have to be extensive; a few short sentences, with pointers
>> >     to the relevant parts of the normative specs, should be sufficient.
>> >     Readers of the primer should not be bothered/confused with
>> >     issues such as normalization.
>>
>> I think these points can be adequately addressed by following your 
>> first suggestion above.  What I'd propose is:
>>
>> *  Note in the initial discussion of URIs in Section 2.1 (and 
>> correspondingly in Appendix A) that URIs can contain Unicode 
>> characters (citing [RDF-CONCEPTS]), so they are even more general as 
>> subjects, predicates, or objects in statements.
>>
>> *  Similarly, note in the initial discussion of XML in Section 2.1 
>> (and correspondingly in Appendix B) that XML content and (with some 
>> exceptions) tags can contain Unicode characters.
>>
>> *  Note when discussing plain literals early in Section 2.2 that the 
>> character strings can contain Unicode characters (this is the place 
>> where character strings initially come up).
>>
>> *  Note in Section 2.4 that the lexical spaces of typed literals are 
>> defined as Unicode strings (citing [RDF-CONCEPTS]), and hence 
>> non-English content can be represented.
>>
>> *  Note in Section 3.1 (I think the discussion of typed literals is 
>> the best place to do this) that XML strings (both for use in plain and 
>> typed literals) can contain Unicode characters (citing [XML] and 
>> [RDF-SYNTAX]), and hence non-English content can be represented.
> 
> 
> I think these are very good suggestions. Please go ahead.


OK;  they're in the works.
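
For illustration, here is a minimal sketch (Python, not taken from any RDF specification) of the kind of statement these notes describe: a URIref and a plain literal that both contain non-ASCII characters. It assumes the mapping from RDF URI references to URIs described in [RDF-CONCEPTS] (encode as UTF-8, then %-escape octets that are not permitted US-ASCII URI characters) and the ASCII-only N-Triples format from the RDF Test Cases document, which writes other characters as \uXXXX or \UXXXXXXXX escapes. The example.org URIs and the "name" property are hypothetical.

```python
from urllib.parse import quote


def escape_ntriples(text: str) -> str:
    """Escape a Unicode string for an (ASCII-only) N-Triples literal."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        elif ch == "\t":
            out.append("\\t")
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)                   # printable ASCII passes through
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")       # e.g. U+00FC -> \u00FC
        else:
            out.append(f"\\U{cp:08X}")       # characters beyond the BMP
    return "".join(out)


def to_uri(uriref: str) -> str:
    """UTF-8 encode, then %-escape octets that are not allowed URI characters."""
    return quote(uriref, safe=":/?#[]@!$&'()*+,;=%")


def plain_literal_triple(subject, predicate, text, lang=None):
    """Serialize one statement whose object is a plain literal."""
    obj = '"' + escape_ntriples(text) + '"'
    if lang:
        obj += "@" + lang                    # optional language tag
    return f"<{to_uri(subject)}> <{to_uri(predicate)}> {obj} ."


if __name__ == "__main__":
    print(plain_literal_triple("http://example.org/städte/Düsseldorf",
                               "http://example.org/terms/name",
                               "Düsseldorf", "de"))
    # <http://example.org/st%C3%A4dte/D%C3%BCsseldorf> <http://example.org/terms/name> "D\u00FCsseldorf"@de .
```

The escaping only affects this particular serialization; the literal value and the URIref themselves still contain the original Unicode characters.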


> 
> 
>> I'm *very* reluctant to make changes in examples (or do any additional 
>> research to find new ones) at this late date.  In particular, the 
>> examples in Section 6 are taken from the indicated sources, not made 
>> up for the Primer.
> 
> 
> I'm sure that for most of the example vocabularies/applications in
> Section 6, it should be rather easy to find other examples. Even
> just changing one example would go a long way.


Most of these sections were volunteered, not developed by me, and I 
don't think it will be as easy as you think (I certainly didn't run 
across any for the sections I did).  In my opinion, it really is too 
late to research and make these kinds of changes.


> 
> 
>> > - Alt container: Because of the special rule that the first element is
>> >   the default or preferred value, this is a fake alternative. This should
>> >   be changed, or a real alternative, without any preferences, should be
>> >   provided. This is in particular important if there is no preferred
>> >   version among different language versions (which is often needed for
>> >   political reasons), but we are sure there are many other cases where
>> >   it is not desirable to have a preferred alternative, or where there
>> >   just simply is no preferred alternative. (Other such examples include
>> >   voting ballots. Even for the ftp example given, a true alternative
>> >   may be desirable, to allow load balancing.)
>>
>> I believe this issue is being responded to separately, but I don't 
>> think this is a "fake alternative".  There's no rule that says a given 
>> app can't ignore the first alternative and choose one of the others 
>> for its own reasons (and, as the Primer tries hard to explain, RDF 
>> doesn't itself enforce either the "preferred" or "alternative" 
>> semantics anyway).  After all, the statement of the alternatives may 
>> reflect the opinion of whoever created the original information as to 
>> what the preferred alternative is, but it's certainly possible for 
>> someone else to have a different opinion, and select a corresponding 
>> alternative.
> 
> 
> Of course the recipient can always choose any alternative, otherwise
> the whole construct would be completely pointless. What we are saying
> is that it is impossible for the person who creates the construct to
> express that all alternatives are on the same footing. So the problem
> is with the creator of the RDF, not with the consumer.


That may be so, but strictly speaking this isn't just a Primer issue, 
and I believe there has already been a WG response indicating that the 
construct isn't going to be changed, so I'm going to drop this 
particular point.
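
The disagreement above is easiest to see in the triples themselves. The following minimal sketch builds an Alt container using the real rdf:Alt and rdf:_n membership URIs, with members modelled as (text, language) pairs; the helper names and the "_:title" node label are hypothetical. A consumer applies its own language preference and otherwise falls back to rdf:_1, the member treated as the default or preferred value. The sketch also shows Martin's point: the creator has no triple available to say "no preference", since some member is always rdf:_1.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def alt_container(node, members):
    """Triples for an rdf:Alt; member order becomes rdf:_1, rdf:_2, ..."""
    triples = [(node, RDF + "type", RDF + "Alt")]
    for i, value in enumerate(members, start=1):
        triples.append((node, f"{RDF}_{i}", value))
    return triples


def alt_members(triples, node):
    """Collect the container's members, ordered by their rdf:_n index."""
    prefix = RDF + "_"
    found = {}
    for s, p, o in triples:
        if s == node and p.startswith(prefix) and p[len(prefix):].isdigit():
            found[int(p[len(prefix):])] = o
    return [found[k] for k in sorted(found)]


def choose(triples, node, wanted_lang=None):
    """Pick a member by the consumer's own preference, else rdf:_1."""
    alts = alt_members(triples, node)
    if wanted_lang:
        for text, lang in alts:
            if lang == wanted_lang:
                return (text, lang)
    return alts[0]  # the creator's "default", whether intended or not


if __name__ == "__main__":
    t = alt_container("_:title", [("Report", "en"), ("Rapport", "fr"),
                                  ("Bericht", "de")])
    print(choose(t, "_:title", "fr"))  # ('Rapport', 'fr')
    print(choose(t, "_:title"))        # ('Report', 'en')
```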


> 
> 
>> > - Measures/weights: The primer in a very small number of instances
>> >   uses 'weightInKg', and explains why, but for the rest, it always
>> >   uses just 'weight', even when there is no reason for such an
>> >   underspecified property. For world-wide data interchangeability,
>> >   such details are crucial. Unless there is a specific point to make
>> >   (e.g. when explaining rdf:value), 'weightInKg' should always be
>> >   preferred. The same applies to other properties such as
>> >   rearSeatLegRoom. Language such as (primer 4.4:)
>> >   >>>>
>> >   because frequently the value
>> >   would be recorded simply as the typed literal (as in the triple above),
>> >   relying on an understanding of the context to fill in the unstated
>> >   units information.
>> >   >>>>
>> >   should be avoided. The primer should not recommend
>> >   practices that have made Mars missions go astray, among other things.
>>
>> Changing all instances of simple properties like "weight" and 
>> "rearSeatLegRoom" to include units in their names would require 
>> changes all over the Primer, and this, it seems to me, would be a 
>> rather indirect way of making the point you're really raising.  What I 
>> suggest is to add an explicit comment at the end of Section 4.4 that, 
>> while you don't need to use rdf:value as described here, this 
>> illustrates the issue that global interoperability requires that 
>> sufficient metadata (such as units information) be explicitly recorded 
>> to eliminate ambiguity.  This might be done using rdf:value, or using 
>> additional properties (or possibly using different datatypes).
> 
> 
> Pointing this out in Section 4.4 is very good, but it is not nearly enough.
> By this point, the reader has seen the 'weight' example many times,
> and many readers may not read that far. So at least, the problem
> should also be pointed out when 'weight' is first used.


As it turns out, I think this needs to be pointed out *before* the 
"weight" example you mention.  Read on!


> 
> 
>> One of the reasons for using examples like "weight" is familiarity 
>> (the tent example is based on actual Web pages).  Another is 
>> simplicity which, it seems to me, ought to be a main consideration in 
>> a Primer. The Primer, in giving examples (either real or made-up 
>> ones), is not "recommending" anything.
> 
> 
> Whatever the Primer uses in its examples, it is implicitly recommending as
> practice. And wherever we can, we should try to avoid bad practice.
> Errors with units have led rockets to go astray, so this is not just
> an academic exercise.
> 
> 
>>  Moreover, the language you object to is simply stating a fact:  
>> frequently such literals are recorded in exactly the way I describe.  
>> This may be (I agree) an unfortunate practice, but it is used all over 
>> the world (in both databases and web pages).
> 
> 
> For databases, we are in a much more controlled, local context.
> For web pages, they hopefully give the unit somewhere on the page;
> it's not really needed to give it with every single value.
> 
> But I think the primer should not be suggesting bad practice, even
> if it may be frequent. It would be similar to the HTML Rec containing
> invalid HTML examples just because invalid HTML is frequent.


I appreciate your point, but this isn't quite correct.  All the examples 
are valid RDF.  There is a difference between invalid syntax, and 
questionable design.  I'll be happy to point out the issue, but the 
Primer wasn't (and isn't) intended to be a design manual.  What I 
propose to do about this is detailed further along.


> 
> Also, for the Primer, there is another problem: People don't think about
> weights as being just numbers. The unit is always conceptually included.
> When reading the Primer, people will always assume some unit. It may
> be grams, ounces, pounds, kg, tons, and so on. Or they may just be
> left with the strange feeling that they don't have enough information.
> Saying, at the first occurrence of 'weight', that it's kg, and that it's
> discussed later (with a pointer), seems like the absolute minimum.


As I noted above, I think that dealing with this issue actually requires 
that it be covered before the "weight" example (and I'd prefer to get a 
mention of this issue into Section 2, which is discussing the general 
model, rather than waiting until Section 3, which is where the "weight" 
example occurs).  The first place where an obvious example of implicit 
units occurs is in Section 2.4, using the property "age" (typically 
assumed to be "years", but that need not be the case).  What I propose 
to do is take your suggestion above, and apply it at that point.  That 
is, note that often the values of what appear to be simple properties 
are actually more complex, since they involve units (and other similar 
metadata about the value).  In the case of "age", I'd state that the unit is 
"years", and point to Section 4.4 for further discussion.  I'd also propose to 
point out this same issue when the "weight" example comes up in Section 
3.2.  Specifically, there are three separate uses of implicit units in 
that example:  in addition to "weight" (in this case, in kg), there is 
"sleeps" (number of people), and "packedSize" (square centimeters). 
Again, there'd be a pointer to Section 4.4 for further discussion.

In Section 4.4, I'd propose a paragraph that discusses the desirability 
of explicitly indicating units (and similar) information for 
interoperability.  This might be done using rdf:value (the subject of 
the Section), building the unit into the property name ("weightInKg"), 
adding an additional property ("unitOfWeight") somewhere, or via other 
means.
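
As a rough illustration of why this matters to a consumer, here is a minimal sketch that assumes triples held as Python tuples with Decimal values. It accepts two of the forms mentioned above, the unit built into the property name ("weightInKg") and a structured value using rdf:value plus a units property, and it refuses a bare "weight" whose unit is only implicit. Apart from rdf:value, the vocabulary (weight, weightInKg, units, the unit URIs, the item URI) and the conversion table are hypothetical, loosely modelled on the Primer's examples.

```python
from decimal import Decimal

RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
EX = "http://www.example.org/terms/"      # hypothetical vocabulary
UNITS = "http://www.example.org/units/"   # hypothetical unit URIs

# Hypothetical table: factor that converts a value in the given unit to kg.
TO_KG = {
    UNITS + "kilograms": Decimal("1"),
    UNITS + "pounds": Decimal("0.45359237"),
}


class AmbiguousUnits(ValueError):
    """Raised when a weight is recorded without a recoverable unit."""


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def weight_in_kg(triples, item):
    # Unit built into the property name: the value is already in kg.
    named = objects(triples, item, EX + "weightInKg")
    if named:
        return named[0]
    for obj in objects(triples, item, EX + "weight"):
        # A bare literal: the unit lives only in the author's context.
        if isinstance(obj, Decimal):
            raise AmbiguousUnits(f"{item}: weight {obj} has no stated unit")
        # Structured value: obj rdf:value v . obj ex:units u .
        values = objects(triples, obj, RDF_VALUE)
        units = objects(triples, obj, EX + "units")
        if len(values) == 1 and len(units) == 1 and units[0] in TO_KG:
            return values[0] * TO_KG[units[0]]
        raise AmbiguousUnits(f"{item}: weight node {obj} lacks a known unit")
    raise LookupError(f"{item}: no weight recorded")


if __name__ == "__main__":
    item = "http://www.example.org/products#item10245"  # hypothetical
    structured = [(item, EX + "weight", "_:w"),
                  ("_:w", RDF_VALUE, Decimal("2.4")),
                  ("_:w", EX + "units", UNITS + "kilograms")]
    print(weight_in_kg(structured, item))  # 2.4
```

Building the unit into the property name works here only because this consumer already knows what "weightInKg" means, which is the interoperability concern raised below; the structured form carries the unit as data instead.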


> 
> 
>> Also, I think the language you cite, far from needing to be 
>> avoided, is helpful in explicitly pointing out that you are relying on 
>> implicit context information in interpreting such values (although 
>> amplifying on this a bit as I've just suggested would, I agree, make 
>> this even clearer).
>>
>> I'm also hesitant to do anything to suggest that using properties like 
>> "weightInKg" is necessarily good practice either, since it involves 
>> encoding the units information in property names, which won't 
>> necessarily be interoperable with other ways of recording units 
>> information.
> 
> 
> It will certainly be a lot more interoperable than a simple 'weight'
> property without any additional information. Please remember the
> way RDF works: data can move around freely, and the necessary
> documentation may not always be easily accessible.
> 
> 
> Regards,    Martin.
>
Received on Wednesday, 3 December 2003 11:00:36 UTC