- From: Frank Manola <fmanola@acm.org>
- Date: Fri, 14 Nov 2003 11:24:19 -0500
- To: Martin Duerst <duerst@w3.org>
- Cc: www-rdf-comments@w3.org, w3c-i18n-ig@w3.org
Martin Duerst wrote:
>
> Dear RDF WG,
>
> Here are the last call comments from the I18N WG on
> the RDF drafts. This is not necessarily by draft, but
> by feature.
>

Martin--

Thanks for your comments on the RDF drafts. I'm responding regarding your suggestions concerning the Primer (I'm responding in a separate message for each of your two messages). In general, I'd like to address your points explicitly, but without making extensive modifications to existing text, or introducing numerous new or changed examples. The Primer is already rather long, and most of the material you've commented on was in the original Last Call version.

Regarding your first message (http://lists.w3.org/Archives/Public/www-rdf-comments/2003OctDec/0120.html):

>
> - Examples/Primer: There is one important facility of RDF that is almost
>   completely ignored in the primer and in the general discussion in the
>   concepts document. This is the ability to use not only ASCII characters,
>   but Unicode, in literal values, URIrefs, and (with some restrictions)
>   XML element names, and therefore property names and names of other nodes.
>   While this may be of somewhat secondary importance to English readers
>   (but still more important than the treatment it is given), it is crucial
>   for translations of the specification. We suggest the following:
>   - Mention this possibility very early on, at the first point where
>     literals and URIrefs are first treated in detail (most probably
>     section 2.2 (or even 2.1)).
>   - Add a simple example at this point, or change an already existing
>     example slightly.
>   - For the extensive examples in section 6, replace some of the
>     current examples with equivalent examples with more international
>     flavor. Most of the applications in section 6 are used in a
>     world-wide context, and finding some examples should not be
>     difficult. Overall, changing or adding two to three examples
>     in section 6 should be sufficient.
>     They should not be limited to
>     examples like example 32, which contains the copyright sign
>     as a single non-ASCII character, although having an example
>     that shows how non-ASCII characters can be convenient in a
>     purely English context may also be a good idea.
>   - The explanations to these examples should mention the fact
>     that RDF and XML allow Unicode characters. This does not
>     have to be extensive; a few short sentences, with pointers
>     to the relevant parts of the normative specs, should be sufficient.
>     Readers of the primer should not be bothered/confused with
>     issues such as normalization.

I think these points can be adequately addressed by following your first suggestion above. What I'd propose is:

* Note in the initial discussion of URIs in Section 2.1 (and correspondingly in Appendix A) that URIs can contain Unicode characters (citing [RDF-CONCEPTS]), so they are even more general as subjects, predicates, or objects in statements.

* Similarly, note in the initial discussion of XML in Section 2.1 (and correspondingly in Appendix B) that XML content and (with some exceptions) tags can contain Unicode characters.

* Note when discussing plain literals early in Section 2.2 that the character strings can contain Unicode characters (this is the place where character strings initially come up).

* Note in Section 2.4 that the lexical spaces of typed literals are defined as Unicode strings (citing [RDF-CONCEPTS]), and hence non-English content can be represented.

* Note in Section 3.1 (I think the discussion of typed literals is the best place to do this) that XML strings (both for use in plain and typed literals) can contain Unicode characters (citing [XML] and [RDF-SYNTAX]), and hence non-English content can be represented.

I'm *very* reluctant to make changes in examples (or do any additional research to find new ones) at this late date. In particular, the examples in Section 6 are taken from the indicated sources, not made up for the Primer.
It may be unfortunate that they don't have a more international flavor, but the idea isn't necessarily to indicate all possible applications. Changing or adding two to three examples in section 6 may not sound like much, but it's quite a chore at this stage. Also, the main point of Section 6 is to illustrate a range of examples, not so much to illustrate the use of all of the RDF facilities introduced in earlier sections.

>
> - Alt container: Because of the special rule that the first element is
>   the default or preferred value, this is a fake alternative. This should
>   be changed, or a real alternative, without any preferences, should be
>   provided. This is in particular important if there is no preferred
>   version among different language versions (which is often needed for
>   political reasons), but we are sure there are many other cases where
>   it is not desirable to have a preferred alternative, or where there
>   just simply is no preferred alternative. (Other such examples include
>   voting ballots. Even for the ftp example given, a true alternative
>   may be desirable, to allow load balancing.)

I believe this issue is being responded to separately, but I don't think this is a "fake alternative". There's no rule that says a given app can't ignore the first alternative and choose one of the others for its own reasons (and, as the Primer tries hard to explain, RDF doesn't itself enforce either the "preferred" or "alternative" semantics anyway). After all, the statement of the alternatives may reflect the opinion of whoever created the original information as to what the preferred alternative is, but it's certainly possible for someone else to have a different opinion, and select a corresponding alternative.

>
> - Measures/weights: The primer in a very small number of instances
>   uses 'weightInKg', and explains why, but for the rest, it always
>   uses just 'weight', even when there is no reason for such an
>   underspecified property.
>   For world-wide data interchangeability,
>   such details are crucial. Unless there is a specific point to make
>   (e.g. when explaining rdf:value), 'weightInKg' should always be
>   preferred. The same applies to other properties such as
>   rearSeatLegRoom. Language such as (primer 4.4:)
> >>>>
>   because frequently the value
>   would be recorded simply as the typed literal (as in the triple above),
>   relying on an understanding of the context to fill in the unstated
>   units information.
> >>>>
>   should be avoided. The primer should not recommend
>   practices that have made Mars missions go astray, among else.

Changing all instances of simple properties like "weight" and "rearSeatLegRoom" to include units in their names would require changes all over the Primer, and this, it seems to me, would be a rather indirect way of making the point you're really raising. What I suggest is to add an explicit comment at the end of Section 4.4 that, while you don't need to use rdf:value as described here, this illustrates the issue that global interoperability requires that sufficient metadata (such as units information) be explicitly recorded to eliminate ambiguity. This might be done using rdf:value, or using additional properties (or possibly using different datatypes).

One of the reasons for using examples like "weight" is familiarity (the tent example is based on actual Web pages). Another is simplicity which, it seems to me, ought to be a main consideration in a Primer.

The Primer, in giving examples (either real or made-up ones), is not "recommending" anything. Moreover, the language you object to is simply stating a fact: frequently such literals are recorded in exactly the way I describe. This may be (I agree) an unfortunate practice, but it is used all over the world (in both databases and web pages).
Also, I think the language you cite, far from needing to be avoided, is helpful in explicitly pointing out that you are relying on implicit context information in interpreting such values (although amplifying on this a bit as I've just suggested would, I agree, make this even clearer).

I'm also hesitant to do anything to suggest that using properties like "weightInKg" is necessarily good practice either, since it involves encoding the units information in property names, which won't necessarily be interoperable with other ways of recording units information. This whole issue is certainly significant, but it raises a lot of design issues that I'm reluctant to see elaborated on too much in a "Primer". For example, units of measure are just one of the many pieces of metadata necessary to fully contextualize a piece of information.

>
> - The motivation for using xsd datatypes should clearly say that these
>   are well established and should be used where appropriate. Having
>   the possibility to use different types is good if needed, but in
>   general, interoperability is increased when using a well-defined
>   common set. Also, to a large extent, the value spaces, and to
>   a somewhat smaller extent, the lexical forms, are locale-independent,
>   which make it possible to exchange data independent of a particular
>   human-oriented representation. This should also be mentioned.

It seems to me that Section 2.4 on typed literals already sufficiently encourages the use of xsd datatypes. It says that xsd datatypes are "first among equals", that they are expected to be the most interoperable datatypes, and that they will probably be the most generally available. The current wording was discussed to some extent by the WG, and I'm reluctant at this point to add additional justification that we haven't explicitly discussed.

Thanks again for your comments.

--Frank
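
For readers following the Section 4.4 discussion above, the three ways of recording a weight that come up in this thread (a bare typed literal, a unit encoded in the property name, and a structured value using rdf:value with an explicit units property) can be sketched as plain triples. This is only an illustration under stated assumptions: the `EX` namespace, the item identifiers, the `units` property, and the unit URIs are hypothetical names invented here, and the tuple representation is not any particular RDF library's API.

```python
from decimal import Decimal

# Hypothetical namespace and names, used only for this illustration.
EX = "http://www.example.org/terms/"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Triples are (subject, predicate, object).
# A typed literal is modelled as a (lexical form, datatype URI) tuple.
triples = [
    # 1. Bare typed literal: the unit is left to context.
    ("ex:item10245", EX + "weight", ("2.4", XSD_DECIMAL)),
    # 2. Unit encoded in the property name.
    ("ex:item10246", EX + "weightInKg", ("2.4", XSD_DECIMAL)),
    # 3. Structured value: rdf:value plus an explicit units property.
    ("ex:item10247", EX + "weight", "_:w1"),
    ("_:w1", RDF_VALUE, ("5.3", XSD_DECIMAL)),
    ("_:w1", EX + "units", EX + "pounds"),
]

# Conversion factors to kilograms for the (hypothetical) unit URIs.
KG_PER_UNIT = {
    EX + "kilograms": Decimal("1"),
    EX + "pounds": Decimal("0.45359237"),
}


def objects(subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def weight_in_kg(item):
    """Return the item's weight in kg, or None if the data states no unit."""
    # Form 2: the property name itself carries the unit.
    for lexical, _datatype in objects(item, EX + "weightInKg"):
        return Decimal(lexical)
    for obj in objects(item, EX + "weight"):
        if isinstance(obj, tuple):
            # Form 1: a bare literal; nothing in the data says which unit.
            return None
        # Form 3: follow the blank node to its value and units.
        values = objects(obj, RDF_VALUE)
        units = objects(obj, EX + "units")
        if values and units and units[0] in KG_PER_UNIT:
            return Decimal(values[0][0]) * KG_PER_UNIT[units[0]]
    return None
```

A consumer that knows only the data can convert the second and third forms, but has to fall back on outside context for the first. The sketch also shows the interoperability concern with "weightInKg": the consumer needs a separate code path for each convention it wants to understand.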
Received on Friday, 14 November 2003 10:59:23 UTC