- From: Babich, Alan <ABabich@filenet.com>
- Date: Wed, 29 Jul 1998 15:43:38 -0700
- To: "'webdav'" <w3c-dist-auth@w3.org>
I think the best way to use variants is to limit them to mechanically generated renditions of some other preexisting variant of the document. For example, the original document might be a word processing document, and another variant of it could be a PostScript rendition. For an image, a second variant could be a postage stamp version of the image, or a PDF version of it.

I don't think that a hand translated version of a document should normally be stored as a variant. In general, the best thing to do is to store hand translated versions as separate documents.

Consider the properties of the document. Properties like title, author, and keywords are generally string properties that should be translated along with the content when the document is hand translated. Often, the character set should be different in the translated version as well. For example, let's say you have an English document in ASCII and you translate it to Japanese using an Asian character set.

People fluent in multiple languages are relatively rare. Most of us have one language we prefer to operate in, so if you are a system administrator designing the schema for a document collection, you should accommodate that case well. Therefore, you want to design the schema such that an English speaker can find the English version of the document by its title, author, and keywords in English in ASCII, and a Japanese speaker can find the document by those attributes in Japanese in an Asian character set.

If you store the hand translated version as just a variant of the original document, this means you need both an English title attribute and a Japanese title attribute on that one document. But that approach doesn't scale. Suppose you could potentially also have a French, Norwegian, German, or "any language on Earth" translation of the document? Having an English_title, a Japanese_title, a French_title, etc. for the document is not very appealing. (For example, if you stored the properties in certain RDBMSs and defined title, author, and keywords attributes for every language on Earth, you would exceed the maximum number of columns allowed for a table.)

The leading alternative is to simply have one title attribute, e.g., Title, and store each translation of the document as a separate document. Each separate version of the document would have the title in the appropriate language in the appropriate character set. Then all users could query the Title attribute in their preferred language in their preferred character set. (Let's assume that the collection uses UCS or Unicode as the underlying character set, so that it can deal with virtually any character set in a uniform manner.) Then everyone's query would find a copy of the document they can read, if one exists. (Of course, associating hand translated versions of a document with each other is often desirable, but doing that is a separate issue. Multiple alternatives exist.)

So, I think we should refrain from using hand translations of documents as an example of variants. While we can't prevent people from storing hand translations as variants, I feel that using them as an example would be encouraging bad practice. I think we should use better chosen examples, based on mechanically generated renditions, and design for that case.

Alan Babich
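
To illustrate the schema argument above, here is a minimal sketch in Python of the "one Title attribute, one document per translation" layout. Everything in it is an assumption made for illustration: the `Document` class, the `language` and `translation_of` fields, the sample records, and the helper functions do not come from any WebDAV draft. The `translation_of` link is only one of the possible ways to associate translations with each other.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Document:
    """One stored document; each hand translation is its own Document."""
    doc_id: str
    title: str                 # single Title attribute, in the document's own language
    author: str
    keywords: List[str]
    language: str              # hypothetical language tag, e.g. "en" or "ja"
    translation_of: Optional[str] = None  # hypothetical link to the original's doc_id


def find_by_title(collection: List[Document], text: str) -> List[Document]:
    """Query the single Title attribute; Python strings are Unicode, so any script works."""
    return [d for d in collection if text in d.title]


def translations_of(collection: List[Document], doc_id: str) -> List[Document]:
    """Follow the (assumed) translation_of link to list translations of a document."""
    return [d for d in collection if d.translation_of == doc_id]


# Illustrative sample data, not taken from any real collection.
collection = [
    Document("doc-1", "Annual Report", "A. Smith", ["finance"], "en"),
    Document("doc-2", "年次報告書", "A. スミス", ["財務"], "ja", translation_of="doc-1"),
]

english_hits = find_by_title(collection, "Report")
japanese_hits = find_by_title(collection, "年次")
```

Adding a French or Norwegian translation here means adding one more record, not one more set of columns, which is the scaling point made above.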
Received on Wednesday, 29 July 1998 18:46:51 UTC