- From: <w3t-archive+esw-wiki@w3.org>
- Date: Tue, 28 Jun 2005 06:30:09 -0000
- To: w3t-archive+esw-wiki@w3.org
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "ESW Wiki" for change notification.

The following page has been changed by YvesSavourel:
http://esw.w3.org/topic/its0505ReqUniqueID

------------------------------------------------------------------------------

  === Summary ===

- It should be possible to attach a unique identifier to any localisable item. This identifier should be completely unique across all documents but should be identical across all translations of the same item.
+ It should be possible to attach a unique identifier to any localizable item. This identifier should be unique within a document set, but should be identical across all translations of the same item.

- [[CL Does "unique across all" mean "globally unique" or "unique within a document set"? If it is the latter, we need a mechanism that describes a document set.]]
-
- '''[[FS-''' "Unique across all documents" leads to a problem: you cannot assign two attributes of type ID to an element. If there is already an ID attribute in a schema, it might not be possible to change its value for ITS purposes. So the same ID value might occur in different documents, since ID uniqueness in XML is defined per document. That is, if we want to ensure uniqueness across documents, we need a mechanism external to DTDs / XML Schema (Relax NG allows ID values only for compatibility with DTDs), e.g. one based on XPath.''']]'''
-
- '''[[YS-''' I think we want "unique within the document", which is much easier to ensure. As for making it unique outside of the document, I would guess one would simply need to add the document's full path, relative path or filename, depending on the context of work (just like HTML anchors). Global uniqueness could also be achieved if needed, but I'm not sure whether it would require a description of the document set. For example, MS GUIDs are "globally unique" and, I think, don't deal with document sets(?) (but obviously it would be nicer to work with more human-readable IDs).''']]'''
-
- '''[[AZ-''' The best method is to assign a unique ID to the document and then have unique IDs within the document for individual text units. In this way the combination of the unique document ID and the unique text unit ID is guaranteed to be totally unique. This is similar to Yves' suggestion, but separates the text unit IDs from the document ID. To obtain a unique document ID, you take a CRC of the document and add to it the current UTC time in milliseconds as a modifier. All of this is covered by the proposed LISA OSCAR xml:tm specification (http://www.xml-intl.com/docs/specification/xml-tm.html).''']]'''
-
- '''[[FS-''' Maybe I misunderstood. I had no problem with the construction of the ID value, e.g. as a combination of a string identifying the document plus a string identifying the element. I only worried about the validation mechanisms for cross-document IDs, which do not exist (see the statement below on xml:id). But if validation across documents is not necessary, only retrieval, then we can state something like "The IDs across documents are not used for schema-based validation of uniqueness. Hence, cross-document validation of ID values is not an issue."''']]'''
-
- '''[[CL-''' How about [http://java.sun.com/j2se/1.5.0/docs/api/java/util/UUID.html] as a quick guideline related to generating IDs (actually GUIDs)?''']]'''
-
- '''[[FS-''' I don't understand the reference to UUID. But I'm fine with the text as it is now, so we could get rid of the comments.''']]'''

  === Challenge/Issue ===

- In order to most effectively re-use translated text where content is re-used (either across update versions or across deliverables) it is necessary to have a unique and persistent id associated with the element.
+ In order to most effectively re-use translated text where content is re-used (either across update versions or across deliverables), it is necessary to have a unique and persistent identifier associated with the element. This identifier allows translation tools to track an item correctly from one version or location to the next. Once it is established that this is the same item, the content can be examined for changes, and if nothing has changed, the previous translation can very likely be re-used. Change analysis is an extremely powerful productivity tool for translation compared to the typical source-matching (a.k.a. translation memory) techniques, which simply look for similar source text in the database and, most of the time, cannot tell whether that text is used in the same context.

- This change analysis technique has been possible with UI messages in the past, but the introduction of structured XML (and SGML) documents will allow for its use in documents also.
+ This change analysis technique has been possible with user-interface messages in the past, but the introduction of structured XML (and SGML) documents will allow its use in documents as well.

  === Notes ===

  The xml:id attribute defined here [http://www.w3.org/TR/xml-id/] (currently only a Candidate Recommendation) may be a means to carry the unique identifier. Note, however, that xml:id is unique within a document, not necessarily within a set of documents.

- [[RI Note that cross-document uniqueness is not difficult to achieve if we use URIs, where the in-document id is a fragment identifier. When we originally wrote this requirement we hadn't thought much about URIs as a permanent unique reference for any document.]]
+ === Quick Guidelines ===
+
+ There are multiple methods for creating unique identifiers, for example:
+
+ * Taking a CRC of the document and adding to it the current UTC time in milliseconds as a modifier. See an example of this solution in the xml:tm specification [http://www.xml-intl.com/docs/specification/xml-tm.html].
+
+ * Using the mechanism described in the Java API documentation for UUIDs. See [http://java.sun.com/j2se/1.5.0/docs/api/java/util/UUID.html] for details.
+
+ * Using URIs, where the in-document ID is a fragment identifier.
+
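The three identifier methods listed under Quick Guidelines, and the change analysis described under Challenge/Issue, can be illustrated with a short Python sketch. It is not the xml:tm algorithm or the Java UUID implementation, only an illustration under stated assumptions: all function names and status labels are hypothetical, CRC-32 from `zlib` stands in for "a CRC", the CRC and the timestamp are simply joined with a hyphen (the page does not say how they are combined), Python's `uuid.uuid4()` stands in for `java.util.UUID.randomUUID()`, and the `example.com` URI and sample strings are made up.

```python
import time
import uuid
import zlib


# Identifier generation (one hypothetical helper per Quick Guidelines method)

def document_id_crc_time(document_bytes: bytes) -> str:
    """Hypothetical helper: CRC-32 of the content plus current UTC time in ms.

    Assumption: the two values are joined as "<crc>-<millis>"; how xml:tm
    actually combines them is not stated on this page.
    """
    crc = zlib.crc32(document_bytes) & 0xFFFFFFFF  # unsigned 32-bit CRC
    utc_millis = time.time_ns() // 1_000_000      # ms since the Unix epoch (UTC)
    return f"{crc:08x}-{utc_millis}"


def document_id_uuid() -> str:
    """Hypothetical helper: random version-4 UUID (cf. java.util.UUID.randomUUID)."""
    return str(uuid.uuid4())


def item_uri(document_uri: str, item_id: str) -> str:
    """Hypothetical helper: document URI with the in-document ID as fragment."""
    return f"{document_uri}#{item_id}"


# Change analysis keyed on persistent identifiers (Challenge/Issue)

def change_analysis(old_source: dict, old_translation: dict, new_source: dict) -> dict:
    """Classify each item of the new version by its identifier.

    All three arguments map item IDs to text. Each new ID maps to
    ("reuse", old translation), ("changed", old translation) or ("new", None).
    """
    result = {}
    for item_id, text in new_source.items():
        if item_id not in old_source or item_id not in old_translation:
            # No earlier version of this item, or it was never translated.
            result[item_id] = ("new", None)
        elif old_source[item_id] == text:
            # Same item, same source text: the old translation is very likely reusable.
            result[item_id] = ("reuse", old_translation[item_id])
        else:
            # Same item, edited source: the old translation is a starting point for review.
            result[item_id] = ("changed", old_translation[item_id])
    return result


if __name__ == "__main__":
    # Hypothetical sample data: item IDs mapped to English source and French translation.
    old_src = {"p1": "Click Save.", "p2": "Close the window."}
    old_tr = {"p1": "Cliquez sur Enregistrer.", "p2": "Fermez la fenêtre."}
    new_src = {"p1": "Click Save.", "p2": "Close all windows.", "p3": "Restart."}

    print(document_id_crc_time(b"<doc/>"))  # varies with the current time
    print(document_id_uuid())               # random on every call
    print(item_uri("http://example.com/manual.xml", "p2"))
    # http://example.com/manual.xml#p2
    print(change_analysis(old_src, old_tr, new_src))
    # {'p1': ('reuse', 'Cliquez sur Enregistrer.'), 'p2': ('changed', 'Fermez la fenêtre.'), 'p3': ('new', None)}
```

Keying the comparison on the identifier rather than on source-text similarity is what lets the sketch tell an unchanged item ("reuse") from one whose text was edited ("changed"), which is the advantage over plain translation-memory matching that the Challenge/Issue section describes.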
Received on Tuesday, 28 June 2005 08:13:39 UTC