- From: Lieske, Christian <christian.lieske@sap.com>
- Date: Tue, 26 Apr 2005 15:27:18 +0200
- To: <public-i18n-its@w3.org>
Hello,

During the 2005-04-21 teleconference I took the action item to look into existing approaches related to the identification of terms. Please find my results below.

Best regards,
Christian

---

Markup related to terminology comes in at least two guises:

1. dedicated: markup whose primary purpose is the codification of information related to terminological or lexical data (examples: TBX, OLIF)

2. non-dedicated: markup whose primary purpose is the codification of other information but which has provisions for terminological/lexical data (examples: DocBook, XHTML)

For the purpose of the current ITS discussion about existing term identification approaches, the remainder focuses on the second type (non-dedicated).

The general observation about non-dedicated markup for terminology is the following: no standard exists. Different approaches are taken with respect to at least four dimensions (see below).

1. Approach to Term Classes

Terms can be classified, for example, as abbreviation, initialism, acronym etc. Accordingly, we see at least the following approaches in markup:

a. The class is an attribute of an element (here, the attribute value is meant to correspond to the data category "abbreviation" from ISO 12620).

   <term class="ISO12620:2.1.8.1">W3C</term>

b. Selected classes (e.g. abbreviations) get their own representation by means of an element.

   <abbrev>W3C</abbrev>

2. Approach to Term-related Information

Often, the value of terminology is increased through term-related information such as usage information (e.g. "deprecated"), alternate forms or cross-references. Accordingly, we see at least the following approaches in markup:

a. The alternate form is an attribute of an element.

   <abbrev fullForm="World Wide Web Consortium">W3C</abbrev>

b. The alternate form is given its own representation as an element, which the term then references.

   <abbrev fullForm="#ffW3C">W3C</abbrev>
   <fullForm id="ffW3C">World Wide Web Consortium</fullForm>

3. Approach to Location

Several approaches exist with respect to the location of term-related information (e.g. terms and definitions):

a. inline

   <para>This paragraph contains an inline term definition.
   <termdef>A software module called an <glossterm>XML processor</glossterm>
   is used to read XML documents and provide access to their content and
   structure.</termdef> The definition comes from
   <link xlink:href="http://www.w3.org/TR/REC-xml">the XML Recommendation</link>.</para>

b. block

   <dl>
   <dt>XML processor</dt>
   <dd>Software module used to read XML documents and provide access to
   their content and structure.</dd>
   </dl>

4. Relationship to Automated Text and other Markup

Very often terms are viewed as good candidates for special types of processing, such as generating a back-of-the-book index or applying special weighting in indices built by search engines. However, different approaches are taken.

a. explicit

If a term is to be used, for example, as an entry in a back-of-the-book index, it has to be tagged specifically.

   <indexTerm>W3C</indexTerm> The <term>W3C</term> is a standards body.

b. implicit

No special markup is used. Rather, the processing repurposes the existing markup. An indexer may, for example, be configured in such a way that all 'term' elements are treated in a special way.

5. Motivation

Term-related markup seems to have different motivations, which range from special rendering (all terms should stand out from ordinary text) to special-purpose applications (e.g. linking to a GUI menu item). Accordingly, term-related markup gets inserted by people with differing skill sets for specific purposes.
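
To make the explicit/implicit distinction from point 4 more concrete, here is a minimal Python sketch. It is not taken from any existing indexing tool: the sample document, the function name collect_index_entries and the idea of configuring the indexer through a set of element names are assumptions made purely for illustration. Only the element names 'term' and 'indexTerm' come from the examples above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the element names 'term' and
# 'indexTerm' are taken from the examples in this mail.
SAMPLE = """<doc>
  <para>The <term>W3C</term> is a standards body.</para>
  <para><indexTerm>XML</indexTerm>The <term>XML</term> format is widely used.</para>
</doc>"""


def collect_index_entries(xml_text, element_names):
    """Return the sorted, de-duplicated text of every element whose
    tag is listed in element_names (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    entries = set()
    for elem in root.iter():
        if elem.tag in element_names and elem.text:
            entries.add(elem.text.strip())
    return sorted(entries)


# Explicit approach: only dedicated index markup feeds the index.
explicit_entries = collect_index_entries(SAMPLE, {"indexTerm"})

# Implicit approach: the indexer is configured to repurpose 'term' elements.
implicit_entries = collect_index_entries(SAMPLE, {"term"})

print(explicit_entries)
print(implicit_entries)
```

With the explicit approach, a term that nobody tagged as indexTerm never reaches the index. With the implicit approach, every 'term' element does, whether or not the author intended it as an index entry.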
Received on Tuesday, 26 April 2005 13:27:29 UTC