6.10 Disambiguation

6.10.1 Definition

The Disambiguation data category is used to highlight (mark up) specific conceptual patterns that may require special treatment when localizing and translating content. This data category is used for several purposes, including, but not limited to:

- Informing a translation service that a certain fragment of text is subject to specific translation rules, e.g. for proper names or officially regulated translations, or that it conveys a very specific meaning.
- Informing content management systems and translation services about the intended conceptual type of a textual entity in order to enable processing based on this specific type for source and target languages, for example, when dealing with personal names, product names, geographic names, chemical compounds, protein names, and so forth.

Disambiguation support is achieved by associating a marked-up fragment of text with an external web resource. A language review agent can dereference this resource to access the intended meaning or lexical choice of the fragment, thereby contributing to its correct translation.

A fragment of text can be disambiguated at one of three granularities: (1) lexical type, (2) ontological concept, or (3) named entity. In the case of lexical type, the external resource may provide appropriate synonyms and example usage, as the WordNet services do. In the case of ontological concept, the external resource may provide a formalized conceptual definition arranged in a hierarchical framework of related concepts. In the case of a named entity, the external resource may provide a fully fledged description of the associated real-world entity.
For instance, the word 'City' in the fragment 'I am going to the City' may be disambiguated as one of WordNet's synsets that can be represented by 'city' at the lexical granularity level, as an ontological concept of 'City' that could represent a subclass of 'Populated Place' at the conceptual granularity level, or as the central area of a particular city, e.g. 'City of London', at the entity granularity level. Emerging linked data networks, such as DBpedia, further increase the interlinking of ontological concepts and named entity definitions for the same things in different languages, thereby offering the possibility to directly facilitate translation through a source language description.

Two types of disambiguation are possible:

- Disambiguation for target type class, which explicitly describes the type class of the underlying concept or entity of the fragment.
- Disambiguation for target identity, which implicitly describes the intended meaning of the fragment through a link to an external resource.

Text analysis engines, such as named entity recognizers and named entity, concept, and word sense disambiguation components, offer appropriate solutions for creating the needed information. Content management systems are also able to present and visualize this information, or employ it to index their content. Machine translation services may use this information for optimizing their language and translation models.

The Disambiguation data category is specified either with global rules or locally on an individual markup element. In the latter case, the disambiguation information applies to the textual content of that element. There is no inheritance.

[Ed. note: Below will need a test case in the test suite.]

When specifying the target identity, the user MUST use only one of the following two addressing modes:

1. Using disambigSource together with one of disambigIdent or disambigIdentPointer (in a global rule) to specify the collection and the identifier itself.
2. Using one of disambigIdentRef or disambigIdentRefPointer (in a global rule) to specify an IRI for the disambiguation target.

GLOBAL: The disambiguationRule element contains the following:

- A required selector attribute containing an absolute selector that selects the nodes to which the rule applies.
- At least one of the following:
  - To specify the target type class, exactly one of the following:
    - A disambigClassPointer attribute that contains a relative selector pointing to a node specifying the entity type class behind the selector.
    - A disambigClassRefPointer attribute that contains a relative selector pointing to a node that holds an IRI that specifies the entity type class behind the selector.
  - To specify the target identity, exactly one of the following:
    - A disambigIdentPointer attribute that contains a relative selector pointing to a node that represents a unique identifier for the disambiguation target.
    - A disambigIdentRefPointer attribute that contains a relative selector pointing to a node that holds an IRI that represents a unique identifier for the disambiguation target.

For an example, see Example 54.

LOCAL: The following local markup is available for the Disambiguation data category:

- An optional disambigGranularity attribute that contains a string specifying the granularity level of the disambiguation. The value can be one of the following identifiers: lexicalConcept, ontologyConcept, or entity.
- At least one of the following:
  - To specify the target type class:
    - A disambigClassRef attribute that contains an IRI specifying the type class of the concept or entity behind the selector.
  - To specify the target identity, exactly one of the following:
    - When using addressing mode 1:
      - A disambigSource attribute that contains a string representing the disambiguation identifier collection source.
      - A disambigIdent attribute that contains a string representing the disambiguation identifier for the disambiguation target that is valid within the specified disambiguation source.
    - When using addressing mode 2:
      - A disambigIdentRef attribute that contains an IRI that represents a unique identifier for the disambiguation target.
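
The constraints on local markup described above can be illustrated with a small check. This is a minimal sketch and not part of the specification. It assumes XML content whose local attributes live in the ITS namespace `http://www.w3.org/2005/11/its` under the prefix `its`. The function name `check_local_disambiguation`, the sample document, the collection name `ExampleCollection`, and the IRIs are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: local ITS attributes are written in the ITS namespace.
ITS_NS = "http://www.w3.org/2005/11/its"
GRANULARITIES = {"lexicalConcept", "ontologyConcept", "entity"}


def its(name):
    """Return the namespaced key ElementTree uses for an ITS attribute."""
    return f"{{{ITS_NS}}}{name}"


def check_local_disambiguation(element):
    """Return a list of problems in the local Disambiguation markup of one element.

    Hypothetical helper: it only mirrors the rules listed in the text above.
    """
    granularity = element.get(its("disambigGranularity"))
    class_ref = element.get(its("disambigClassRef"))
    source = element.get(its("disambigSource"))
    ident = element.get(its("disambigIdent"))
    ident_ref = element.get(its("disambigIdentRef"))

    values = (granularity, class_ref, source, ident, ident_ref)
    if all(value is None for value in values):
        return []  # element carries no Disambiguation markup at all

    problems = []
    if granularity is not None and granularity not in GRANULARITIES:
        problems.append(f"unknown disambigGranularity: {granularity!r}")

    mode1 = source is not None or ident is not None  # collection + identifier
    mode2 = ident_ref is not None                    # IRI

    if mode1 and mode2:
        problems.append("both addressing modes used; only one is allowed")
    if mode1 and (source is None or ident is None):
        problems.append("addressing mode 1 needs both disambigSource and disambigIdent")
    if class_ref is None and not mode1 and not mode2:
        problems.append("neither a target type class nor a target identity is given")
    return problems


# Hypothetical sample: the first span uses addressing mode 2 only,
# the second span mixes both addressing modes.
SAMPLE = f"""
<text xmlns:its="{ITS_NS}">
  <p>I am going to the
    <span its:disambigGranularity="entity"
          its:disambigIdentRef="http://dbpedia.org/resource/City_of_London">City</span>.
  </p>
  <p>
    <span its:disambigSource="ExampleCollection"
          its:disambigIdent="city-1"
          its:disambigIdentRef="http://example.org/city">City</span>
  </p>
</text>
"""

root = ET.fromstring(SAMPLE)
for span in root.iter("span"):
    print(span.text, check_local_disambiguation(span))
```

In this sketch the first span passes, while the second is reported because it combines disambigSource and disambigIdent with disambigIdentRef. The check covers only local markup; global rules with pointer attributes would need selector evaluation, which is outside the scope of this illustration.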