[ISSUE-34] More in mixing profiles from Arle Lommel on 2012-08-08 (public-multilingualweb-lt@w3.org from August 2012)

From: Arle Lommel <arle.lommel@dfki.de>
Date: Wed, 8 Aug 2012 13:23:53 +0200
To: Multilingual Web LT Public List <public-multilingualweb-lt@w3.org>
Message-Id: <62F8FE8D-59FC-4158-B82C-AA0EA2D4842A@dfki.de>
Hi all,

I have to apologize for all the scatter-shot emails on the quality issue in the past few days. A number of issues have come up. This mail addresses the issue of applying multiple profiles within a document.

Initially Yves had suggested the idea of using QNames to allow tool-specific codes to be linked back to a profile. In discussion with Felix, I think it makes sense to generalize the use of (optional?) QName prefixes to the locQualityComment, locQualityType, locQualitySeverity, and locQualityScore attribute values. Strictly speaking, these would no longer be QNames as I understand it: what looks like the local part in these attributes would not refer to anything defined in the profile, and the prefix would instead serve as an identifier for the profile (and thus the tool) to which the value applies. (They would, however, be proper QNames in the locQualityCode attribute because they would reference specific items defined in the profile.)

The advantage of extending these prefixes to the other attributes is that it would provide an easy way to specify which profile (if multiple ones are used) applies.

So here is a code snippet that shows what this might look like, with two profiles ("ugly" and "pretty") defined elsewhere in the document and used to mark up the sentence "This paragraph haves a grammatical errors":

<para>This
   <span
      its:locQualityType="
          ugly:grammar|
          pretty:grammar
         "
      its:locQualityCode="
          ugly:GRAMMAR|
          pretty:verbal_agreement
         "
      its:locQualityComment="
          pretty:suggested replacement\:&quot;paragraph has&quot;
          OR &quot;paragraphs have&quot;
         "
    >paragraph haves</span> 
  <span
     its:locQualityType="
          ugly:grammar|
          pretty:grammar
         "
     its:locQualityCode="
          ugly:GRAMMAR|
          pretty:number_agreement OR incorrect_article
         "
     its:locQualityComment="
         pretty:suggested replacement\:&quot;a grammatical error&quot;
         OR &quot;grammatical errors&quot; OR &quot;some grammatical errors&quot;
        "
   >a grammatical errors</span>
</para>

In this view (formatted unusually to make the structure clearer), the prefixes allow multiple processes to each add a value to the same attribute, make it clear that BOTH values apply, and "sign" each value with the tool that produced it.

This has some advantages over the earlier version in that it makes it easy to include multiple tools’ output without worrying about one overwriting another.
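
To give a feel for what a consumer of the prefixed form would have to do, here is a minimal Python sketch that splits one attribute value by profile. The function name parse_prefixed is hypothetical, and the rules it applies (entries separated by "|", the first unescaped ":" ending the prefix, "\:" standing for a literal colon) are assumptions read off the example above, not a defined syntax. It also assumes "|" never occurs inside a comment.

```python
import re


def parse_prefixed(value):
    """Split a prefixed, pipe-delimited attribute value into {prefix: text}.

    Assumed conventions (taken from the example, not from any spec):
    "|" separates entries, the first unescaped ":" ends the profile
    prefix, and "\\:" is an escaped literal colon.
    """
    annotations = {}
    for entry in value.split("|"):
        # Collapse the line breaks and indentation used for readability.
        entry = " ".join(entry.split())
        if not entry:
            continue
        # Split once, at the first colon not preceded by a backslash.
        parts = re.split(r"(?<!\\):", entry, maxsplit=1)
        if len(parts) == 2:
            prefix, text = parts
        else:
            # No prefix present: file the value under None.
            prefix, text = None, parts[0]
        annotations[prefix] = text.replace("\\:", ":")
    return annotations


types = parse_prefixed("""
    ugly:grammar|
    pretty:grammar
""")
comment = parse_prefixed(
    'pretty:suggested replacement\\:"paragraph has" OR "paragraphs have"'
)
```

Here types maps both "ugly" and "pretty" to "grammar", and comment["pretty"] holds the suggestion text with its colon restored. Every consumer of the markup would need some equivalent of this step before it could use a value.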

On the other hand, it does make it harder to do things like CSS styling, since what previously would have been locQualityType="grammar" now has a much more complex internal syntax that makes it harder to target.

The alternative is to allow nesting of elements and get a structure like this, adding a new locQualityProfilePointer attribute:

<para>This
   <span xml:id="outer"
      its:locQualityProfilePointer="ugly"
      its:locQualityType="grammar"
      its:locQualityCode="GRAMMAR"
   >
     <span xml:id="inner"
        its:locQualityProfilePointer="pretty"
        its:locQualityType="grammar"
        its:locQualityCode="verbal_agreement"
        its:locQualityComment="pretty:suggested replacement\:&quot;paragraph has&quot;
           OR &quot;paragraphs have&quot;"
     >
      paragraph haves
     </span>
   </span>
  is a little better</para>

This allows for clean CSS access and formatting. It does run into the problem that, in principle, the span with the id value "inner" overrides the span "outer" rather than adding to it. If we can specify that both apply, however, this syntax is easier to parse and does not rely on the internal syntax of QName prefixes and list delimiters used in the first example. It is, all in all, much cleaner.
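
As a rough illustration of the "both apply" reading of the nested structure, the Python sketch below walks the tree and, for each element with an xml:id, collects the annotations from all enclosing spans as well as its own. The helper names own_annotation and additive_annotations are hypothetical, locQualityProfilePointer is only the attribute proposed in this mail, and the additive behaviour is exactly the assumption under discussion rather than anything already specified.

```python
import xml.etree.ElementTree as ET

ITS = "{http://www.w3.org/2005/11/its}"  # ITS namespace in ElementTree notation
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def own_annotation(element):
    """Return the its:locQuality* attributes set directly on one element."""
    return {
        name[len(ITS):]: value
        for name, value in element.attrib.items()
        if name.startswith(ITS + "locQuality")
    }


def additive_annotations(element, inherited=()):
    """Map each xml:id to every annotation that applies, outermost first.

    Assumption: nested annotations add to, rather than override,
    those of enclosing elements.
    """
    own = own_annotation(element)
    applying = inherited + ((own,) if own else ())
    result = {}
    if XML_ID in element.attrib:
        result[element.attrib[XML_ID]] = list(applying)
    for child in element:
        result.update(additive_annotations(child, applying))
    return result


SAMPLE = """<para xmlns:its="http://www.w3.org/2005/11/its">This
  <span xml:id="outer" its:locQualityProfilePointer="ugly"
        its:locQualityType="grammar" its:locQualityCode="GRAMMAR">
    <span xml:id="inner" its:locQualityProfilePointer="pretty"
          its:locQualityType="grammar" its:locQualityCode="verbal_agreement">
      paragraph haves
    </span>
  </span>
</para>"""

by_id = additive_annotations(ET.fromstring(SAMPLE))
profiles = [a["locQualityProfilePointer"] for a in by_id["inner"]]
print(profiles)  # ['ugly', 'pretty']
```

No parsing inside attribute values is needed here; the cost is that the additive rule has to be stated somewhere, since ordinary inheritance would keep only the inner annotation.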

Any thoughts on which is a better solution (or suggestions for another alternative)?

-Arle
Received on Wednesday, 8 August 2012 11:24:30 UTC