Re: WOFF and extended metadata from Laurence Penney on 2010-06-23 (www-font@w3.org from April to June 2010)

From: Laurence Penney <lorp@lorp.org>
Date: Wed, 23 Jun 2010 03:20:44 +0100
To: www-font@w3.org, 3668 FONT <public-webfonts-wg@w3.org>
Message-Id: <3C11F838-803D-4B58-8323-11F05565ED19@lorp.org>
[apologies, there were a few infelicities in the previous mail, so here's an improved version]


For the sake of completeness, I outline in the same manner as Jonathan the extension scheme I have been proposing. I have modified how language is specified.


<extension [name="bar"] [id="foo"]>
<!-- zero or more extension elements allowed within metadata -->

  <tag k="{key}" v="{value}">
  <!-- each tag is a key=value pair -->
  <!-- zero or more metadata items (tags) within each extension element -->
  <!-- value attribute takes character data: must encode angle-bracket markup -->
  <!-- tag elements may contain further tag elements, to unlimited depth -->

	[<translation lang="{lang_code}" v="{translated_value}"/>]
	<!-- zero or more translations of the value -->

	[<tag k="{key}" v="{value}"/>]
	<!-- zero or more child tags of each tag element -->

  </tag>

</extension>


Notes:

* <extension>'s name attribute is recommended, since it can determine the structure of the metadata it contains; W3C or a delegated authority will accept registrations for names in this space, if the applicant commits to maintaining a public version of the spec online. UAs should refer to those publications in order to present the metadata better than in key=value form. UAs should not attempt to derive any particular semantics from <extension>s without a name, and should only ever present them as a hierarchy of key-value pairs.

* The top-level <tag> element takes the place of the <item> element in Jonathan's recent sketch.

* Each <tag> element requires a k (key) attribute and a v (value) attribute.

* Each <tag> element may contain <translation>s of its value. The language of its default value, however, may NOT be specified; it can be regarded as a fall-back for presentation if there are no translations. Setting the language for a key is also not allowed, nor are key translations. By making translations subordinate to their parent, this scheme ensures all semantically identical data remains together. Key translations could easily be added to the schema if necessary. Unlike translated keys and values, a <tag>'s k attribute may be used by UAs as a string unique at that level for the purposes of building internal data structures.
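
A minimal sketch of how a UA might choose which value to present, given the translation rule above. It assumes Python's standard xml.etree.ElementTree; the function name localized_value, the sample key and strings, and the exact-match comparison of language codes are all illustrative assumptions, not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the key and strings are invented for illustration.
SAMPLE = """
<tag k="description" v="A sturdy text face">
  <translation lang="fr" v="Une police robuste"/>
  <translation lang="de" v="Eine robuste Schrift"/>
</tag>
"""

def localized_value(tag, lang):
    """Return the translation for lang, or the untagged default value."""
    for translation in tag.findall("translation"):
        # Assumption: language codes are compared by exact string match.
        if translation.get("lang") == lang:
            return translation.get("v")
    # The default v carries no language and serves as the fall-back.
    return tag.get("v")

tag = ET.fromstring(SAMPLE)
print(localized_value(tag, "fr"))
print(localized_value(tag, "ja"))
```

Because translations sit inside their parent <tag>, the lookup never has to search elsewhere in the document for the matching item.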

* Each <tag> element may optionally contain further tag elements, to an unlimited depth, thus:

<extension name="foo">
  <tag k="{key}" v="{value}">
     <tag k="{key}" v="{value}"/>
     <tag k="{key}" v="{value}"/>
     <tag k="{key}" v="{value}">
        <tag k="{key}" v="{value}"/>
        <tag k="{key}" v="{value}"/>
     </tag>
  </tag>
</extension>
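
As a rough illustration of how a UA with no knowledge of a particular extension might present such a hierarchy as plain key-value pairs, here is a sketch using Python's standard xml.etree.ElementTree. The sample keys and values and the function names walk and render are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; keys and values are invented for illustration.
SAMPLE = """
<extension name="foo">
  <tag k="license" v="OFL">
    <tag k="version" v="1.1"/>
    <tag k="holder" v="Example Foundry">
      <tag k="country" v="GB"/>
    </tag>
  </tag>
</extension>
"""

def walk(element, depth=0):
    """Yield (depth, key, value) for every <tag> below element, in document order."""
    for child in element.findall("tag"):
        yield depth, child.get("k"), child.get("v")
        # Recurse: tags may nest to unlimited depth.
        yield from walk(child, depth + 1)

def render(xml_text):
    """Return one indented 'key = value' line per tag."""
    root = ET.fromstring(xml_text)
    return ["  " * depth + f"{key} = {value}" for depth, key, value in walk(root)]

lines = render(SAMPLE)
print("\n".join(lines))
```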

* To encode data in existing array and dictionary structures, the following schemes are recommended:

Arrays (1):

<tag k="{array_name}" v="">
  <tag k="0" v="{value_0}"/>
  <tag k="1" v="{value_1}"/>
  <tag k="2" v="{value_2}"/>
  <tag k="3" v="{value_3}"/>
</tag>

Arrays (2):

<tag k="{array_name}" v="{value_0}"/>
<tag k="{array_name}" v="{value_1}"/>
<tag k="{array_name}" v="{value_2}"/>
<tag k="{array_name}" v="{value_3}"/>
<tag k="{array_name}" v="{value_4}"/>

Dictionaries:

<tag k="{dict_name}" v="">
  <tag k="{key_1}" v="{value_1}"/>
  <tag k="{key_2}" v="{value_2}"/>
  <tag k="{key_3}" v="{value_3}"/>
</tag>

Naturally:
- <tag>s in an array may be a mix of arrays, dictionaries and simple tags;
- <tag>s in a dictionary may be a mix of arrays, dictionaries and simple tags.

* At any given level in the tag hierarchy, any repeated tag with the same key may be treated as part of an array by the UA.
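
The array and dictionary conventions above could be decoded along the following lines. This is only a sketch under stated assumptions: it uses Python's xml.etree.ElementTree, the names decode_level and decode_tag are hypothetical, and it chooses to ignore the v attribute of any tag that has child tags, which the proposal itself does not require.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample covering Arrays (1), Arrays (2) and a dictionary.
SAMPLE = """
<extension name="foo">
  <tag k="weights" v="">
    <tag k="0" v="Light"/>
    <tag k="1" v="Regular"/>
  </tag>
  <tag k="script" v="Latin"/>
  <tag k="script" v="Greek"/>
  <tag k="vendor" v="">
    <tag k="name" v="Example Foundry"/>
    <tag k="country" v="GB"/>
  </tag>
</extension>
"""

def decode_tag(tag):
    """Decode one <tag>: a string, a list (Arrays (1)) or a dict."""
    if tag.find("tag") is None:
        return tag.get("v")
    # Assumption of this sketch: a parent's own v is ignored when it has children.
    children = decode_level(tag)
    indices = [str(i) for i in range(len(children))]
    if list(children) == indices:
        # Keys are exactly "0", "1", ... in order: treat as Arrays (1).
        return [children[i] for i in indices]
    return children

def decode_level(parent):
    """Decode all <tag> children of parent into a dict.

    A key repeated at the same level is collected into a list (Arrays (2)).
    """
    result = {}
    seen = {}
    for tag in parent.findall("tag"):
        key = tag.get("k")
        value = decode_tag(tag)
        if key not in result:
            result[key] = value
            seen[key] = 1
        else:
            if seen[key] == 1:
                result[key] = [result[key]]
            result[key].append(value)
            seen[key] += 1
    return result

decoded = decode_level(ET.fromstring(SAMPLE))
print(decoded)
```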

* Having the name attribute on the <extension> avoids the need to have a <name> element in the specification. <translation>s may be permitted at the top level too, where they apply to the name attribute of <extension>.

* If existing metadata in an arbitrary XML format needs to be stored, the recommended method is to convert the required items into a structure based on arrays, dictionaries and key-value pairs with strings for values, then store this in the proposed schema. Entity-encoding such external XML, thus making a string that is trivially stored, but unhelpful for XPath, etc., is discouraged unless the preceding operation is impractical.

* If data that are not easily represented as strings, arrays or dictionaries need to be stored - for example a structure that includes binary data not naturally representable as text - a custom representation must be devised. This must be based on a hierarchy of key-value pairs with strings for values. Binary data should be serialized (e.g. in PHP using serialize(), json_encode() or base64_encode()). As a last resort, entire data structures may be serialized. Decoding these strings inevitably relies on a UA knowing explicit details of the encoding mechanism.

* The resort to serializing and entity-encoding methods is discouraged because UAs unaware of that extension's tag key spec will only be able to present the encoded string, and XPath and XQuery will find them impenetrable.
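
To make the binary case concrete, here is one possible shape for such a custom representation, sketched in Python rather than PHP. The child keys "encoding" and "data", and the function names binary_tag and read_binary, are assumptions made for this example only; the proposal does not define them.

```python
import base64
import xml.etree.ElementTree as ET

def binary_tag(key, data):
    """Build a <tag> holding base64-encoded bytes as string values.

    The "encoding" and "data" child keys are a hypothetical convention.
    """
    tag = ET.Element("tag", {"k": key, "v": ""})
    ET.SubElement(tag, "tag", {"k": "encoding", "v": "base64"})
    encoded = base64.b64encode(data).decode("ascii")
    ET.SubElement(tag, "tag", {"k": "data", "v": encoded})
    return tag

def read_binary(tag):
    """Recover the bytes; this only works if the reader knows the convention."""
    fields = {child.get("k"): child.get("v") for child in tag.findall("tag")}
    if fields.get("encoding") != "base64":
        raise ValueError("unknown encoding")
    return base64.b64decode(fields["data"])

payload = bytes([0, 255, 16, 32])
xml_text = ET.tostring(binary_tag("thumbnail", payload), encoding="unicode")
round_tripped = read_binary(ET.fromstring(xml_text))
print(xml_text)
```

A UA that does not know this convention can still show "encoding = base64" and the raw string, which is the limitation described above.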

* A model metadata viewer and editor could look very like regedit.exe, the longstanding Windows Registry Editor.

In summary, the principal divergences from JK's specification are:
- extensible hierarchy *reduces* the need for hacks (serializing, XML);
- language subordinate to semantics;
- recommendations for storing complex, binary or XML metadata;
- only plain strings allowed as values.

The last of these is currently opposed by Liam, Chris and others. If the objections stand, then the value may be handled as element content without affecting the rest of this proposal. I still can't bear the idea of arbitrary XML (even if restricted to XHTML) ever being permitted as element content - see the recommendations above.

- L
Received on Wednesday, 23 June 2010 02:21:27 UTC