- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Wed, 20 May 2009 13:50:02 +0300
On May 20, 2009, at 10:27, Henri Sivonen wrote:

> However, in order to usefully apply RELAX NG or Schematron to a
> microdata-based infoset, the infoset conversion should turn property
> names into element names. Since XML places arbitrary limitations on
> element names (and element content), this mapping would have exactly
> the same complications as mapping microdata to RDF/XML.

Here's an attempt at mapping microdata to XML:

* Have a root element (it doesn't matter what it's called) with an
  xml:lang attribute that has the language of the root element of the
  HTML document.

* Have a child of root with local name 'title', namespace
  'http://purl.org/dc/terms/title' and content that is the content of
  the HTML <title>.

* For each link relation in the document, have a child of root that has
  as its local name the ASCII-lowercased rel token (or
  ALTERNATE-STYLESHEET for alternate stylesheet), namespace
  http://www.w3.org/1999/xhtml/vocab# and a no-namespace attribute
  'url' that contains the absolutized href of the link relation.

* For each <meta name content>, have a child of root with the value of
  the name attribute of the <meta> as local name, namespace
  http://www.w3.org/1999/xhtml/vocab# and the value of the content
  attribute as element content. If the language of the <meta> differs
  from that of root, add xml:lang with the different language.

* For cites, do the link thing analogously to how cites are handled in
  the RDF conversion.

* For items and properties:

  - Map the property name to an XML (namespace, local name) pair as
    follows and use the result as the element name for the 'property
    element':

    * If itemprop contains a colon: Locate the last # or /, whichever
      comes last but isn't the last character of the URI. Make the
      part up to and including that character the namespace URI and
      the part after it the local name.

    * Otherwise: The namespace is http://www.w3.org/1999/xhtml/custom#
      and the itemprop token is the local name.
  - If the value is a URL, put the URL value in an attribute called
    'url' on the property element.

  - If the value is itself an item, put the value of the item attribute
    in a no-namespace attribute called 'type' on the property element.

  - Otherwise, put the string value in the content of the property
    element and put the language of the property in the xml:lang
    attribute of the property element if it differs from the nearest
    ancestor xml:lang.

I haven't actually tried it, but on the face of things this kind of
mapping seems tractable for RELAX NG schemas.

And, as mentioned before, this breaks when:

 1) The local name becomes a non-NCName.
 2) textContent in HTML contains non-XML characters.

Use the infoset coercion rules for those. However, the Uhhhhhh notation
may produce collisions, because microdata property names aren't
lowercased.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
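To make the property-name rule above concrete, here is a minimal Python sketch of the split into namespace and local name. The function name split_property_name is hypothetical, and raising an error when a colon-containing name has no usable # or / is an assumption; the message does not say what should happen in that case. The sketch also does not check that the local name is an NCName, which is the first breakage case listed above.

```python
from typing import Tuple

# Namespace used for property names that do not contain a colon.
CUSTOM_NS = "http://www.w3.org/1999/xhtml/custom#"


def split_property_name(itemprop: str) -> Tuple[str, str]:
    """Split an itemprop token into a (namespace, local name) pair."""
    if ":" not in itemprop:
        # Plain token: fixed namespace, token itself is the local name.
        return CUSTOM_NS, itemprop

    # Find the last '#' or '/', ignoring the final character of the URI.
    head = itemprop[:-1]
    cut = max(head.rfind("#"), head.rfind("/"))
    if cut == -1:
        # Assumption: the message leaves this case unspecified.
        raise ValueError("no '#' or '/' to split on: %r" % itemprop)

    # Namespace includes the separator; the rest is the local name.
    return itemprop[:cut + 1], itemprop[cut + 1:]
```

For example, split_property_name("http://example.org/vocab#name") gives the namespace "http://example.org/vocab#" and the local name "name", while a bare token such as "name" lands in the custom namespace. A URI ending in a slash, such as "http://example.org/foo/", yields the local name "foo/", which is not an NCName and would need the coercion rules.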
Received on Wednesday, 20 May 2009 03:50:02 UTC