[whatwg] Extensible microdata attributes

On 6/13/2011 2:41 PM, Tab Atkins Jr. wrote:
> On Sat, Jun 11, 2011 at 4:20 AM, Brett Zamir<brettz9 at yahoo.com>  wrote:
>> For example, to take a water-damaged text (e.g., for the TEI element
>> http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-damage.html ) which
>> in TEI could be expressed as:
>>
>> <damage agent="water" xmlns="http://www.tei-c.org/ns/1.0/">Some water
>> damaged words</damage>
>>
>> might be represented currently in Microdata as:
>>
>> <span itemprop="damage" itemscope=""
>> itemtype="http://www.tei-c.org/ns/1.0/">
>> <meta itemprop="agent" content="water"/>
>>     Some water damaged words
>> </span>
> This still wouldn't quite work.  Embedded Microdata has no
> relationship with the surrounding DOM - the only meaning carried is
> whatever is actually being denoted as Microdata.  So, in the above
> example, you're indicating that there is some water damage, but not
> what is damaged.
>
> If you wanted to address this properly, you'd need to format it like this:
>
> <span itemprop=damage itemscope itemtype="http://www.tei-c.org/ns/1.0/">
>    <meta itemprop=agent content=water>
>    <span itemprop=subject>Some water damaged words</span>
> </span>
>
> This way, when you extract the Microdata, you get an item that looks like:
>
> { "items": [
>      { "properties": {
>          "damage": [
>            { "type": "...",
>              "properties": {
>                "agent": ["water"],
>                "subject": ["Some water damaged words"]
>              }
>            }
>          ]
>        }
>      }
>    ]
> }
>
Thanks, that's helpful. Still would be nice to have item-* though...
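
To make the extraction above concrete, here is a minimal sketch in Python
using only the standard library's html.parser. It is not the full Microdata
algorithm: it only understands itemscope, itemtype, itemprop, <meta content>
and plain text content, and it ignores itemref, itemid and URL-valued
properties. The names MicrodataSketch, extract_microdata and SAMPLE are
made up for illustration, and the enclosing <div itemscope> in SAMPLE is an
assumption added so that the "damage" item has a parent item to be a
property of.

```python
from html.parser import HTMLParser
import json

# Elements that never get an end tag; only <meta> matters for this sketch.
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class MicrodataSketch(HTMLParser):
    """Simplified, hypothetical extractor; not the full Microdata algorithm."""

    def __init__(self):
        super().__init__()
        self.items = []          # top-level items (itemscope without itemprop)
        self.scopes = []         # currently open items, innermost last
        self.open_elements = []  # one (kind, payload) entry per open element

    def _add_property(self, name, value):
        # A property outside any item has no owner and is dropped.
        if self.scopes:
            self.scopes[-1]["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("itemprop")
        if tag in VOID_TAGS:
            # <meta itemprop=... content=...> takes its value from content.
            if name and tag == "meta":
                self._add_property(name, attributes.get("content") or "")
            return
        entry = ("plain", None)
        if "itemscope" in attributes:
            item = {"properties": {}}
            if attributes.get("itemtype"):
                item = {"type": attributes["itemtype"], "properties": {}}
            if name:
                self._add_property(name, item)   # nested item
            else:
                self.items.append(item)          # top-level item
            self.scopes.append(item)
            entry = ("scope", item)
        elif name:
            # Text-valued property: collect text until the element closes.
            owner = self.scopes[-1] if self.scopes else None
            entry = ("text", {"name": name, "owner": owner, "parts": []})
        self.open_elements.append(entry)

    def handle_data(self, data):
        # Text belongs to every open text-valued property element.
        for kind, payload in self.open_elements:
            if kind == "text":
                payload["parts"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_elements:
            return
        kind, payload = self.open_elements.pop()  # assumes well-nested markup
        if kind == "scope":
            self.scopes.pop()
        elif kind == "text" and payload["owner"] is not None:
            props = payload["owner"]["properties"]
            props.setdefault(payload["name"], []).append("".join(payload["parts"]))


def extract_microdata(html):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return {"items": parser.items}


# The markup from the quoted example, wrapped in an assumed parent item.
SAMPLE = """<div itemscope>
<span itemprop=damage itemscope itemtype="http://www.tei-c.org/ns/1.0/">
   <meta itemprop=agent content=water>
   <span itemprop=subject>Some water damaged words</span>
</span>
</div>"""

if __name__ == "__main__":
    print(json.dumps(extract_microdata(SAMPLE), indent=2))
```

Without the itemprop=subject wrapper, the text "Some water damaged words"
never reaches the extracted item, which is the point made above about
Microdata having no relationship with the surrounding DOM.
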
> Note, though, that Microdata or RDFa may not be quite appropriate for
> this kind of thing.  You're not marking up data triples for later
> extraction as independent data - you're doing in-band annotations of
> the document itself.  As such, a different mechanism may be more
> appropriate, such as your original design of using a custom markup
> language in XML, or using custom attributes in HTML.  There's no
> particular reason for these sorts of things to be readable by
> arbitrary robots; it's sufficient to design for ones that know exactly
> what they're reading and looking for.

With the likes of Google offering Microdata-aware searches, I think it
makes a whole lot of sense to let rich documents such as TEI ones become
regular document citizens of the web. Specialized semantic communities
with limited resources could then leverage general-purpose,
better-supported services such as Google's Microdata tool. Their
documents would also be editable in WYSIWYG HTML text editors and could
be stored on sites such as discussion forums or wikis, where only HTML
may be allowed and supported.

I think such a focus would also enable the TEI community to benefit from 
reusing search-engine-recognized schemas where available, as well as 
helping the web community build new schemas for the unique needs of 
encoding academic texts.

Brett

Received on Monday, 13 June 2011 02:29:49 UTC