- From: Tab Atkins Jr. <jackalmage@gmail.com>
- Date: Mon, 13 Jun 2011 11:32:35 -0700
On Mon, Jun 13, 2011 at 2:29 AM, Brett Zamir <brettz9 at yahoo.com> wrote:
> Thanks, that's helpful. Still would be nice to have item-* though...

Well, your idea for custom item-* attributes is just a way to more
concisely embed triples of non-visible data. You already have a
mechanism for embedding non-visible triples (<meta> or <link>), so the
new method needs some decent benefits to justify the duplication of
functionality.

Additionally, while we recognize that non-visible data is sometimes
necessary to embed, we'd like to discourage its use as much as
possible (in general, non-visible data rots much faster). One way to
do that is to make the syntax slightly cumbersome or ugly - when you
really need it, you can use it, but your aesthetic sense will keep it
from being the first tool you reach for. So, making it easier or
prettier to embed non-visible triples is actually something we'd like
to avoid if we can.

>> Note, though, that Microdata or RDFa may not be quite appropriate for
>> this kind of thing. You're not marking up data triples for later
>> extraction as independent data - you're doing in-band annotations of
>> the document itself. As such, a different mechanism may be more
>> appropriate, such as your original design of using a custom markup
>> language in XML, or using custom attributes in HTML. There's no
>> particular reason for these sorts of things to be readable by
>> arbitrary robots; it's sufficient to design for ones that know exactly
>> what they're reading and looking for.
>
> With the likes of Google offering Microdata-aware searches, I think it makes
> a whole lot of sense to allow rich documents such as TEI ones to enter as
> regular document citizens of the web, whereby the limited resources of such
> specialized semantic communities can leverage the general purpose and
> better-supported services such as Google's Microdata tool, while also having
> their documents editable within the likes of WYSIWYG HTML text editors, and
> stored on sites such as discussion forums or wikis where only HTML may be
> allowed and supported.
>
> I think such a focus would also enable the TEI community to benefit from
> reusing search-engine-recognized schemas where available, as well as helping
> the web community build new schemas for the unique needs of encoding
> academic texts.

I haven't yet looked into TEI's metadata scheme, but is the TEI
metadata actually something that needs to be known to search engines?
The one example you've presented in your emails, annotating that some
parts of a transcription were water-damaged (and thus presumably
possibly inaccurate?), isn't something useful for search engines, but
only for humans looking at the document as a whole.

If most of the other metadata is similar, then the only reason to use
Microdata is to potentially make it easier to read/embed data via
Microdata-aware WYSIWYG editors (are there any?). Or, possibly, to
use Microdata-extraction tools. Is it useful to, for example, extract
all the water-damaged text from a document, minus the context in which
it appeared?

Otherwise, one might as well just use data-* attributes to mark up
triples directly on the subjects. That would give you most of the
benefits with much less verbosity and more direct linkages between
data and metadata. It would also be somewhat easier to style with
CSS:

  <span data-tei-damage="water">
    Some water damaged words
  </span>

  span[data-tei-damage=water] { ... }

~TJ
Received on Monday, 13 June 2011 11:32:35 UTC