From: Ian Hickson <ian@hixie.ch>
Date: Wed, 20 May 2009 01:36:33 +0000 (UTC)
One of the use cases I collected from the e-mails sent in over the past few months was the following:

USE CASE: It should be possible to write generalized validators and authoring tools for the annotations described in the previous use case.

SCENARIOS:
 * Mary would like to write a generalized software tool to help page authors express micro-data. One of the features that she would like to include is one that displays authoring information, such as vocabulary term description, type information, range information, and other vocabulary term attributes in-line so that authors have a better understanding of the vocabularies that they're using.
 * John would like to ensure that his indexing software only stores type-valid data. Part of the mechanism that he uses to check the incoming micro-data stream is type information that is embedded in the vocabularies that he uses.
 * Steve would like to provide warnings to the authors that use his vocabulary that certain vocabulary terms are experimental and may never become stable.

REQUIREMENTS:
 * There should be a definitive location for vocabularies.
 * It should be possible for vocabularies to describe other vocabularies.
 * Originating vocabulary documents should be discoverable.
 * Machine-readable vocabulary information shouldn't be on a separate page from the human-readable explanation.
 * There must not be restrictions on the possible ways vocabularies can be expressed (e.g. the way DTDs restricted possible grammars in SGML).
 * Parsing rules should be unambiguous.
 * Should not require changes to HTML5 parsing rules.

I couldn't find a good solution to this problem.

The obvious solution is to use a schema language, such as RDFS or OWL. Indeed, that's probably the only solution that I can recommend. However, as we discovered with HTML5, schema languages aren't expressive enough. I wouldn't be surprised to find that no existing schema language could accurately describe the complete set of requirements that apply to the vCard, vEvent, and BibTeX vocabularies (though I haven't checked whether this is the case).

For any widely used vocabulary, I think the best solution will be hard-coded constraints and context-sensitive help systems, as we have for HTML5 validators and HTML editors.

For other vocabularies, I recommend using RDFS and OWL, and having the tools support microdata as a serialisation of RDF (a small sketch of what this could look like follows at the end of this message). Microdata itself could probably be used to express the constraints, though possibly not directly in RDFS and OWL if these use features that microdata doesn't currently expose (like typed properties).

Regarding some of the requirements, I actually disagree that they are desirable. For example, having a definitive location for vocabularies has been shown to be a bad idea for scalability, with the W3C experiencing huge download volume for certain schemas. Similarly, I don't think that the "turtles all the way down" approach of describing vocabularies in the same syntax as the data they describe (self-hosted schemas) is necessary or, frankly, particularly useful to the end-user (though it may have nice theoretical properties).

In conclusion: I recommend using an existing RDF-based schema language in conjunction with the mapping of microdata to RDF. Implementation experience with how this actually works in practice in end-user scenarios would be very useful in determining whether something more is needed here.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
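As an illustration of the recommendation above, here is a minimal sketch in Python with rdflib. The vocabulary URL, the terms, and the data triples are invented for the example; only the general approach (RDFS declarations for a vocabulary, read through the microdata-to-RDF mapping) comes from the mail itself.

    from rdflib import Graph, Literal, Namespace, RDFS

    EX = Namespace("http://example.org/vocab#")   # hypothetical vocabulary

    # The vocabulary, described in RDFS; this could live in the same
    # document as the human-readable explanation of each term.
    vocab = Graph().parse(format="turtle", data="""
        @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
        @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
        @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
        @prefix ex:   <http://example.org/vocab#> .

        ex:birthday a rdf:Property ;
            rdfs:comment "Date of birth; experimental, may change." ;
            rdfs:range xsd:date .
    """)

    # Triples as the microdata-to-RDF mapping might extract them from
    # markup such as <span itemprop="http://example.org/vocab#birthday">.
    data = Graph()
    data.add((EX.alice, EX.birthday, Literal("not-a-date")))

    # Authoring tool (Mary's scenario): show the term's description in-line.
    print(vocab.value(EX.birthday, RDFS.comment))

    # Validator (John's scenario): warn when a literal's datatype doesn't
    # match the range declared in the vocabulary.
    for prop, rng in vocab.subject_objects(RDFS.range):
        for subj, obj in data.subject_objects(prop):
            if isinstance(obj, Literal) and obj.datatype != rng:
                print(f"warning: {subj} {prop} = {obj!r} is not typed as {rng}")

The same loop could be driven by OWL restrictions instead of plain rdfs:range; the point is only that the constraints travel with the vocabulary and are read through the RDF mapping, without requiring any changes to HTML5 parsing rules.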
Received on Tuesday, 19 May 2009 18:36:33 UTC