W3C home > Mailing lists > Public > public-egov-ig@w3.org > May 2009

Fwd: fyi on the Repository Schema

From: Jose M. Alonso <josema@w3.org>
Date: Sat, 16 May 2009 15:40:53 +0200
Message-Id: <CFF5DF90-8AB1-4A0C-B27B-CFAB339439B6@w3.org>
To: eGov IG <public-egov-ig@w3.org>
I see this did not make it to the list somehow. Forwarding it. Sorry
for not noticing sooner.
-- Jose

Begin forwarded message:

> From: Daniel Bennett <daniel@citizencontact.com>
> Date: 16 April 2009 15:57:46 GMT+02:00
> To: "Jose M. Alonso" <josema@w3.org>, Joe Carmel <joe.carmel@comcast.net>,
> "Sheridan, John" <John.Sheridan@nationalarchives.gov.uk>,
> public-egov-ig@w3.org, Dazza Greenwood <daz.greenwood@gmail.com>,
> Greg Elin <greg@fotonotes.net>, Chris Wallace <kit.wallace@googlemail.com>
> Subject: fyi on the Repository Schema
>
> Jose, et al,
>
> Repository Schema is a method to map an already-published Internet
> repository of XML/XHTML documents so that it can be used automatically
> as an object database. Note that the author of a Repository Schema can
> be different from the original publisher of the documents, and that
> there can be competing/complementary schemas of the same repository
> (as opposed to the repository publisher creating an API with a single
> view/entry-point of a database).
>
> The first part is to accept the URL as the unique identifier for each
> object, and then to gather/discover all or any of the objects: by
> discerning a pattern in the URLs, by using XPath to discover links to
> non-patterned URLs, or by doing both. An example might be press
> releases or blog postings where the URL pattern is the year, month,
> and day, but XPath is needed to discover the actual postings, which
> have long string names, e.g.
> http://www.blog.com/postings/2009/03/the_posting_explaining_repository_schemas
> . (This is more than URL templating to create the original URLs; it is
> reverse engineering and/or discovering the URLs. Note that using a web
> site's search facility or site map, with XPath discovery of URLs, is
> also possible.)
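>
> As a minimal sketch of that discovery step (the base URL, the path
> layout, and the helper names here are assumptions for illustration,
> not part of any spec):

```python
import re
from html.parser import HTMLParser

# Pattern for this repository's postings; base URL and path layout are
# assumptions taken from the example URL above.
ARCHIVE_PATTERN = re.compile(
    r"^http://www\.blog\.com/postings/\d{4}/\d{2}/[\w-]+$")


class LinkCollector(HTMLParser):
    """Collect href values from anchor tags (the //a/@href of XPath)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover_postings(archive_html):
    """Return the links in a page that match the repository's URL pattern."""
    collector = LinkCollector()
    collector.feed(archive_html)
    return [url for url in collector.links if ARCHIVE_PATTERN.match(url)]
```

> The same filter could equally be applied to links harvested from a
> site map or a search-results page.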
>
> The second part is to describe the potential parts or sub-objects in
> each document using a description of the object, the XPath that
> discovers the sub-object, and the XML Schema of the object. For
> example, in a Wikipedia page, an object within the document might be
> the "content" of the page, which excludes all of the templating/
> navigation/etc.; that XPath would be //div[@id="bodyContent"]. And a
> calendar event in a web page would use both XPath and an XML Schema,
> depending on whether the event was created using RDFa or a
> microformat. XPath could also allow objects with the same XML Schema
> to be differentiated, like separating a list of "friends" contact
> information from a list of "foes."
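>
> Pulling one such sub-object out of a document can be sketched with the
> XPath subset in Python's standard library (the sample page here is an
> invented stand-in for a real Wikipedia page):

```python
import xml.etree.ElementTree as ET


def extract_object(xhtml, xpath):
    """Return the first element matched by the XPath expression, or None."""
    root = ET.fromstring(xhtml)
    return root.find(xpath)


# Invented sample document standing in for a real wiki page.
WIKI_PAGE = """<html><body>
  <div id="siteNav">templating / navigation</div>
  <div id="bodyContent"><p>The actual page content.</p></div>
</body></html>"""

content = extract_object(WIKI_PAGE, ".//div[@id='bodyContent']")
```

> ElementTree implements only a subset of XPath, but that subset covers
> the id-predicate query used above; a full XPath engine would be needed
> for more elaborate expressions.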
>
> The third part is a descriptive list of usable XSL transformations so
> that the documents can be transformed in real time to another usable
> format: for example, to PDF, to an RDF version, or to a stripped-down
> version to convert into JSON for an application.
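>
> Pure XSLT is not in the Python standard library, so this sketch shows
> only the JSON case from the list above: a document stripped down to a
> small record (the field names are assumptions for illustration):

```python
import json
import xml.etree.ElementTree as ET


def to_stripped_json(xhtml):
    """Strip a document down to a small JSON record: title plus paragraphs."""
    root = ET.fromstring(xhtml)
    title = root.find(".//title")
    return json.dumps({
        "title": title.text if title is not None else None,
        "paragraphs": [p.text for p in root.findall(".//p")],
    })
```

> In a real deployment this slot in the schema would more likely name an
> XSLT stylesheet that any conforming processor could apply.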
>
> The fourth part is to point to indexes of the repository. This is
> crucial for real-time processing. The index could be created by
> anyone who has previously grabbed all the documents and created an
> index. As an example, an XQuery database engine could attach/grab
> the index when running a query over the entire repository.
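>
> A pre-built index of the kind described might be no more than a map
> from document URL to searchable metadata, built once by whoever has
> already fetched the repository (the structure here is an assumption,
> not a prescribed format):

```python
import xml.etree.ElementTree as ET


def build_index(documents):
    """Build a {url: title} index from a {url: xhtml source} mapping."""
    index = {}
    for url, source in documents.items():
        title = ET.fromstring(source).find(".//title")
        index[url] = title.text if title is not None else ""
    return index


def query_index(index, term):
    """Return the URLs whose indexed title contains the term."""
    return [url for url, title in index.items() if term in title]
```

> A query engine could attach such an index and answer repository-wide
> queries without re-fetching every document first.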
>
> The goal of the Repository Schema is to allow real-time access to,
> and processing of, essentially static XML/XHTML documents as if they
> were a database, so that tools can be built in advance that use the
> identical approach/widgets/tools for any repository. It also frees
> publishers of data from needing to anticipate every use of their
> data, allows for a standardization of using XML documents as a
> database, and frees databases from needing to store/screen-scrape all
> the data internally before acting on it (which many XQuery engines
> can do already, but with a performance hit when there is no pre-built
> index). It may also push publishers of documents on the web to abide
> by standards, such as making XHTML well-formed and valid, and
> encourage the use of Cool URIs that are both human-readable and
> machine-processable.
>
> Note that much of this is made easier by the XQuery doc() function's
> relatively undocumented ability to act on any well-formed/valid
> document on the World Wide Web in real time, with the URL created by
> concatenating strings (which is a real standard, as opposed to using
> curl or other methods).
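>
> That trick amounts to assembling a document URI from its patterned
> parts at query time; in XQuery the resulting string would be handed
> straight to doc(). A Python sketch of the same concatenation (base
> URL and path layout again assumed from the earlier example):

```python
from urllib.parse import urljoin


def posting_uri(base, year, month, slug):
    """Concatenate a document URI from the repository's patterned parts."""
    return urljoin(base, f"postings/{year:04d}/{month:02d}/{slug}")


uri = posting_uri("http://www.blog.com/", 2009, 3,
                  "the_posting_explaining_repository_schemas")
```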
>
> Thanks tremendously to Chris Wallace for doing much of the
> ground-breaking work and some of the proof-of-concept pieces in the
> XQuery Wikibook.
>
> Joe Carmel has been building a Rosetta Stone standard that would  
> help "relate" repositories.
>
> Daniel Bennett
> daniel@citizencontact.com
> http://www.advocatehope.org/
>
>
>
Received on Saturday, 16 May 2009 13:41:46 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Saturday, 16 May 2009 13:41:46 GMT