- From: Jose M. Alonso <josema@w3.org>
- Date: Sat, 16 May 2009 15:40:53 +0200
- To: eGov IG <public-egov-ig@w3.org>
- Message-Id: <CFF5DF90-8AB1-4A0C-B27B-CFAB339439B6@w3.org>
I see this did not make it to the list somehow. Forwarding it. Sorry for noticing so late. -- Jose

Begin forwarded message:

From: Daniel Bennett <daniel@citizencontact.com>
Date: 16 April 2009 15:57:46 GMT+02:00
To: "Jose M. Alonso" <josema@w3.org>, Joe Carmel <joe.carmel@comcast.net>, "Sheridan, John" <John.Sheridan@nationalarchives.gov.uk>, public-egov-ig@w3.org, Dazza Greenwood <daz.greenwood@gmail.com>, Greg Elin <greg@fotonotes.net>, Chris Wallace <kit.wallace@googlemail.com>
Subject: fyi on the Repository Schema

Jose, et al,

A Repository Schema is a method for mapping an already-published Internet repository of XML/XHTML documents so that the repository can be used automatically as an object database. Note that the author of a Repository Schema can be different from the original publisher of the documents, and that there can be competing/complementary schemas of the same repository (as opposed to the repository publisher creating an API with one view/entry point into a database).

The first part is to accept the URL as the unique identifier for the objects, and then to gather/discover all or any of the objects: by discerning a pattern in the URLs, by using XPath to discover links to non-patterned URLs, or by doing both. An example might be press releases or blog postings where the URL pattern is the year, month, and day, but XPath is needed to discover the actual postings that have long string names, e.g. http://www.blog.com/postings/2009/03/the_posting_explaining_repository_schemas. (This is more than URL templating to create the original URLs; this is reverse engineering and/or discovering the URLs. Note that using a web site's search facility or site map with XPath discovery of URLs is also possible.) A sketch of this part follows below.

The second part is to describe the potential parts or sub-objects in each document, using a description of the object, the XPath that discovers the sub-object, and the XML Schema of the object. For example, in a Wikipedia page, an object within that document might be the "content" of the page, which excludes all of the templating/navigation/etc.; that XPath would be //div[@id="bodyContent"]. A calendar event in a web page would use both XPath and XML Schema, based on whether the event was created using RDFa or a microformat. XPath could also allow objects with the same XML Schema to be differentiated, like separating a list of "friends" contact information from a list of "foes." (This part is also sketched below.)

The third part is a descriptive list of usable XSL transformations so that the documents can be transformed in real time to another usable format: to PDF, to an RDF version, to a stripped-down version for conversion into JSON for an application, etc.

The fourth part is to point to indexes of the repository. This is crucial for real-time processing. The index could be created by anyone who has previously grabbed all the documents and created an index. As an example, an XQuery database engine could attach/grab the index when running a query over the entire repository.

The goal of the Repository Schema is to allow real-time access and processing of essentially static XML/XHTML documents as if they were a database, with the Repository Schema allowing tools to be built in advance that can use an identical approach/widgets/tools for any repository.
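To make the first part concrete, here is a minimal XQuery sketch of URL discovery. The blog URL, the two-year range, and the assumption that archive pages link directly to postings are all hypothetical; only doc(), concat(), and the XPath step are standard:

    declare namespace xh = "http://www.w3.org/1999/xhtml";

    (: Enumerate the patterned archive URLs, then use XPath to
       discover the non-patterned posting URLs they link to. :)
    let $base := "http://www.blog.com/postings"
    for $year in (2008, 2009), $month in (1 to 12)
    let $mm := if ($month lt 10) then concat("0", string($month))
               else string($month)
    let $archive := concat($base, "/", string($year), "/", $mm)
    (: doc() fetches and parses each page over HTTP in real time;
       this assumes every archive page exists and is well-formed XHTML :)
    for $href in doc($archive)//xh:a/@href[starts-with(., $base)]
    return string($href)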
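The second part can be sketched the same way. This pulls the "content" sub-object out of a Wikipedia page using the XPath given above; the page name is illustrative, and it assumes the page is served as well-formed XHTML (otherwise a tidying step is needed before doc() can parse it):

    declare namespace xh = "http://www.w3.org/1999/xhtml";

    (: Extract one named sub-object -- the page "content" minus
       templating and navigation -- from the full document. :)
    let $page := doc("http://en.wikipedia.org/wiki/XQuery")
    return $page//xh:div[@id = "bodyContent"]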
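Finally, since the message describes the four parts but no concrete syntax for the schema document itself, here is a purely hypothetical sketch of what one might look like; every element name, attribute, and URL below is invented for illustration:

    <!-- hypothetical Repository Schema for the blog example above -->
    <repositorySchema repository="http://www.blog.com/postings">
      <!-- part one: URL discovery by pattern plus XPath -->
      <discovery>
        <urlPattern>http://www.blog.com/postings/{year}/{month}</urlPattern>
        <linkXPath>//a/@href</linkXPath>
      </discovery>
      <!-- part two: sub-objects located by XPath, typed by XML Schema -->
      <object name="content"
              xpath='//div[@id="bodyContent"]'
              schema="http://example.org/schemas/content.xsd"/>
      <!-- part three: usable XSL transformations -->
      <transform format="application/pdf"
                 href="http://example.org/xsl/to-fo.xsl"/>
      <transform format="application/rdf+xml"
                 href="http://example.org/xsl/to-rdf.xsl"/>
      <!-- part four: pointers to pre-built indexes -->
      <index href="http://example.org/indexes/postings.xml"/>
    </repositorySchema>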
The Repository Schema also frees publishers of data from needing to anticipate every use of their data, allows for a standardization of using XML documents as a database, and frees databases from needing to store/screen-scrape all the data internally before acting on it (which many XQuery engines can do already, but with a performance hit when there is no pre-built index). It may also push publishers of documents on the web to abide by standards, such as making XHTML well-formed and valid and encouraging the use of Cool URIs that are both human-readable and machine-processable.

Note that much of this is made easier by the XQuery doc() function's relatively undocumented ability to act on any well-formed, valid document on the World Wide Web in real time, creating the URL by concatenating strings (which is a real standard, as opposed to using curl or other methods); the first sketch above relies on exactly this.

Thanks tremendously to Chris Wallace for doing much of the ground-breaking work and some of the proof-of-concept pieces in the XQuery Wikibook.

Joe Carmel has been building a Rosetta Stone standard that would help "relate" repositories.

Daniel Bennett
daniel@citizencontact.com
http://www.advocatehope.org/
Received on Saturday, 16 May 2009 13:41:46 UTC