- From: Elias Torres <elias@torrez.us>
- Date: Tue, 05 Dec 2006 17:40:35 -0500
Ian Hickson wrote:
> On Tue, 5 Dec 2006, Elias Torres wrote:
>> [...]
>
> I'm having trouble understanding what you're doing.
>
> Could you provide some actual code examples? They can be fictional, I'm
> just trying to work out what you're doing.

See below.

>
> For example:
>
>> At the moment we have data defined by XML schemas that are used by
>> customers to describe industry-specific information such as automotive
>> parts.
>
> I don't understand why you would be sending this information in HTML
> documents.

We have a portal-based solution that shows inventory information. We need to annotate this information in the HTML so that other portlets can, for example, display the suppliers' information. We are trying to enable "enterprise web mashups" as opposed to just the usual Google Maps scenarios.

>
>>> RDFa gives you no more than HTML5's parsing algorithm does -- you
>>> still just end up with an arbitrary blob of data, the meaning of which
>>> you have to define.
>> I respectfully disagree. I'm not sure how familiar you are with RDFa but
>> it gives specific instructions on how to find/extract tagged data within
>> the page.
>
> Again, could you give some specific code samples to demonstrate this?
>
>> HTML pages are one possible representation of resources. These
>> resources have data models that exist beyond the html page, frequently
>> they exist as xml. When these resources are rendered as html we would
>> like to still be able to tie the visual representation back to the
>> underlying data model. This allows us, for example, to deduce that a
>> person, an event or a customer order is on the page.
>
> I don't understand why you can't just include the information like this:
>
>   <p class="ibm-order">
>     <span class="ibm-customer">
>       <span class="ibm-name">Ian Hickson</span>
>       (<span class="ibm-id">95237032895</span>)
>     </span>
>     has purchased a
>     <span class="ibm-part">
>       <span class="ibm-name">Widget x12</span>
>       (part ID <span class="ibm-id">295250X12</span>)
>     </span>
>   </p>
>   <p class="ibm-order ibm-deleted">
>     ...
>   </p>

With RDFa, the same information would look like this:

  <p class="ibm-order">
    <span property="ibm-customer">
      <span property="ex-name">Ian Hickson</span>
      (<span property="acme-id">95237032895</span>)
    </span>
    has purchased a
    <span property="ibm-part">
      <span property="ex-name">Widget x12</span>
      (part ID <span property="acme-id">295250X12</span>)
    </span>
  </p>
  <p class="ibm-order ibm-deleted">
    ...
  </p>

>
> You can then process this simply:
>
>   // find all the orders on the page:
>   var orders = document.getElementsByClassName(['ibm-order']);
>   // process them
>   for (var i = 0; i < orders.length; ++i) {
>     var order = orders[i];
>     // if it's deleted, ignore it
>     if (order.className.has('ibm-deleted'))
>       continue;
>     // get the customer ID
>     var userID = order.getElementsByClassName(['ibm-customer'])
>                       .getElementsByClassName(['ibm-id'])
>                       .textContent;
>     // get the part ID
>     var partID = order.getElementsByClassName(['ibm-part'])
>                       .getElementsByClassName(['ibm-id'])
>                       .textContent;
>     // add this user/part to the list:
>     addToList(userID, partID);
>   }
>
> What would this look like in your ideal world? Could you give some
> examples of what the above would be like, with code samples?

I have the "generic" extractor example in Python (there is also a JavaScript equivalent of that code):

http://svn.rdflib.net/trunk/rdflib/syntax/parsers/RDFaParser.py

I'm very familiar with the code required to parse this, and it's not hard at all. The problem is that the code is specific to that structure: every time we have a new structure, we have to write new code. That code is also very dependent on the tree structure.
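To illustrate the difference, here is a minimal sketch of a generic extractor in the same spirit. It is a simplification, not the RDFa processing rules and not the RDFaParser.py code linked above. So that it runs standalone, it assumes a hypothetical plain-object element model (`el(attrs, ...children)`, with strings as text nodes) instead of a browser DOM. It also uses made-up rules: `about` or `id` names the subject, a leaf `property` element contributes its text, and a `property` element containing further properties becomes a blank node (`_:b0`, `_:b1`, ...). The point is that one function handles both the order markup and an unrelated, hypothetical event snippet without any vocabulary-specific code.

```javascript
// Hypothetical element model so the sketch runs without a browser:
// an element is { attrs, children }; a plain string is a text node.
function el(attrs, ...children) {
  return { attrs, children };
}

// Concatenated text of a node and its descendants (like DOM textContent).
function textOf(node) {
  return typeof node === "string" ? node : node.children.map(textOf).join("");
}

// True if any descendant element carries a `property` attribute.
function hasNestedProperty(node) {
  return node.children.some(
    (c) => typeof c !== "string" && ("property" in c.attrs || hasNestedProperty(c))
  );
}

// Walk the tree and return [subject, property, value] triples using the
// simplified rules assumed for this sketch (NOT the RDFa specification):
//  - `about`, else `id`, names the subject for an element and its descendants;
//  - a `property` element with no nested properties yields its trimmed text;
//  - a `property` element with nested properties yields a fresh blank node
//    that those nested properties then describe.
function extract(root, pageSubject = "_:page") {
  const triples = [];
  let blankCount = 0;
  function walk(node, inherited) {
    if (typeof node === "string") return; // text is read via textOf() instead
    const subject = node.attrs.about || node.attrs.id || inherited;
    const prop = node.attrs.property;
    if (prop === undefined) {
      node.children.forEach((c) => walk(c, subject));
    } else if (hasNestedProperty(node)) {
      const blank = "_:b" + blankCount++;
      triples.push([subject, prop, blank]);
      node.children.forEach((c) => walk(c, blank));
    } else {
      triples.push([subject, prop, textOf(node).trim()]);
    }
  }
  walk(root, pageSubject);
  return triples;
}

// The RDFa-style order markup from the message.
const orderMarkup = el({ class: "ibm-order" },
  el({ property: "ibm-customer" },
    el({ property: "ex-name" }, "Ian Hickson"),
    " (", el({ property: "acme-id" }, "95237032895"), ")"),
  " has purchased a ",
  el({ property: "ibm-part" },
    el({ property: "ex-name" }, "Widget x12"),
    " (part ID ", el({ property: "acme-id" }, "295250X12"), ")"));

// A hypothetical event snippet: same extractor, no new code.
const meetingSnippet = el({ id: "meeting" },
  el({ property: "ex-title" }, "Design review"), " on ",
  el({ property: "ex-date" }, "2006-12-06"));

for (const t of [...extract(orderMarkup), ...extract(meetingSnippet)]) {
  console.log(t.join(" "));
}
// Prints:
// _:page ibm-customer _:b0
// _:b0 ex-name Ian Hickson
// _:b0 acme-id 95237032895
// _:page ibm-part _:b1
// _:b1 ex-name Widget x12
// _:b1 acme-id 295250X12
// meeting ex-title Design review
// meeting ex-date 2006-12-06
```

To support a new vocabulary, you only use new `property` names in the markup, and the extractor itself stays the same. This sketch treats property names as opaque strings. RDFa itself maps prefixed names to full URIs.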
In the markup below, for example, the data for a single order is split across two parts of the page:

  <p id="order1" class="ibm-order">
    <span property="ibm-customer">
      <span property="ex-name">Ian Hickson</span>
      (<span property="acme-id">95237032895</span>)
    </span>
  </p>
  ....
  <p>
    has purchased a
    <span about="order1" property="ibm-part">
      <span property="ex-name">Widget x12</span>
      (part ID <span property="acme-id">295250X12</span>)
    </span>
  </p>

In RDFa, I can specify properties in different parts of the document, such as different portlets or HTML "widgets" à la the Google homepage, that complete the data associated with that order.

-Elias
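Once the properties are extracted, reassembling the order does not depend on where each fragment appeared in the page. Here is a minimal sketch under stated assumptions. The triples are written out by hand, as a generic extractor might produce them from the markup above. The blank-node labels `_:b0`/`_:b1` are hypothetical, and `about="order1"` is taken to refer to the element with `id="order1"`. `indexBySubject` and `record` are illustrative helpers, not part of RDFa or rdflib.

```javascript
// Triples a generic extractor might produce from the split markup, written
// out by hand for this sketch. The blank-node labels are hypothetical, and
// about="order1" is assumed to name the same subject as id="order1".
const triples = [
  // From the first fragment (e.g. one portlet):
  ["order1", "ibm-customer", "_:b0"],
  ["_:b0", "ex-name", "Ian Hickson"],
  ["_:b0", "acme-id", "95237032895"],
  // From another part of the page (e.g. a different portlet):
  ["order1", "ibm-part", "_:b1"],
  ["_:b1", "ex-name", "Widget x12"],
  ["_:b1", "acme-id", "295250X12"],
];

// Index triples by subject: Map<subject, Array<[property, value]>>.
function indexBySubject(list) {
  const index = new Map();
  for (const [s, p, o] of list) {
    if (!index.has(s)) index.set(s, []);
    index.get(s).push([p, o]);
  }
  return index;
}

// Build a nested record for `subject`, inlining blank nodes ("_:" prefix).
// In this sketch a repeated property keeps only its last value, and `seen`
// stops cycles between blank nodes.
function record(subject, index, seen = new Set()) {
  seen.add(subject);
  const out = {};
  for (const [p, o] of index.get(subject) || []) {
    const inline = o.startsWith("_:") && index.has(o) && !seen.has(o);
    out[p] = inline ? record(o, index, seen) : o;
  }
  return out;
}

const order1 = record("order1", indexBySubject(triples));
console.log(JSON.stringify(order1));
// {"ibm-customer":{"ex-name":"Ian Hickson","acme-id":"95237032895"},"ibm-part":{"ex-name":"Widget x12","acme-id":"295250X12"}}
```

Grouping happens by subject rather than by position in the tree, so a portlet can contribute properties anywhere on the page. For instance, reversing the triple list still yields the same customer and part data.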
Received on Tuesday, 5 December 2006 14:40:35 UTC