Re: RDFa and Web Directions North 2009 from Kingsley Idehen on 2009-02-14 (public-rdf-in-xhtml-tf@w3.org from February 2009)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sat, 14 Feb 2009 08:35:46 -0500
To: Ian Hickson <ian@hixie.ch>
CC: Kjetil Kjernsmo <kjetil@kjernsmo.net>, Karl Dubost <karl@la-grange.net>, public-rdfa@w3.org, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, Sam Ruby <rubys@intertwingly.net>, Dan Brickley <danbri@danbri.org>, Michael Bolger <michael@michaelbolger.net>, Tim Berners-Lee <timbl@w3.org>, Dan Connolly <connolly@w3.org>
Message-ID: <4996C8B2.7010500@openlinksw.com>
Ian Hickson wrote:
> Just to clarify -- while I'm being devil's advocate below, that doesn't
> mean I personally have an opinion on this matter. I apply the same
> skeptical standards to everything that is proposed for HTML5. (If you
> think HTML5 is bloated, just imagine how big it would be if I didn't!)
>   
> Please don't take these questions as personal attacks. I honestly am 
> trying to find out how RDF and RDFa are to work in HTML5, to see if they 
> make sense to add.
>   
Nice and refreshing clarification, again :-)

In a sense, we are actually playing out via this debate the very thing 
we are hoping the Web will ultimately simplify: discourse discovery and 
participation.

More responses below.
>
> On Fri, 13 Feb 2009, Kjetil Kjernsmo wrote:
>   
>> On Friday 13 February 2009, Ian Hickson wrote:
>>     
>>> Note that you can already "ask questions" on the Web. For example, I 
>>> just searched for "which country napolean", which is neither the right 
>>> question nor correctly spelt (though that wasn't intentional), and 
>>> Google answered:
>>>       
>> Well, you just proved that google sucks, didn't you? It couldn't get the 
>> answer to that basic question right...
>>     
>
> Would a system based on RDF or RDFa give a better answer to the same 
> question? 
Covered in my earlier example which combines HTML in front of RDF data 
sources while delivering context driven "search" and "find".


A document never captures all of the data it contextualizes. All it can 
do is offer conduits to the raw data, which is what de-referencable 
entity URIs offer as Data Source Names. 
> How? Is there a system running somewhere that can demonstrate 
> this? Does it require all data to be marked up as RDFa?
>
>
>   
We must make a critical distinction between RDF and RDFa.
Resource Description Framework (RDF) is composite like all frameworks. 
Thus, we have a data model component (graph) and different mechanisms 
for creating records (statements or claims). RDFa is one of several 
mechanisms for creating RDF model records. Its greatest appeal comes from 
its nativeness to HTML (via use of attributes as mechanisms for 
expressing entity-attribute-value statements).
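
As a rough illustration of the entity-attribute-value idea, RDF records can 
be modelled in Python as plain three-part tuples. This is a sketch only: the 
example.org vocabulary names and the describe() helper are hypothetical, 
invented for illustration, and the France URI is merely illustrative. Only 
the DBpedia URI for Napoleon is taken from this thread.

```python
# Each RDF statement is a (subject, predicate, object) triple; a graph is a set of them.
# The Napoleon URI is quoted in this thread; the example.org names are hypothetical.
NAPOLEON = "http://dbpedia.org/resource/Napoleon_I_of_France"
FRANCE = "http://dbpedia.org/resource/France"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

graph = {
    (NAPOLEON, RDF_TYPE, "http://example.org/vocab#Person"),
    (NAPOLEON, "http://example.org/vocab#associatedWith", FRANCE),
}


def describe(graph, subject):
    """Return an attribute -> list-of-values mapping for one entity."""
    description = {}
    for s, p, o in graph:
        if s == subject:
            description.setdefault(p, []).append(o)
    return description
```

Calling describe(graph, NAPOLEON) gathers every statement made about that 
one entity, which is the kind of granular description a de-referencable URI 
is meant to hand back.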
>> Another example, I'd like to have the latest version of the SPARQL 
>> Update spec, and I expect to get it if I ask for "sparql update".
>>     
>
> How does RDF or RDFa solve this problem?
>
>
>   
>> Add some relations, and you have a decent chance, so that's what we 
>> did, so what you can do now is just search for what you know, and with 
>> two clicks, you'll find several things that will give you some 
>> associations, such as "inflammation of the lining of the lungs", or 
>> pleuritis, which gets you on the right track. Google requires you to 
>> offer these associations up front, it doesn't assist with anything 
>> beyond simple speling mistakes, whereas just a little bit of knowledge 
>> organisation will assist people in finding what they want.
>>     
>
> Do we have reason to believe that it is more likely that we will get 
> authors to widely and reliably include such relations than it is that we 
> will get high quality natural language processing? Why?
>   
High quality NLP simply doesn't compare to what RDFa offers.

NLP is not the issue at hand here. This isn't about linguistics. It is 
about structured data, more like a DBMS.
> How would an RDF/RDFa system deal with people gaming the system?
>   
Great question!

It would help identify the people gaming the system. See the recent 
foaf+ssl [1] endeavor for instance.
> How would an RDF/RDFa system deal with the problem of the _questions_ 
> being unstructured natural language?
>   
RDF has a query language: SPARQL.

RDFa helps with the surfacing of the records that are then queried using 
SPARQL.
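
To give a feel for what querying such records looks like, here is a toy 
stand-in for a SPARQL basic graph pattern, written in plain Python. It is 
not a SPARQL engine: the match() function, the "ex:" names and the sample 
data are all hypothetical, made up for illustration.

```python
# Toy stand-in for a SPARQL basic graph pattern; not a real SPARQL engine.
# Terms starting with "?" are variables; all data below is hypothetical.
def match(graph, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(graph, rest, candidate)


graph = {
    ("ex:Borodino", "rdf:type", "ex:MilitaryConflict"),
    ("ex:Borodino", "ex:commander", "ex:Napoleon"),
    ("ex:Waterloo", "rdf:type", "ex:MilitaryConflict"),
    ("ex:Napoleon", "rdf:type", "ex:Person"),
}

# Roughly: SELECT ?e WHERE { ?e rdf:type ex:MilitaryConflict . ?e ex:commander ex:Napoleon }
query = [
    ("?e", "rdf:type", "ex:MilitaryConflict"),
    ("?e", "ex:commander", "ex:Napoleon"),
]
results = sorted(b["?e"] for b in match(graph, query))
```

The point of the sketch is that the question is asked against typed 
entities and their relationships rather than against strings of text.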

Two examples of the power of SPARQL:

1. Linked Data (which uses SPARQL to facilitate de-referencable URIs 
within the context of HTTP's natural ability to decouple data access and 
data representation)
2. FOAF+SSL (which extends SSL's verification with a "Web of Trust" 
component that doesn't require physical key signing parties)
> How would an RDF/RDFa system deal with data provided by companies that 
> have no interest in providing the data in RDF or RDFa? (e.g. companies 
> providing data dumps in XML or JSON.)
>   
We transform the data and then expose it as RDFa, as in this example, 
which does expose RDFa:

http://linkeddata.uriburner.com/about/html/http://en.wikipedia.org/wiki/Napoleon_I_of_France
> How would an RDF/RDFa system deal with companies that do not want to 
> provide the data free of charge?
>
>   
The Web is about choice. If a company wants to charge so be it.

Running a company is about managing opportunity costs, ultimately.

> How would an RDF/RDFa system deal with companies that want to track 
> per-developer usage of their data?
>   
It depends. If you look at my "search" and "find" demo, you will notice we 
expose the cost of the work we are doing per task. This is sort of like 
Amazon EC2 and how it itemizes and charges for computing resources. We 
don't charge for the public service, but that's our choice; others may 
choose to do otherwise (with our technology or those offered by others 
in this realm).
>
> On Fri, 13 Feb 2009, Kjetil Kjernsmo wrote:
>   
>> On Friday 13 February 2009, Ian Hickson wrote:
>>     
>>> If Amazon couldn't even be bothered to add a class for "price" in the 
>>> last  decade, why do we believe they will add RDFa? 
>>>       
>> Because RDF(a) is actually powerful, class isn't. That's what I think 
>> anyway...
>>     
>
> The problem description was just to get a relationship between an item and 
> a price. Both a simple set of classes and RDFa completely solve this 
> problem. Being more powerful is irrelevant in the context of that problem.
>
>
>   
>>> How does RDFa solve the problem that they have that I described but 
>>> that you cut from the above quotes, namely that they want to track 
>>> usage on a per-developer basis?
>>>       
>> OK, it doesn't.
>>     
>
> If the problem is that we want price data out of Amazon pages, and RDFa 
> doesn't solve the problem to Amazon's satisfaction, then why is RDFa being 
> put forward as a solution?
>   
Amazon should not be a factor in this discussion. Ditto Google, or any 
other entity. We are talking about the Web.

> This is what I mean by evaluating solutions, by the way. I don't 
> personally care whether we use RDFa or something else. I _do_, however, 
> want to make sure that whatever solution we end up using is a solution 
> that actually solves the problems we set out to solve.
>
> Here, if the problem is "associate price with item on Amazon pages", RDFa 
> does not solve the problem.
>
>
>   
RDFa allows us to choose to associate price with an item in a structured 
way.

Again, the Web (like any well designed system) should be about choices.  
Without choices we don't have any kind of innovation continuum.

>>> How did you resolve issues such as different vocabularies having 
>>> different enumerations of music genres?
>>>       
>> Mostly, they are owl:sameAs. In some cases, they are owl:unionOf, and in 
>> some cases owl:intersectionOf. It solved every problem we had, but 
>> conceivably, you could have genres that we weren't aware existed. In 
>> which case we would have looked up in large ontologies managed by 
>> someone else. The archive/library/museum sector has a lot of useful 
>> stuff here.
>>
>>     
>>> Why was getting data out of those varied formats hard? For ID3, for 
>>> example, it's three lines of perl to get most of the common 
>>> information.
>>>       
>> Yeah, but this was written in Java, it takes at least 10 times as much 
>> code ;-) Besides, I didn't say it was hard, I said it was harder than 
>> e.g. aligning genres. Which says a lot about how easy this is to do! :-)
>>     
>
> What did you do with the genres once you had them all aligned with union, 
> intersection, and same-as relationships? That doesn't seem like the most 
> useful structure for data to be exposed to a random user.
>   
Again brevity is possible when the entities-attributes-values have URIs 
that are de-referencable.

Here is a document (with RDFa) about "Mosquitoes" from the GeoSpecies 
Linked Data space. A few email exchanges between the GeoSpecies kbase 
author and me led to this:
http://linkeddata.uriburner.com/about/html/http://species.geospecies.org/family_concept_uuid/1e0e9bfe-f1ee-4b14-9511-cb896e8ebf97/

The document above is itself a purveyor of structured data for anyone 
else on the Web to exploit.

I don't want to be the only one capable of doing this on the Web, I want 
anyone that uses the Web to be able to do this, and RDFa is a very low 
cost mechanism for achieving this goal.

I want to express myself clearly and succinctly without compromising 
clarity or brevity, when I publish documents on the Web. Likewise, I 
want to read documents from others who are able to do the very same 
thing: express themselves clearly and succinctly without compromising 
clarity or brevity.


Kingsley
>
> On Sat, 14 Feb 2009, Karl Dubost wrote:
>   
>> I love natural language processing too. It is useful, though it doesn't 
>> solve everything (except maybe in an English centric world.)
>>     
>
> Nobody said it would solve everything. My point was just that it solved 
> one specific problem as well as RDFa does.
>
>
> On Fri, 13 Feb 2009, Kingsley Idehen wrote:
>   
>> Ian Hickson wrote:
>>     
>>> On Fri, 13 Feb 2009, Kingsley Idehen wrote:
>>>   
>>>       
>>>> When writing HTML (by hand or indirectly via a program) I want to 
>>>> isolate and describe what the content is about in terms of people, 
>>>> places, and other real-world things. I want to isolate "Napoleon" 
>>>> from a paragraph or heading, and state that the aforementioned 
>>>> entity is of type "Person" and is associated with another 
>>>> entity "France".
>>>>         
>>> Why? (I do not ask this rhetorically.)
>>>       
>> As a page writer:
>>
>> I want to provide pointers to detailed descriptions of the things I 
>> mention in what I write.
>>     
>
> Isn't an <a href=""> suitable for this already?
>
>
>   
>> I want to be able to express myself succinctly with pointers to other 
>> places on the Web where descriptions of the people, places, subject 
>> matter can be obtained.
>>     
>
> Again, <a href=""> seems to have solved this problem well until now, why 
> does it no longer solve the problem?
>
>
>   
>> Note, I don't want to point them to another chunk of blurb, I want to 
>> point my readers to a page that has the sole function of describing the 
>> aforementioned entities via their attributes and relationships.
>>     
>
> Why?
>
>
>   
>> I want to point to a resource that exposes a granular description of 
>> Napoleon e.g., <http://dbpedia.org/resource/Napoleon_I_of_France> 
>> instead of one that is quite opaque e.g., 
>> <http://en.wikipedia.org/wiki/Napoleon_I_of_France> .
>>     
>
> Why?
>
>
>   
>> As a page reader:
>> I want to have access to the entities behind the blurb. Today I can see 
>> an opaque but nice looking Web page, I can also see the markup behind 
>> the page, but I cannot easily discern the description of entities 
>> mentioned in a Web Page.
>>     
>
> What good are these entities? What is my dad supposed to do with them?
>
>
>   
>>> How do you envisage these annotations being used?
>>>       
>> To spread knowledge via the Web. Basically, accelerate propagation and 
>> exploitation of collective knowledge.
>>     
>
> Wikipedia seems to be doing this pretty well; how is RDFa going to make 
> the thousands or millions of Wikipedia contributors faster?
>
>
>   
>> Let's say Google did provide the perfect answer (which it absolutely 
>> doesn't), what bearing does that actually have on my example. The fact 
>> that you can do something one way is stylistic and not the basis for 
>> invalidating alternatives.
>>     
>
> In general with HTML5 we have tried to reduce redundancy in solutions -- 
> once we have solved a problem once, we try not to solve it again unless 
> there are compelling reasons to do so. This is because anything we add to 
> the Web platform will -- even if mostly unsuccessful -- have to be 
> supported for decades to come, with a resulting high price.
>
>
>   
>> Back to your Google example, Google answered, but where is the 
>> disambiguation? How does it determine the context for the search? There 
>> are many dimensions to "Napoleon" and Google statistically guessed one 
>> based on link density of its subjectively assembled index and page rank 
>> algorithm. How do you as writer or reader efficiently navigate the many 
>> aspects/facets associated with the pattern: "Napoleon"?
>>     
>
> Can an RDF/RDFa system do better from a natural language query?
>
>
>   
>> The essence of the Web is openness, no presumptions, and allegiance to 
>> "open world assumptions".
>>
>> Here is what I can do instead of depending solely on Google. I call 
>> this "Search" + "Find", and the following sequence is a simple demo.
>>
>> Visit: http://lod.openlinksw.com/fct/facet.vsp
>>
>> The service above allows me use the attributes, relationships, and types 
>> of my search subject to find what I want. All of this happens because I 
>> am interacting with the data entities distilled from Web pages and other 
>> data sources. Remember, RDFa is about simplifying the distillation 
>> process above all else without the tedium of alternative approaches such 
>> as RDF/XML.
>>
>> Here are my steps to context driven search and find re. Napoleon:
>>
>> 1. Enter pattern: Napoleon and get what  I would get normally from Google
>> (without page rank algorithm applied)
>> 2. Click on "Type" in the Navigation section to filter my search result by
>> Entity Types
>> 3. Pick "dbpedia-owl:MilitaryConflict", "yago:BattlesOfTheNapoleonicWars",
>> "yago:BattlesInvolvingRussia"
>> 4. From the drop down (in navigator section) click on Map
>>
>> Steps 1-4 basically translate to:
>>
>> e1 has any property whose value contains "Napoleon".
>> e1 is a dbpedia-owl:MilitaryConflict .
>> e1 is a yago:BattlesOfTheNapoleonicWars .
>> e1 is a yago:BattlesInvolvingRussia .
>>
>> Where "e" is an entity (or S in SPO parlance).
>>
>> I hope I've made my use-case for knowledge dissemination as part of an effort
>> to objectively drive collective intelligence via the Web a little clearer?
>>     
>
> If the above represents the state of the art for RDF or RDFa, then we are 
> a _long_ way from RDF being ready to be exposed to regular users.
>
> People have a hard enough time (as you point out!) doing simple natural 
> language queries where all they have to do is express themselves in their 
> own native tongue.
>
> Asking them to understand "yago:BattlesOfTheNapoleonicWars" or 
> "dbpedia-owl:MilitaryConflict" isn't going to fly.
>
> I'm also rather confused as to how this is going to scale. I am already 
> completely overwhelmed by this interface. How will it work when there are 
> hundreds of millions of users all using their own ontologies and relations?
>
> Again, these are not rhetorical questions. I'm trying to work out how RDF 
> or RDFa solve real problems for typical end users.
>
>
> On Fri, 13 Feb 2009, Kingsley Idehen wrote:
>   
>> Does it take an attribute to make a new language? But let's say that it 
>> does, is that costly when it doesn't break anything?
>>     
>
> If you are asking "What is the cost of adding an attribute to HTML5?", 
> then the answer is actually quite high. Features need implementing, 
> testing, documenting; tutorial authors need to understand and explain it; 
> data mining tools would need to support the new feature; bugs would need 
> to be worked out; user interfaces would need to be designed and usability- 
> tested, iterated upon, deployed; problems with spam and malicious data 
> would need tracking down, solutions found, deployed; security bugs, if 
> any, would need fixing; problems with data reliability would need to be 
> addressed and fixed; etc.
>
> If features solve new problems in practical ways, then the cost is worth 
> it. If they don't, then it behooves us not to add them, as the total cost 
> to the community would not be trivial.
>
>
> On Fri, 13 Feb 2009, Kingsley Idehen wrote:
>   
>> Ian Hickson wrote:
>>     
>>> On Fri, 13 Feb 2009, Kjetil Kjernsmo wrote:
>>>
>>> If Amazon couldn't even be bothered to add a class for "price" in the 
>>> last decade, why do we believe they will add RDFa?
>>>       
>> Great question.
>>
>> Because RDFa isn't a decade old.
>>     
>
> On what basis do you believe that newer technologies get adopted more than 
> older ones?
>
>
>   
>> It's taken a decade to get to the point where making statements for the 
>> RDF model is simple and unobtrusive enough for it to happen within HTML. 
>> In a nutshell, what wasn't cost-effective to Amazon over the last 10 
>> years, no longer necessarily holds today.
>>     
>
> It's no less expensive to use RDFa than use class attribute values.
>
>
>   
>> But for argument's sake, let's assume Amazon doesn't use RDFa, so what? 
>> That's no different than your earlier analogy re. Google.
>>     
>
> If Amazon doesn't use RDFa, then the problem (how to expose prices of 
> Amazon products) isn't solved by RDFa, and thus the use case is not an 
> argument for RDFa.
>
>
>   
>> The Web is not about behemoths, it is about a democratic space with multi 
>> tiered dexterity along the following vectors:
>>
>> 1. Data
>> 2. Information
>> 3. Knowledge.
>>
>> The Web was bootstrapped via point 2 (as an information space), but it 
>> needs the trinity above to stay true to its essence.
>>     
>
> The Web is not about behemoths, no. But I think it would be more accurate 
> to depict it as meritocratic and evolutionary rather than democratic; I 
> think it would be more accurate to say that it is based on social 
> interactions, end-user needs, and porn, than data, information, or 
> knowledge; and I do not think that we have any kind of responsibility to 
> keep the Web true to its original essence if the Web community as a whole 
> moves in another direction.
>
>   


-- 


Regards,

Kingsley Idehen	      Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 
OpenLink Software     Web: http://www.openlinksw.com
Received on Saturday, 14 February 2009 13:36:37 UTC