Re: overstock.com adds GoodRelations in RDFa to 900,000 item pages

On Wed, Oct 6, 2010 at 1:49 PM, Martin Hepp <martin.hepp@ebusiness-unibw.org> wrote:

>
>
> It is too expensive to expect data owners to lift their existing data to
> academic expectations. You must empower them to preserve as much data
> semantics and data structure as they can provide ad hoc. Lifting and
> augmenting the data can be added later.


     Don't get the idea that "academic expectations" are better than
"commercial expectations";  they're just different.

     The whole point of Ontology2 is to commercialize information extraction
with a philosophy very much like what these folks are doing:

http://rtw.ml.cmu.edu/papers/carlson-aaai10.pdf

      Now in some ways they've got something way more advanced than what
I've got;  however,  they say that their ontology is populated "with 242,453
new facts with estimated precision of 74%."

      For me,  I can't get away with an estimated precision of 74%,  I'd
look like a total fool publishing data that dirty on the web,  unless I can
find some way to conceal the dirt.  Talking with people who are interested
in semantic technology for e-commerce,  I find a common desire is to not
only reduce the cost of human labor but to also build systems that attain
superhuman accuracy in describing and categorizing products (at least better
accuracy than the people who are doing this job today.)

      [Note also that the rate of fact extraction these guys are doing isn't
so hot either... You can get 10^7-10^8 facts out of dbpedia+freebase
covering a similar domain]
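
      To put those two numbers in perspective,  here is a back-of-the-envelope
sketch.  The variable names are mine and purely illustrative;  the figures are
the ones quoted above,  and the dbpedia+freebase range is only the rough
estimate given in the note.

```python
# Back-of-the-envelope figures; all inputs are the numbers quoted above.
extracted_facts = 242_453      # facts reported in the paper
estimated_precision = 0.74     # estimated precision reported in the paper

# Facts expected to be wrong at that precision.
expected_errors = round(extracted_facts * (1 - estimated_precision))

# Rough range of facts obtainable from dbpedia+freebase, per the note above.
kb_low, kb_high = 10**7, 10**8

print(f"expected wrong facts: ~{expected_errors:,}")
print(
    f"dbpedia+freebase: roughly {kb_low / extracted_facts:.0f}x to "
    f"{kb_high / extracted_facts:.0f}x as many facts"
)
```

      In other words,  roughly one fact in four would be wrong,  which is the
"dirt" I'd have to conceal before publishing.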

      From a commercial viewpoint,  imperfect data is an opportunity.  If I
didn't have other projects ahead of it in the queue,  I'd seriously be
thinking about building a shopping aggregator that cleans up GoodRelations
and other data,  reconciles product identities,  categorizes products,
creates good product descriptions,  and makes something that improves on
current affiliate marketing and comparison shopping systems.

      Note that the beauty of an ontology is in the eyes of a user.  One
user might want to have a broad but vague ontology of "products" and be
happy to say that a digital camera is a :DigitalCamera.  Other people might
want to just cover the photography domain,  but do it in great detail --
describing not only the differences between digital cameras manufactured
today but also lenses,  and even covering,  in great detail,  vintage cameras
that you might find on eBay.

      You can't say that one of these ontologies is better than the other.
The best thing is to have all of these ontologies available [populated with
data!] and to pick and choose the ones that fit your needs.

Received on Thursday, 7 October 2010 15:14:50 UTC