Re: What are the problems with IDML?

Doug Donohoe (donohoe@emerge.com)
Fri, 16 Aug 1996 16:04:09 +0800


Message-ID: <32141D69.277B@emerge.com>
Date: Fri, 16 Aug 1996 16:04:09 +0800
From: Doug Donohoe <donohoe@emerge.com>
CC: www-html@w3.org, Alex Neth <aneth@emerge.com>, Bob Lord <lord@emerge.com>,
Subject: Re: What are the problems with IDML?

Hello,

By way of introduction, my name is Doug Donohoe, and I am a 
member of the team that built Identify (http://www.identify.com).

I'm happy to see that there is some lively discussion around
IDML in this mailing list.  I'd love to hear what other
people's thoughts are.

The purpose of this mail is to address some of Megazone's 
comments in an earlier www-html posting (see below).

Thanks,

-Doug

Megazone wrote:
> Subject: What are the problems with IDML? (fwd)
> To: www-html@w3.org
> Date: Thu, 15 Aug 1996 20:10:52 -0700 (PDT)
> From: MegaZone <megazone@livingston.com>
>
> 
> I forgot to comment:
> 

Before I begin, I'll just say this.  The question of whether or not
to use META or define new tags is a technical, religious argument.

The intention of IDML is to provide web page publishers and merchants
with a standard way to describe themselves, their web pages and
their products.  Some of the questions IDML helps answer are:

        1)  What language is this document written in?
        2)  Where is the publisher located physically?
        3)  What location is the information about?
        4)  What type of entity published this information?
        5)  What products are available for purchase here?
        6)  What do those products cost and what currency are they in?
        7)  What is this page about?

These questions are *business questions* that META does not
answer.  We think IDML does answer them and we hope it will
become a new and open standard.

With that said, we still believe that the format we have
chosen for IDML is better suited than META as a technical 
architecture for answering these questions.  You can find 
out why by reading on.

> 1.A single META tag can only describe one attribute-value
>         pair. To describe a product or page in as much detail as
>         IDML requires many META tags.
> 
> BFD.  So what, it doesn't matter, it is just a few more characters,
> then everyone else can parse it fine.  But *no* they had to be
> different.
> 

You're right, it's not a big deal.  Rewriting IDML using meta only
adds 20% to the total "space" required.  We have a sample page of the
same "information" represented using IDML and Meta at:

        http://www.identify.com/welcome/idml_v_meta.html

We didn't "have" to be different.  We chose to define new tags
because META doesn't really work that well to answer the
business questions listed above (for reasons explained below).
        
> 2.All META tags must appear in the HEAD section of the
>       document. The fact is only 4% of documents on the web use
>       HEAD tags and a mere 0.5% use META tags
> 
> What kind of snake oil salesman logic is this???  If only .5% use meta
> tags, how many people are going to use their proprietary, lesser known
> tags!  On top of that - "People don't use HEAD, therefore any tag that
> needs to go into HEAD is bad."

We are simply pointing out that the use of META is low.  That leads
to the question:  Why is it so low?  One possible answer is that META
is not serving the purpose it aspires to.  Another is that non-technical
people don't understand META and therefore don't use it.

Yes, IDML is lesser known right now -- but it's brand new.  And no,
IDML is *NOT PROPRIETARY* -- we have been getting feedback from the
user community (including this one) and have been incorporating it to
improve the overall idea.  We are in the process of assembling a
proposal for the standards organizations.

> 
> WHAT?  The <HEAD> tags are not required.
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
> <html>
> <title>spew</title>
> <LINK REV=MADE HREF="mailto:spider@livingston.com">
> <META NAME="fu" CONTENT="bar">
> <BODY>
> etc deleted
> 
> That is perfectly valid!

Not that this is pertinent to IDML as a whole, but the fact is
that according to the W3C standard, META tags may only appear
inside HEAD tags.  You can validate this for yourself at:

   http://www.sandia.gov/sci_compute/elements.html#META

Where it says that META is "allowed in content of <HEAD>" (and
nothing else).  Nevertheless, the above is not rejected
by browsers, so I'll concede this point.

> 
> Because you need several META tags to specifiy a product or
>        page, it can be hard to discern where one group ends and the
>        other begins. This also introduces maintenance problems: META
>        tags that belong together could easily be broken apart.
> 
> First - the deliberately formated the page to make it look worse.
> Second - is using comments to mark off blocks beyond them?

If you think we deliberately formatted the page to make it look worse,
then please present a format that looks "nice" (we'll update our
FAQ with any suggested changes).  In the comparison page I mentioned
above, we've formatted it as "nicely" as possible, adding comments, etc.

However, the "visual" formatting, as it appears to humans, is of no
consequence and is not our point.  Think about writing a robot to
crawl through each version (IDML/META) and think about what it takes
to spit out the publisher, page and products represented therein.  
What are the issues you'd encounter? If you've written a robot you'll
recognize that the META version is harder because you have to combine
information that belongs together from separate META tags.  In IDML,
you read one tag and you're done.  It is no harder to parse HTML for
an <IMG> tag than it is for an <ID-PRODUCT> tag.
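
To make the "read one tag and you're done" case concrete, here is a
minimal sketch of such a robot using Python's standard html.parser
module.  The attribute names (NAME, PRICE, CURRENCY) and the sample
page are assumptions made up for illustration; they are not taken
from the IDML specification.

```python
from html.parser import HTMLParser


class ProductTagReader(HTMLParser):
    """Collects one dict per <ID-PRODUCT> start tag.

    The attribute names used in the sample page below are hypothetical.
    """

    def __init__(self):
        super().__init__()
        self.products = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag == "id-product":
            # One tag carries the whole product, so no grouping step is needed.
            self.products.append(dict(attrs))


# Hypothetical page fragment, for illustration only.
page = """
<html><body>
<ID-PRODUCT NAME="Widget" PRICE="9.95" CURRENCY="USD">
<ID-PRODUCT NAME="Gadget" PRICE="19.50" CURRENCY="USD">
</body></html>
"""

reader = ProductTagReader()
reader.feed(page)
for product in reader.products:
    print(product)
```

Each product arrives as a complete record the moment its tag is seen;
the robot never has to remember state between tags.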

> 
> 2.META tags are poorly suited to specifying products. Some
>     Identify merchants have over 600,000 products; to catalog their
>     products using only META is ugly and impractical. We found
>     that a separate, dedicated tag just for product-tagging and
>     content-tagging provided greater flexibility and clarity.
> 
> I don't get this, I really don't.  I've looked at their tags and I
> don't see anything I can't do with META.

This is merely an elaboration of the previous point.  You say "I 
don't get this".  What don't you understand?  Perhaps we can clear 
up any questions you have.

All I ask is this:  Sit down and think what it would really take 
to represent 100 products in a manner that a robot could understand.
Our solution isn't the result of idle whim.  We spent a lot of time
thinking, researching and prototyping possible solutions.  IDML is
the result.

> 
> 3.The web is a big place. It's getting bigger all the time, 
>        doubling in
>        size every 2-3 months. In order to keep up-to-date with the
>        effort of cataloging content and products, the process of
>        gathering this information has to be automated. It is simply
>        easier to teach a robot to understand IDML tags than a group of
>        META tags.
> 
> BULLSHIT!!!  Anyone who has coded any kind of text parser knows 
> that once you can parse one META tage you can pretty much parse them
> all and generate the name-value pairs.  This is near to an 
> outright lie.

Agreed, you can generate name-value pairs.  So, an existing parser could
easily identify the n/v pairs in an IDML block.  However, the point I've
made before is that it is more difficult to piece separate tags into a
coherent whole (which is what you would have to do if you used multiple
META tags to represent a single object -- e.g., a product, publisher,
page).
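
As a rough sketch of that extra "piecing together" step, the Python
below groups META name-value pairs back into products.  META itself
defines no grouping, so the "product.<index>.<field>" naming
convention and the sample page are assumptions invented for this
example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaGrouper(HTMLParser):
    """Reassembles products from separate META tags.

    Assumes a hypothetical NAME convention: product.<index>.<field>
    """

    def __init__(self):
        super().__init__()
        self.groups = defaultdict(dict)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)
        parts = (fields.get("name") or "").split(".")
        # Ignore META tags that do not follow the assumed convention.
        if len(parts) == 3 and parts[0] == "product":
            index, field = parts[1], parts[2]
            self.groups[index][field] = fields.get("content")


# Hypothetical page fragment; note the tags for one product are interleaved
# with those of another, so the robot must keep state across tags.
page = """
<head>
<META NAME="product.1.name" CONTENT="Widget">
<META NAME="product.2.name" CONTENT="Gadget">
<META NAME="product.1.price" CONTENT="9.95">
<META NAME="product.2.price" CONTENT="19.50">
</head>
"""

grouper = MetaGrouper()
grouper.feed(page)
products = [grouper.groups[key] for key in sorted(grouper.groups)]
for product in products:
    print(product)
```

Extracting the pairs is easy, as you say.  The grouping is the part
that depends on a convention every publisher and every robot would
have to agree on.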

> 
> 4.The big reason: To succeed, IDML had to be simple and quickly
>      adopted -- just like HTML. We found that few publishers today
>      use HEAD properly, and hardly any use META.
> 
> See my points above:
> 1. HEAD is ***NOT*** required.
> 2. If they aren't using META it doesn't mean they *CAN'T*.  *I* 
>    don't use meta - but I don't WANT TO!  I don't want to use IDML
>    either.  And if I pick one it is going to be META because it is
>    universal.

No, HEAD is not required (as I stated earlier).  But it is required 
if you want to use META.  If people aren't using META you're right,
they are not restricted from doing so.  But why don't they?  Because
they get no value out of it.  IDML at least offers the
potential of returning some value to its users (assuming other search
engines utilize it -- which we encourage!).

We hope IDML becomes universal, as a non-proprietary standard.  
Maybe then you'll reconsider.

> 
> We're not the first people to propose a content-tagging system. We
>  believe that the others never caught on because they were too complex
>  for non-computer scientists to implement. META is a technical
>  language; IDML is a business language.
> 
> bullshit bullshit bullshit
> 

My first response to your argument is "Marcia, Marcia, Marcia".
Clearly, that would leave us at a stand-still.

My second response is to simply say this:  We have done extensive
research and have found that several researchers have made similar
proposals (some with new HTML tags).  As you'll no doubt recognize, 
none of these are in use today (because everyone is hanging on to
the "universal" META tag).  That's our point.  If you'd like research
references to these past attempts, drop us a line and we'll be
happy to help.


> -MZ
> --
> Livingston Enterprises - Chair, Department of Interstitial Affairs
> Phone: 800-458-9966 510-426-0770 FAX: 510-426-8951
> megazone@livingston.com
> For support requests: support@livingston.com
> <http://www.livingston.com/>
> Snail mail: 6920 Koll Center Parkway  #220, Pleasanton, CA 94566


-- 


J. Douglas Donohoe
-------------------------------------------------------------------
Emerge Consulting			   Chief Technology Officer
415.328.6700			              http://www.emerge.com
donohoe@emerge.com			    http://www.identify.com