Re: What are the problems with IDML?

T. Joseph W. Lazio (lazio@spacenet.tn.cornell.edu)
Tue, 20 Aug 1996 14:23:42 -0400


From: "T. Joseph W. Lazio" <lazio@spacenet.tn.cornell.edu>
Date: Tue, 20 Aug 1996 14:23:42 -0400
Message-Id: <199608201823.OAA10408@ism.tn.cornell.edu>
To: donohoe@emerge.com
CC: www-html@w3.org
In-reply-to: <32141D69.277B@emerge.com> (message from Doug Donohoe on Fri, 16 Aug 1996 16:04:09 +0800)
Subject: Re: What are the problems with IDML?

>>>>> "DD" == Doug Donohoe <donohoe@emerge.com> writes:

DD> I'm happy to see that there is some lively discussion around IDML
DD> in this mailing list.  I'd love to hear what other people's
DD> thoughts are.

DD> The purpose of this mail is to address some of Megazone's comments
DD> in an earlier www-html posting (see below).

 I'll echo Dan C.'s comments.  Thanks for defending this in public.

DD> The intention of IDML is to provide web page publishers and
DD> merchants with a standard way to describe themselves, their web
DD> pages and their products.  Some of the questions IDML helps answer
DD> are:

DD>         1) What language is this document written in?
 
 The old HTML 3 proposal allowed one to specify <BODY LANG="en.us">
for U.S. English.  If I wanted to stick to META, I'd use 
<META NAME="Language" CONTENT="en.us"> or something like that.

DD> 2) Where is the publisher located physically?  

<META NAME="location" 
 CONTENT="Cornell University, Ithaca, NY, 14853-6801, USA">

DD> 3) What location is the information about?  

 ?

DD> 4) What type of entity published this information?  

<META NAME="author" CONTENT="HoTMetaL v.666">
(and there's also the use of 
<LINK REL=MADE HREF="mailto:lazio@spacenet.tn.cornell.edu">)

DD> 5) What products are available for purchase here?  6) What do
DD> those products cost and what currency are they in? 

 This doesn't strike me as meta-information, that is, information
about the information in the HTML document.  Rather, that's something
that should go in an HTML document.

DD>  7) What is this page about?

<META NAME="description" CONTENT="My reflections on life.">

DD> These questions are *business questions* that META does not
DD> answer.  We think IDML does answer this and we hope it will become
DD> a new and open standard.

 As you've probably guessed from the examples, I disagree.  META
allows one to specify meta-information about the document, much of
which need not be "business questions."  I could be interested in
putting author, physical location, or summary information into my HTML
documents without ever wanting to make a dime from them.

 Of course, there's the real question of whether user agents know what
to do with META.


DD> With that said, we still believe that the format we have chosen
DD> for IDML is better suited than META as a technical architecture
DD> for answering these questions.  [...]

>> 1.A single META tag can only describe one attribute-value pair. To
>> describe a product or page in as much detail as IDML requires many
>> META tags.
>> 
>> BFD.  So what, it doesn't matter, it is just a few more characters,
>> then everyone else can parse it fine. [...]

DD> You're right, it's not a big deal.  Rewriting IDML using meta only
DD> adds 20% to the total "space" required.  We have a sample page of
DD> the same "information" represented using IDML and Meta at:
DD> http://www.identify.com/welcome/idml_v_meta.html
DD> We didn't "have" to be different.  We chose to define new tags
DD> because META doesn't really work that well to answer the business
DD> questions listed above (for reasons explained below).

 Except that in looking at the examples described in
<URL:http://www.identify.com/welcome/idml_v_meta.html> about half the
information looks like it should be in an HTML document.  For example,
why not

<HEAD>
<TITLE> URBANFUNK - MR. NO</TITLE>
<META NAME="DEPARTMENT" CONTENT="media+information/music+recordings">
<META NAME="PART-NUMBER" CONTENT="MPR 003">
<META NAME="KEYWORDS" CONTENT="Urbanfunk,Jazz, Funky,Miles">
</HEAD>
<BODY>
<P>
In this debut recording, the group, guided by the eclectic
trumpet player Franco Baggiani (a Miles Davis with true Tuscan style),
offers 6 pieces which give a pleasant taste of their imminent new CD.
<P>
15000 Lire
</BODY>


 More generally, if you can get a list of name/value (n/v) pairs,
     language=...
     summary=...
     author=...
     location=...
it shouldn't be *that* difficult to extract the ones you want and use
them.
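
 To make that concrete, here is a rough sketch in Python of pulling
the name/value pairs out of META tags and keeping only the ones of
interest.  It uses the standard library's html.parser; the names
MetaCollector, extract_meta, and SAMPLE are mine, invented for
illustration, and the sample markup is just the HEAD of the example
above.  It is one possible approach, not a claim about how any
existing robot works.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from META tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.pairs = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name is not None and content is not None:
            # Lowercase the NAME value so lookups are case-insensitive.
            self.pairs[name.lower()] = content


def extract_meta(html_text):
    """Return a dict of all META name/value pairs found in html_text."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.pairs


SAMPLE = """<HEAD>
<TITLE> URBANFUNK - MR. NO</TITLE>
<META NAME="DEPARTMENT" CONTENT="media+information/music+recordings">
<META NAME="PART-NUMBER" CONTENT="MPR 003">
<META NAME="KEYWORDS" CONTENT="Urbanfunk,Jazz, Funky,Miles">
</HEAD>"""

pairs = extract_meta(SAMPLE)
# Keep only the pairs this particular consumer cares about.
wanted = {key: pairs[key] for key in ("part-number", "keywords") if key in pairs}
print(wanted)
```

 Combining information from several META tags amounts to building
one dictionary and then reading the keys you need from it.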


[...]
DD> However, the "visual" formatting, as it appears to humans is of no
DD> consequence and is not our point.  Think about writing a robot to
DD> crawl through each version (IDML/META) and think about what it
DD> takes to spit out the publisher, page and products represented
DD> therein.  What are the issues you'd encounter? If you've written a
DD> robot you'll recognize that the META version is harder because you
DD> have to combine information that belongs together from separate
DD> META tags.  In IDML, you read one tag and you're done.  It is no
DD> harder to parse HTML for an <IMG> tag than it is for an
DD> <ID-PRODUCT> tag.

 It seems the more difficult problem is standardizing the NAME (and
HTTP-EQUIV) attribute values.  In other words, if I want a summary of
an HTML document do I look for 
<META NAME="description" CONTENT="...">
<META NAME="summary" CONTENT="...">
<META HTTP-EQUIV="summary" CONTENT="...">
...?
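
 Absent a standard, about the best a robot can do is try a list of
likely names in some order of preference.  A small Python sketch of
that fallback follows.  The names SUMMARY_KEYS, SummaryFinder, and
find_summary are hypothetical, and the preference order is my own
assumption; nothing defines "description" and "summary" as
equivalent, which is exactly the problem.

```python
from html.parser import HTMLParser

# Assumed candidate names, in order of preference.  No standard says
# these mean the same thing; the list is illustrative only.
SUMMARY_KEYS = ("description", "summary")


class SummaryFinder(HTMLParser):
    """Record CONTENT for every META tag, keyed by NAME or HTTP-EQUIV."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("name") or attributes.get("http-equiv")
        content = attributes.get("content")
        if key and content is not None:
            # Keep the first value seen for each key.
            self.found.setdefault(key.lower(), content)


def find_summary(html_text):
    """Return the first matching summary-like value, or None."""
    finder = SummaryFinder()
    finder.feed(html_text)
    finder.close()
    for key in SUMMARY_KEYS:
        if key in finder.found:
            return finder.found[key]
    return None


print(find_summary('<META NAME="description" CONTENT="My reflections on life.">'))
```

 Every consumer would have to carry its own version of that list,
and two consumers could easily disagree about it.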


>>DD>  2.META tags are poorly suited to specifying products. [...]
>> 
>> I don't get this, I really don't.  I've looked at their tags and I
>> don't see anything I can't do with META.
[...]
DD> All I ask is this: Sit down and think what it would really take to
DD> represent 100 products in a manner that a robot could understand.
DD> Our solution isn't the result of idle whim.  We spent a lot of
DD> time thinking, researching and prototyping possible solutions.
DD> IDML is the result.

 I echo MZ's comments.  What's wrong with the examples I've given above?
What's wrong with the following in an HTML document?
<CITE CLASS=PRODUCT.MUSIC.RECORDINGS>URBANFUNK - MR. NO</CITE>
<CITE CLASS=PRODUCT.MUSIC.RECORDINGS>DE POOKAN</CITE>
<CITE CLASS=PRODUCT.MUSIC.RECORDINGS>HOPO - Dietro la finestra</CITE>

 Unless you're trying to create a database in HTML.  In that case, why
reinvent the wheel?  Why not use a database to store the information
and a program to extract the desired information from the database and
produce an HTML document?
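
 As a sketch of what I mean, here is a small Python program that
keeps the products in a database (an in-memory SQLite table, purely
for illustration) and generates the HTML from it.  The table layout
and the function names build_db and render_products are my own
invention.  The only part number and price I know are the ones for
the first title, taken from the example above; the other two rows
leave those fields empty rather than guess.

```python
import sqlite3
from html import escape


def build_db():
    """Create a throwaway in-memory product table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (title TEXT NOT NULL, part_number TEXT, price TEXT)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [
            ("URBANFUNK - MR. NO", "MPR 003", "15000 Lire"),
            ("DE POOKAN", None, None),  # part number and price unknown
            ("HOPO - Dietro la finestra", None, None),  # likewise unknown
        ],
    )
    return conn


def render_products(conn):
    """Produce an HTML list of products from the database."""
    lines = ["<UL>"]
    query = "SELECT title, part_number, price FROM products ORDER BY rowid"
    for title, part_number, price in conn.execute(query):
        item = '<CITE CLASS="PRODUCT.MUSIC.RECORDINGS">%s</CITE>' % escape(title)
        # Append only the details that are actually stored.
        extras = [escape(value) for value in (part_number, price) if value]
        if extras:
            item += " (" + ", ".join(extras) + ")"
        lines.append("<LI>" + item)
    lines.append("</UL>")
    return "\n".join(lines)


page = render_products(build_db())
print(page)
```

 A robot that wants the structured data can then query the database
directly, and the HTML stays a presentation of it.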

 Let me ask a simple question:  What are you trying to do?

-- Joseph Lazio