Re: Natural language marking in HTML from Martin J. Duerst on 1997-03-08 (www-international@w3.org from January to March 1997)

From: Martin J. Duerst <mduerst@ifi.unizh.ch>
Date: Sat, 8 Mar 1997 15:33:07 +0100 (MET)
To: "M.T. Carrasco Benitez" <carrasco@innet.lu>
cc: lee@sq.com, unicode@unicode.org, www-international@w3.org
Message-ID: <Pine.SUN.3.95q.970308152112.245Y-100000@enoshima>

On Sat, 8 Mar 1997, M.T. Carrasco Benitez wrote:

> > According to the definition in RFC 2070, the exact meaning of <HTML LANG=xxx>
> > is that everything not marked to be in any other language is xxx.
> > This can range from the whole document being in xxx to documents
> > that contain not a single word in xxx. The latter case does not
> > make much sense in practical terms, but is perfectly legal
> > according to RFC 2070.
> 
> Yes.  But does it make sense to give some more "semantics" to this
> syntax?

It looks like you are giving more semantics, but you are actually
changing semantics. Currently, if I write <HTML LANG=en>, this
does not mean that the document is monolingual, or that the
document is more than 50% English, or whatever. Changing semantics
is much more of a problem than adding semantics.
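
To make the current semantics concrete, here is a minimal sketch of
how I read the RFC 2070 inheritance rule: each piece of text takes the
LANG of its nearest enclosing element that has one. The names
LangResolver and effective_languages are made up for illustration, and
the sketch ignores elements without end tags as well as any language
information outside the markup, so take it as an assumption-laden
illustration and not as a reference implementation.

```python
from html.parser import HTMLParser


class LangResolver(HTMLParser):
    """Collect (language, text) pairs, inheriting LANG from ancestors."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, effective language) for each open element
        self.runs = []   # (effective language, text) for each text run

    def handle_starttag(self, tag, attrs):
        # An element without its own LANG inherits from its parent.
        inherited = self.stack[-1][1] if self.stack else None
        lang = dict(attrs).get("lang", inherited)
        self.stack.append((tag, lang))

    def handle_endtag(self, tag):
        # Close the nearest matching open element and anything inside it.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            lang = self.stack[-1][1] if self.stack else None
            self.runs.append((lang, text))


def effective_languages(html):
    """Return the (language, text) runs found in an HTML string."""
    parser = LangResolver()
    parser.feed(html)
    parser.close()
    return parser.runs


# A document marked LANG=en that contains no English text at all.
doc = ("<HTML LANG=en><BODY><P LANG=fr>Bonjour</P>"
       "<P LANG=de>Guten Tag</P></BODY></HTML>")
print(effective_languages(doc))
```

Under this reading, no text run in the example resolves to "en", which
is the legal but odd case mentioned in the quoted text above: LANG=en
on HTML only supplies a default, it makes no claim about how much of
the document is English.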

> > A general comment:
> > 
> > As we have seen in this discussion up to now, there are many
> > different needs for language information about documents.
> > 
> > Proposals for one specific interpretation of one already
> > well-defined way to indicate language in an HTML document,
> > to satisfy one specific information need that appeared at
> > one place are not a long-lasting approach to solving the
> > information needs we have.
> > 
> > I would suggest attacking the problem in a wider frame,
> > e.g. to look at Metadata (DC or other) and see how this
> > can be used to satisfy the various needs already expressed
> > and the many more that will appear in the future.
> 
> Does the approach in the present draft, "Natural language marking in
> HTML", make sense, or should we approach it from another angle?
> 
> I am aware that the proposal is very limited: just a clarification of the
> existing syntax and some additional "semantics" and even so one can see
> the hard work for consensus. I am concerned that a more "revolutionary"
> approach would not work.

The problem with consensus comes mainly from the fact that the
proposal is very limited, and that many people don't see much
of a benefit in it.

Basically, as far as I understand you, you want a mechanism
by which monolingual documents identify themselves as such
from within the document itself, in a single form that can
easily be accessed by servers and used by browsers. Some
technicalities of how that could be done in a uniform way
have been discussed. But it is not clear to me why exactly
monolingual documents would need some specific identification
(I can imagine there are applications where this could be
used), or why this should be so much more important than
other identification needs that we have to treat it with the
special attention it has received recently.

Also, in a more "revolutionary" approach, the advantage is
that you don't have to interfere with existing semantics,
and so it's easier to find a solution that is widely acceptable.

Regards,	Martin.

Received on Saturday, 8 March 1997 09:33:55 UTC