Re: Natural language marking in HTML

From: M.T. Carrasco Benitez <carrasco@innet.lu>
Date: Sun, 9 Mar 1997 11:48:12 +0100 (MET)
To: "Martin J. Duerst" <mduerst@ifi.unizh.ch>
cc: lee@sq.com, unicode@unicode.org, www-international@w3.org
Message-ID: <Pine.LNX.3.95.970308183000.28949B-100000@localhost>
> It looks like you are giving more semantics, but you are actually
> changing semantics. Currently, if I write <HTML LANG=en>, this
> does not mean that the document is monolingual, or that the
> document is more than 50% English, or whatever. Changing semantics
> is much more of a problem than adding semantics.

Yes, I am trying to specify the semantics: a doc marked <HTML LANG=en>
should be mostly in English.  This kind of specification is needed:
most documents are and will be monolingual.  Many types of
horizontal applications depend on this.

This gives a lot of freedom in marking documents:

 - Monolingual doc                        : <HTML LANG=xx>
 - Multilingual doc                       : <HTML>
 - Multilingual doc with a basic language : <HTML> <BODY LANG=xx>
 - Do not want to do anything doc         : <HTML> <BODY>
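
As a rough illustration of how an application might read these markings,
here is a minimal sketch in Python using the standard html.parser module.
The names LangScanner and classify are hypothetical, chosen only for this
example, and the sketch looks only at LANG on the <HTML> and <BODY> tags.
Note that the plain multilingual case and the "do not want to do anything"
case carry no LANG at all, so a tool cannot tell them apart.

```python
from html.parser import HTMLParser


class LangScanner(HTMLParser):
    """Record the LANG attribute (if any) on the <HTML> and <BODY> tags."""

    def __init__(self):
        super().__init__()
        self.html_lang = None
        self.body_lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        lang = dict(attrs).get("lang")
        if tag == "html":
            self.html_lang = lang
        elif tag == "body":
            self.body_lang = lang


def classify(document):
    """Map a document to one of the cases in the list above (hypothetical helper)."""
    scanner = LangScanner()
    scanner.feed(document)
    scanner.close()
    if scanner.html_lang:
        return "monolingual ({})".format(scanner.html_lang)
    if scanner.body_lang:
        return "multilingual, basic language {}".format(scanner.body_lang)
    # <HTML> alone and <HTML> <BODY> are indistinguishable: no LANG on either.
    return "multilingual or unmarked"
```

For example, classify("<HTML LANG=en><BODY>text</BODY></HTML>") falls into
the monolingual case, while a document with LANG only on <BODY> is reported
as multilingual with a basic language.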

At present there is no "legacy data" regarding language marking, so it is
a good opportunity to specify certain guidelines.

> > > A general comment:
> > > 
> > > As we have seen in this discussion up to now, there are many
> > > different needs for language information about documents.
> > > 
> > > Proposals for one specific interpretation of one already
> > > well-defined way to indicate language in a HTML document,
> > > to satisfy one specific information need that appeared at
> > > one place are not a long-lasting approach to solving the
> > > information needs we have.
> > > 
> > > I would suggest to attack the problem in a wider frame,
> > > e.g. to look at Metadata (DC or other) and see how this
> > > can be used to satisfy the various needs already expressed
> > > and the many more that will appear in the future.
> > 
> > Does the approach in the present draft, "Natural language marking
> > in HTML", make sense, or should we approach it from another angle?
> > 
> > I am aware that the proposal is very limited: just a clarification of the
> > existing syntax and some additional "semantics", and even so one can see
> > the hard work for consensus. I am concerned that a more "revolutionary"
> > approach would not work.
> The problem with consensus comes mainly from the fact that the
> proposal is very limited, and that many people don't see much
> of a benefit in it.

The benefits are very real.

> Basically, as far as I understand you, you want some mechanism
> so that documents self-containedly identify themselves as
> monolingual documents (if they are monolingual), in a single
> form and in a form that can easily be accessed by servers
> and used by browsers. Some technicalities of how that could
> be done in an uniform way have been discussed. But it is not
> exactly clear to me why exactly monolingual documents would
> need some specific identification (I can imagine there are
> applications where this could be used), or why this should
> be so much more important than other identification needs
> that we have to treat it with the special attention it has
> received recently.

The recommendation is for both monolingual and multilingual docs.  The
identification of monolingual docs is a basic horizontal need.

I do not compare it with other identification needs.  If this can be
incorporated into a wider scheme, it is fine with me.

> Also, in a more "revolutionary" approach, the advantage is
> that you don't have to interfere with existing semantics,
> and so it's easier to find a solution that is widely acceptable.

I had the feeling that I was not interfering with existing semantics, but
rather concretizing the semantics.  There is no nuisance, as there are few
docs with language markings.

Received on Sunday, 9 March 1997 06:00:34 UTC
