Semantics is not semantics (was: HTML WG Glossary) from Preston L. Bannister on 2007-04-08 (public-html@w3.org from April 2007)

From: Preston L. Bannister <preston@bannister.us>
Date: Sun, 8 Apr 2007 12:11:51 -0700
To: "HTML WG Public List" <public-html@w3.org>
Cc: "Doug Jones" <doug_b_jones@mac.com>
Message-ID: <7e91ba7e0704081211hcb0b623idf9b3502eda5adbc@mail.gmail.com>

Reading what I have seen written on the "semantic web" has always left me
with the feeling that something did not quite make sense, but I could not
put my finger on the cause.  Between the chatter on this list and the
definition quoted below, the source of my unease finally became clear...

On 4/8/07, Doug Jones <doug_b_jones@mac.com> wrote:
>
> [snip]*semantics*: The branch of linguistics and logic concerned with
> meaning.[1]
>
>    - Elements, attributes, and attribute values in HTML are defined (by
>    this specification) to have certain meanings (semantics).[2]
>    http://www.whatwg.org/specs/web-apps/current-work/#semantics0
>
>
The problem is that the above use of the term "semantics" blurs together the
human and browser domains.  What you might call "semantics" to a web browser
is NOT the same as "semantics" to a human.

In human writing (linguistics) the levels of abstraction are roughly:

   - Lexemes <http://en.wikipedia.org/wiki/Lexeme> - sequences of letters
   that make up words.
   - Syntax <http://en.wikipedia.org/wiki/Syntax> - sequences of words
   that make phrases.
   - Semantics <http://en.wikipedia.org/wiki/Semantics> - the "meaning"
   derived from sequences of phrases.
   - Pragmatics <http://en.wikipedia.org/wiki/Pragmatics> - the effect of
   context on "meaning".

The web browser sees an HTML document in a somewhat similar fashion:

   - A lexical scanner converts sequences of characters into tokens.
   - A parser converts sequences of tokens into a DOM tree.
   - An interpreter converts a DOM tree into a visual representation and
   set of behaviors.
   - The interpretation is affected by environmental factors (display
   size, installed fonts, preferred language, etc.).
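
As a rough illustration of the four browser-side steps above, here is a
minimal Python sketch.  The function names (tokenize, parse, render), the
toy tag grammar, and the rule that upper-cases h1 text are all assumptions
made up for this example; a real browser is far more involved.

```python
import re
import textwrap

# Toy grammar (an assumption): bare <tag>, </tag>, and runs of text only.
TOKEN_RE = re.compile(r"<(/?)(\w+)>|([^<]+)")


def tokenize(source):
    """Lexical scanner: a sequence of characters becomes a list of tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        closing, name, text = match.groups()
        if name:
            tokens.append(("close" if closing else "open", name.lower()))
        elif text.strip():
            # Collapse runs of whitespace, as a browser would for plain text.
            tokens.append(("text", " ".join(text.split())))
    return tokens


def parse(tokens):
    """Parser: a sequence of tokens becomes a small DOM-like tree of dicts."""
    root = {"tag": "#document", "children": []}
    stack = [root]
    for kind, value in tokens:
        if kind == "open":
            node = {"tag": value, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "close":
            # Naive: closes the innermost open element without checking names.
            if len(stack) > 1:
                stack.pop()
        else:
            stack[-1]["children"].append({"tag": "#text", "text": value})
    return root


def render(node, width):
    """Interpreter: the tree becomes lines of text for a given display width."""
    if node["tag"] == "#text":
        return textwrap.wrap(node["text"], width)
    lines = []
    for child in node["children"]:
        lines.extend(render(child, width))
    if node["tag"] == "h1":
        # The only "meaning" this interpreter attaches to h1: show it loudly.
        lines = [line.upper() for line in lines]
    return lines


source = "<h1>Hello</h1><p>semantics is not semantics</p>"
tree = parse(tokenize(source))
wide = render(tree, 80)    # environment: a wide display
narrow = render(tree, 12)  # environment: a narrow display
print(wide)
print(narrow)
```

The same tree renders differently at the two widths, which is the
"environmental factors" step.  Nothing in the sketch knows what the words
mean; to it, h1 is just a rule for how to display text.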

What counts as semantics to the web browser is quite different from what
counts as semantics to a human.  Browsers are essentially ignorant of
human-level semantics.  (Putting machine-understandable human-level
semantics into the web is an interesting goal, but - given that no one has
yet shipped a HAL 9000 <http://en.wikipedia.org/wiki/HAL_9000> - very
hard.)

Perhaps some of the writers about the "semantic web" meant browser and not
human semantics, but I would bet this distinction is far from universally
understood.  Given that a web document is looked at by a web browser (the
means) and by a human (the end), using the term "semantics" in reference
to web documents, but not relative to the end goal, seems at least
dubious.  Better to use another term - perhaps "well structured"?

(There is something amusing about fuzzy semantics applied to the use of the
word "semantic".  Oh well.)

Received on Sunday, 8 April 2007 19:12:00 UTC