RE: Versioning and HTML -- version indicators

Marc,

I think I covered my opinions on these questions in my blog posting at 
[1], which I've quoted often and don't want to rehash.  To respond 
briefly to your particular concerns:

> I believe the basic notion should not so much be what version 
> the author had in mind, but what the author expects or requires
> from the reader.

I don't understand.  Any reader will interpret a document per some version 
of the specification (we hope).  Only if the language may evolve 
incompatibly might a version indicator be needed, to identify one (or, if 
you like, more) versions of the specification that lead to a correct 
interpretation, i.e. what the author intended.  As Jonathan said, in other 
cases an explicit version indicator is at best redundant, or merely gives 
extra information about which specifications the author happened to have 
in hand.

> This might not be a single version indicator, but a collection 
> of required or expected capabilities, or nothing much at all.

As discussed in the blog, this leads to complexities.  Let's say 5 
versions of the spec are out, and the author knows the document to be 
compatible with the latest 3.  Does she list all 3?  Does that mean that, 
to write a simple document, she had to read all 5 versions of the spec and 
decide which ones covered the features used?  If a 6th version comes out, 
does she have to go back and update?

If you answer "yes", it's hugely inconvenient for the author;  if you 
answer "no", then the same problem transfers to the reader.  Let's say the 
reader is coded to version 6, and the explicit indicator is 4.  How does 
this help?  In fact, an omniscient observer would know that the document 
can safely be read, but nothing in the document tells you that.  The 
reader has to know all 6 specs, and know which versions he can handle.
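
To make the reader's burden concrete, here is a minimal Python sketch of 
what a version 6 reader would have to carry around when all it gets from 
the document is a single declared version.  The table contents and the 
names READER_VERSION, CAN_INTERPRET and can_safely_read are assumptions 
made up for illustration; they are not taken from any specification.

```python
# Hypothetical table a reader coded to version 6 would need: for every
# version a document might declare, can this reader interpret it correctly?
# Filling it in requires knowing all 6 specs. Entries are invented.
READER_VERSION = 6
CAN_INTERPRET = {1: False, 2: False, 3: False, 4: True, 5: True, 6: True}


def can_safely_read(declared_version: int) -> bool:
    """Return True only if the reader knows it handles the declared version."""
    # Versions missing from the table (e.g. future ones) are rejected
    # rather than guessed at, so misinterpretation is never silent.
    return CAN_INTERPRET.get(declared_version, False)
```

Note that the indicator "4" alone decides nothing: the answer comes 
entirely from the table the reader had to build from the specifications.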

> More explicit: in the case of publication formats, understanding the
> MIME type and having something at all which can try to process it
> may sometimes do.

I think you're talking about some sort of relaxed mode in which a certain 
amount of undetected misinterpretation of the document is OK.  I am not 
talking about that case.  I'm assuming that we want the reader to make a 
correct (though perhaps knowingly incomplete) inference from the documents 
he receives, or else to reliably detect that the document cannot be safely 
interpreted.

Noah

[1] http://www.w3.org/QA/2007/12/version_identifiers_reconsider.html

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

"Marc de Graauw" <marc@marcdegraauw.com>
05/16/2009 05:46 AM
 
        To:     <noah_mendelsohn@us.ibm.com>, "'Larry Masinter'" <masinter@adobe.com>
        cc:     <www-tag@w3.org>
        Subject:        RE: Versioning and HTML -- version indicators


Noah,

| In particular, I still tend to believe that:
| 
| "If the same document means different things in different versions of a
| language, then it's very important to indicate which version the author
| had in mind when creating the document. Putting that version indicator
| into the document itself is one good way to do it."

I believe the basic notion should not so much be what version the author
had in mind, but what the author expects or requires from the reader. This
might not be a single version indicator, but a collection of required or
expected capabilities, or nothing much at all.

More explicit: in the case of publication formats, understanding the MIME
type and having something at all which can try to process it may sometimes
do. If I publish something on the web, for reading sometimes a 'best
effort' will do: render what you understand, ignore the rest. (I'm no
expert on HTML, and this is not meant as a position on what versioning
information should be included in HTML.)

For healthcare, or other messaging systems, this is not the case: I expect
you, as the receiver, to understand an EHR XML message about me and the
medication code system used. If you do not understand either, you should
not try to process or read the message: you might be messing with my life.


As the medication code example shows, multiple indicators may be needed:
for the base version of the language, and maybe for code systems,
extensions, localizations etc. For a v2 things may be different. An EHR v2
might just contain allergy information as an optional add-on. If this is
the case, then for messages without the optional allergy-related parts,
understanding the base version + medication codes will still do. (I wrote
an article with an extended example a while ago [1].)
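
A rough Python sketch of what I mean by multiple markers, some required
and some optional. The class, the function and the capability labels
("ehr-base-v1", "medication-codes", "allergy-info") are hypothetical names
invented for this illustration, not part of any EHR standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Capabilities the receiver must understand to process the message.
    required: frozenset
    # Capabilities the receiver may ignore if it does not understand them.
    optional: frozenset = frozenset()


def may_process(message: Message, understood: set) -> bool:
    """Allow processing only if every required capability is understood."""
    return message.required <= understood


# A receiver built for the base language plus one medication code system.
v1_receiver = {"ehr-base-v1", "medication-codes"}

base = frozenset({"ehr-base-v1", "medication-codes"})
plain_message = Message(required=base)
with_allergy = Message(required=base, optional=frozenset({"allergy-info"}))
other_codes = Message(required=frozenset({"ehr-base-v1", "other-codes"}))
```

Under these assumptions the v1 receiver may process both plain_message
and with_allergy (it simply ignores the optional part), but must refuse
other_codes, because it does not understand that code system.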

You've explained this quite clearly in your blog item [2] with the example
of optional pictures in v2. But in the discussion there you only ask
whether v2 docs without pictures should be marked v1 or v2: you don't
consider a 'v1 + pics' versus a 'v1' option: two markers, one optional,
which reflects the language evolution.

In short, I believe having a single version indicator does not cover more
complicated cases. I also don't believe the author's intentions matter
much; what matters is what the author expects of the reader. This is very
much context-dependent: in publications for the world this might not be
much aside from some basic reading capability; in other contexts it may be
very constrained. This is also reflected in the effort the author is
willing to make to provide detailed versioning information: as you write
in the blog, "I don't want to have to go through the specifications for
every version of the recipe language that's ever existed just to find the
oldest that works". True in most cases, not in some. The options the
author has are basically:
- provide no specific versioning information at all, other than MIME type,
  and leave the rest up to the reader
- provide the version of the language spec used to write the doc (the
  laziest option after the previous)
- provide more detailed information on the specific capabilities required
  or expected of the receiver.

There's no generic 'best option' here. You proposed in the blog: "If a
language or data format will change in incompatible ways, then indicate
the language version used for each instance." I believe that's too strong
to apply in all cases. Quite often for the author it's enough if the
receiver's software will fail after an incompatible change. In source
code, having a version indicator is uncommon and it does not matter much:
the compiler will report incompatible code. In other cases, it's not
strong enough, since more than a single 'language version' may need to be
communicated. Note that the distinction between a language, sublanguages,
incorporated languages, extensions, localizations and such is blurred -
some definition of what constitutes a language may be needed. On the other
hand, if one uses multiple versioning markers, and the burden placed on
the receiver is clear, it may not matter much whether those multiple
markers pertain to one or several languages.

| I would actually propose that, with respect to explicit version
| indicators in particular, we take the points in the blog entry as a
| starting point, and either publicize them, elaborate them, or where
| necessary correct them.  To me, they look right as far as they go.

I don't know whether this has any relevance to HTML at all, but since this
discussion is also in the context of the TAG Versioning Finding, I hope
you view this as an effort to elaborate them. I think specifically the
case of multiple markers with versioning-related information should be
covered.

Regards,

Marc de Graauw

http://www.marcdegraauw.com

[1] http://www.xml.com/pub/a/2007/04/11/a-smoother-change-to-version-20.html
[2] http://www.w3.org/QA/2007/12/version_identifiers_reconsider.html

Received on Tuesday, 26 May 2009 14:36:29 UTC