RE: References and Modularity from Larry Masinter on 2013-06-12 (www-tag@w3.org from June 2013)

From: Larry Masinter <masinter@adobe.com>
Date: Wed, 12 Jun 2013 14:31:09 -0700
To: Marcos Caceres <w3c@marcosc.com>
CC: Jonathan A Rees <rees@mumble.net>, Anne van Kesteren <annevk@annevk.nl>, "L. David Baron" <dbaron@dbaron.org>, Robin Berjon <robin@w3.org>, "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <C68CB012D9182D408CED7B884F441D4D3471E40CCF@nambxv01a.corp.adobe.com>
Sorry this is long. I'd rather have a conversation about this.

If A --normative reference -> B and A and B are both 'living standards' being
maintained by the same organization, then it's fine, an undated
reference to B is no problem, because presumably updates to B are
verified to not cause problems with the reference from A. 

So that's fine.

However, when the groups working on A and B aren't necessarily
coordinating, every update to B can potentially change the 
conformance requirements to implement A, but the watchers
of A may not notice changes to B. 

So I think it's a matter of trust and coordination.


> I agree - editors are actually last in my chain of cares. My interest is that
> implementers, reviewers, and users are always looking at the most stable/up-
> to-date references.

We're trying to optimize the needs of end users first, by paving the
way for those who deploy (install, service, support) the components of
the web (browsers, servers, sites, search engines, data gatherers) and 
who use tools, libraries, and other elements to create those.

As part of that, we're creating a documented agreement of what is
shared in common between all of the various communities of interest.
The documented agreement is of concern to all of the community,
with organizations like W3C and IETF and ECMA providing venues
for discussion and consensus-building through the whole community
by allowing representatives of the various constituencies to speak,
propose alternatives, and to propose and effect changes that are
important for the overall functioning.


Reviewers are like your QE department -- if they're going to test your software,
you need to make sure you don't introduce new bugs when you try to fix
old bugs, which requires a controlled "fork" where bug-fixes get made
in the stable fork and new features get added to the "up to date" spec.

I'm having trouble understanding why this is controversial.

But to stabilize a specification, you have to stabilize all of the references in it.

I have no problem with an "Editor's Draft" having references that update,
fine. It's an Editor's Draft. But when you get down to "Last Call", all of the referenced
material should have some clear identification of which edition is to be reviewed --
if only for making sure that the references are up to date enough.

> To this point, it's why I've been nagging the W3C to make the /TR/ information
> available in JSON and why I pushed for all specifications to be CORS enabled
> (they now are!) - so we can dynamically scrape the metadata of specifications
> to build up references. 

Automatically building references might be OK, but how do you
automatically add notices when normative references have changed
since the last time you looked? The automation that people are doing
now with undated references skips the step of verifying that the
changes are applicable.
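
As a rough sketch of the kind of notice I mean (the names and the
in-memory data below are hypothetical, and are not part of respec,
xml2rfc, or any W3C tool): record which edition of each reference was
actually reviewed, compare it with the newest known edition, and flag
every mismatch for a human to look at.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedReference:
    """A normative reference plus the edition that was actually reviewed."""
    label: str             # citation key, e.g. "ABNF"
    reviewed_version: str  # edition identifier checked against the citing spec


def changed_since_review(reviewed, latest_versions):
    """Return notices for references whose latest edition differs from the reviewed one.

    `latest_versions` maps a citation label to the newest known edition
    identifier. How that mapping is obtained (scraping, a shared database)
    is an assumption left out of this sketch.
    """
    notices = []
    for ref in reviewed:
        latest = latest_versions.get(ref.label)
        if latest is None:
            notices.append(f"[{ref.label}] latest edition unknown; check by hand")
        elif latest != ref.reviewed_version:
            notices.append(
                f"[{ref.label}] reviewed {ref.reviewed_version}, latest is {latest}; "
                "needs human review"
            )
    return notices


# Illustrative data only.
reviewed = [
    ReviewedReference("ABNF", "RFC 4234"),
    ReviewedReference("UNICODE", "6.2"),
]
latest_versions = {"ABNF": "RFC 5234", "UNICODE": "6.2"}

for notice in changed_since_review(reviewed, latest_versions):
    print(notice)
```

The sketch only reports that a reference moved; deciding whether the
citing spec has to change is still left to a person.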

> I also nagged the IETF about CORS-enabling their specs
> list (they have a big XML file or something), but I never heard back from them.

Nagging people to do something without a clear rationale is rarely
productive.


> So: what I am trying to do at the moment is remove human error from
> references - humans do a bad job at keeping information up to date, and this
> punishes reviewers and users. So the best way to deal with the human
> problems is to limit the amount of things the humans (including me) can get
> wrong.

Deciding whether changes are needed when the reference is updated
requires human intervention. Automating the process of inserting the
right metadata for a specific version is part of xml2rfc, and the database
should be shared with W3C and respec, but automatically generated
references should be to specific versions.

 
> > Secondly you are doing this optimization for the editor at the expense of losing
> > crucial information for the review process, and one of the essential tasks for which
> > standards groups have editors: to ensure the integrity of the references.


> In my model, the review process is continuous and forever ongoing - hence the
> references have to be tied to the latest and greatest (and the Editor/WG
> deeply invested in monitoring changes to all specs they reference).

The editor edits. The editor is not the sole designer or the sole reviewer
of any normative requirement. You review a spec for both whether the
thing specified works (can be implemented) and whether it is well described
(can be implemented from the spec).  

There's a standards process in various organizations; your model is incomplete.
The "latest" is not always the "greatest". Deciding that requires judgment
and consensus.


> Anything
> else is demonstrably fundamentally flawed: If one cites a WD version, and
> that WD version gets updated the day after the spec is published, the
> reference is useless. A reviewer/user can go and look at the now out of date
> version, and conclude it makes sense, but in reality the cited WD has changed
> and now the published spec would be wrong.

I think there are many flaws, including in the model you've proposed. I think
there are some process and automation changes that can address the problems,
and that universally switching to undated references breaks many processes --
as many problems as it fixes.


> > You gave as an example:
> >
> > > [ABNF]
> > > Augmented BNF for Syntax Specifications: ABNF, D. Crocker, P. Overell. IETF.
> > >
> > > Would be just:
> > >
> > > [ABNF]
> > > Augmented BNF for Syntax Specifications: ABNF. IETF.
> >
> >
> >
> > However, the reality is that [ABNF] could have actually changed in ways that
> > don't match.
> 
> Right, in a living standard model, you just fix the spec that is in error.
> >
> >
> > Look, for example, at the
> > discussion in the IETF JSON working group over updated references to [ABNF]
> >
> > thread "Update reference for ABNF"
> > http://www.ietf.org/mail-archive/web/json/current/msg00405.html

> > thread "Proposed change: update the Unicode version"
> 
> 
> That thread concludes with:
> "(Updating is good spec hygiene, and the RFC editor will make you update it
> anyway. Or the IESG before it.)"
> 
> Which underscores my point, no?

Not at all. The updates to ABNF include some changes with the expectation
that referring specifications would adapt.

> > http://www.ietf.org/mail-archive/web/json/current/msg00301.html

> 
> And this one:
> "There are other i18n issues that we might want to clarify, but updating the
> Unicode reference seems uncontroversial."

Yes, after evaluation. It's not controversial because someone actually looked.

> And even better followup from Tim Bray:
> "So how about `string is a sequence of zero or more Unicode characters
> defined in version 6.2 (or any subsequent version) of [UNICODE]`"
> 
> Yes, sir… "or any subsequent version" … i.e., the latest and greatest :)

Yes, that's what Tim proposed, but I think it is still subject to the
caveat that if the Unicode Consortium decided to change what
was called a "character", all of the specs that referenced it would
need to be changed when they updated the version.

> Note also the clever discarding of section numbers by Tim. Most impressive.


> > you may think these conversations are unnecessary, extra work and burden,
> > but they're part of creating one of the things expected of a "standard" --
> 
> I think they are great, because they seem to completely validate what I am
> saying.

If at time T I make a reference from A to B, I should say which version
of B I intended, as WELL as giving, if possible, my best estimate of where
the latest version of B is likely to be found. But unless I control
B or have confidence that B won't change in ways that mess up my spec,
giving an undated reference to B is just reckless.
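
To make that concrete, here is a minimal sketch of a reference entry
that carries both pieces of information. The field names, the rendered
layout, and the "latest" URL are all hypothetical; the RFC number and
date are those of the current ABNF specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    label: str       # citation key used in the citing spec
    title: str
    authors: str
    publisher: str
    version: str     # the edition the citing spec was written and reviewed against
    date: str
    latest_url: Optional[str] = None  # best guess at where newer editions will appear


def render(c):
    """Format a dated reference, with an optional pointer to newer editions."""
    text = f"[{c.label}] {c.title}, {c.authors}. {c.publisher}. {c.version}, {c.date}."
    if c.latest_url:
        text += f" Latest version (not reviewed against this spec): {c.latest_url}"
    return text


abnf = Citation(
    label="ABNF",
    title="Augmented BNF for Syntax Specifications: ABNF",
    authors="D. Crocker, P. Overell",
    publisher="IETF",
    version="RFC 5234",
    date="January 2008",
    latest_url="http://example.org/abnf/latest",  # placeholder, not a real location
)

print(render(abnf))
```

The pinned version says what was reviewed; the second link is only a
hint for readers, and makes no claim that newer editions are compatible.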


> > namely that it has been widely reviewed for consistency. And you can
> > only ensure wide review for consistency if there are no unreviewed
> > changes during a 'last call' period.
> 
> You can review for consistency anytime/anywhere. Specs change especially
> _after_ Last Call. CR is when specs are at their most volatile.

I think if there are substantial changes to a document during Last Call,
then Last Call needs to be repeated. 

>
> > Whenever A --normative reference -> B and B updates from B1 to B2,
> > you actually need to REVIEW whether the changes from B1 to B2
> > require changes to the language in A.
> 
> This is true. In my model, I assume the Editor will continue to track all specs he
> or she references. This is a fundamental part of maintaining a Living Standard.

The Editor isn't the only reviewer. I don't think any single person
can track all of the specs of the open web platform. 


> >
> > Your proposal to remove explicit dates and point to the 'latest version'
> > not only removes the opportunity to do this review, it eliminates the
> > important information of whether the updated reference has
> > been checked for consistency with your use of the previous referenced
> > spec.
> 
> Kinda, but it's assumed to be part of the Living Document process - and if found
> to be an issue, one fixes either spec to resolve the issues.
> > The other metadata (title, author/editors, organization publishing) are,
> > in addition to the date, important clues -- when you chase a
> > URL in A and get 404 Not Found -- as to how you might search for
> > the intended specification anywhere, e.g., if it moved.

> I bet you can find it without those bits (i.e., only with title and URL). I would like
> to see at least one example of where a reference in a W3C spec has gone 404
> and you can't find it again with the title and the URL through Google or the web
> archive. It seems like cargo cult behavior (copying from the dead tree academic
> journal model) for no demonstrable reason. I'm open to being proved wrong.

I think calling it "cargo cult" is inappropriate. With a cargo cult, the system
being followed never actually worked; it's blind following of process
based on misunderstanding.

The system of using dated references is embedded in our society even
in the migration to digital media. Academic references to supporting evidence
are required by all online resources.  Wikipedia requires author, date, title
http://en.wikipedia.org/wiki/Wikipedia:Citing_sources -- they're not
"cargo cult", are they?

I think I've tried to justify the "demonstrable reason" for having dated
references as a way of indicating clearly which version of the cited
document was reviewed widely -- for patents, implementability, performance,
security.  I think I've given two use cases, and can give more, of situations
where the update to the cited document required some changes to the
citing document before the update was correct.

> > Finally (as a minor point): specifications that don't change substantially
> > sometimes get reorganized for clarity, but the reorganization plays
> > havoc with references from other specs into the interior. So if you
> > say "Section 7.2 of [UNICODE]" but Unicode gets reorganized,
> > the new section might have a different section number.
> 
> Totally, that's another rule of mine: never ever cite section numbers - only
> concepts or complete specifications (or tell the editor of the other spec to add
> suitable id's to their document so you can hyperlink to them).

If you can get a clear agreement from the editor of a cited work
to keep the integrity of the citation, that might work, to allow
undated citations. But if there is no such agreement, or even if there
is but it's not documented in B that A depends on B and that updates
to B must coordinate with A -- you still have problems that are
easily avoided by being more extensive in how you list citations.

Larry
Received on Wednesday, 12 June 2013 21:36:30 UTC