A Proposal for Alignment with HTACG

Crossposted to
 [1]: https://lists.w3.org/Archives/Public/public-htacg/
 [2]: https://sourceforge.net/p/tidy/mailman/tidy-develop
 [3]: https://lists.w3.org/Archives/Public/html-tidy/

This message is addressed to HTML Tidy users, developers, maintainers, and
other interested parties in an effort to spur discussion regarding the
present and future of HTML Tidy, including a proposal for the continued
maintenance and development of HTML Tidy.

Simply put, my proposal is that responsibility for the current SourceForge
repository be turned over to HTACG.

That simple statement necessarily involves a large amount of discussion.
There is a lot of text here; some of it will surely please each of you, and
some of it will certainly infuriate some of you. I hope that the "big
picture" of what I'm presenting will encourage you to support the HTACG
project and the opportunities it offers.

(I apologize for the Markdown-like format, but it's very legible and
minimizes the risk of reference mistakes.)


## What is HTACG?

On 2015-January-15 I created the HTML Tidy Advocacy Community Group
([HTACG][4]), a [W3C Community Group][5], of which I am currently serving as
Chair. It "is dedicated to the continued support, development, and evolution
of the HTML Tidy command line application and library."

More specifically, it "aims to become the canonical release group for HTML
Tidy, which has been without a stable, public release since 2008. The
Community aspires to achieve the agreement and support of the original and
current developers to this end."

Certainly the above goals cannot be achieved without the cooperation of the
subscribers to this list.

(The above quotes are from our [official description][5]. Although the
current SourceForge repository is regarded as stable by the developers, the
statement is _intended_ to indicate that there have been no _newer_ releases
or bug fixes.)

Although HTACG is affiliated with the W3C, it's important to note that W3C
does not direct HTACG. The community group belongs to the community.

For additional information please see our [HTACG Project Charter][6].


## Meaning of "turned over to HTACG"

The simple proposal "responsibility for the current SourceForge repository
be turned over to HTACG" means that the current maintainers grant access to
the repository to individuals as specified by HTACG. Certainly the current
maintainers are encouraged to affiliate with [HTACG][5] and take part in
this decision process.

The public result is that HTML Tidy becomes a community-driven,
community-led project. It's even possible that the current maintainers will
come to dominate HTACG; should this happen, then at least:

 - it's a community decision
 - it happens under the auspices of a public-facing organization rather than
   individuals.

Although the decision process for granting access has yet to be
[formally defined][6], it's a high priority for HTACG. In general, HTACG
members will reach consensus based on public discussion. This discussion
should consider past and present contributions to HTACG and the HTML Tidy
project. Strong regard should be given to the input of the current Chair or
Chairs.


## HTACG Leadership and Succession

As mentioned above, I am the current Chair. This was done for the sake of
expediency in kicking off HTACG. I do not imagine myself to be the "owner"
of HTACG, and the position of Chair is always available to other HTACG
members via the [Community Group Page][5].

The community should expect and desire turnover in the position of Chair. As
such, another work in progress is a formal [succession document][6], which
will make provisions for turning over access to repository
membership/ownership, domain names, and other assets of HTACG.

A stable organization should be able to tolerate 100% turnover while
remaining functional.


## Current State of Tidy

HTACG was formed specifically to fill the need for an interested steward for
HTML Tidy. There have been no bug fixes or improvements to the SourceForge
repository in several years and issues go unresolved. Popular operating
systems ship with a `tidy` that cannot work with HTML5, and popular software
repositories ship with less-than-capable versions of `tidy`, too.

Additionally, a prominent fork of HTML Tidy hosted by W3C, featuring support
for HTML5, had grown stagnant too, with no commits and no issues addressed
for some years.

In many corners of the Internet there are claims that "Tidy is dead," or
"Tidy is outdated," or "Tidy isn't maintained." These are fair assessments
and HTACG hopes to change both the facts and the perception.

HTACG has successfully [taken responsibility][7] for this aforementioned
prominent W3C fork. Due to a _perceived_ endorsement from
[Dave Raggett][8], HTACG had understood that this fork was the approved,
natural successor of the SourceForge project, and has taken steps with this
thought in mind.

Due to incomplete knowledge of some details of HTML Tidy's history, we were
unaware of a fracture between the W3C fork and the current SourceForge home.
I sincerely hope that our actions are seen as a sign of motivation and
enthusiasm towards HTML Tidy rather than any attempt to usurp the current
project. Indeed, the future depends on the current project.


## Why not fork?

Open source encourages forking, and there are successful forks of many
popular pieces of software. MariaDB (né MySQL) is a good example of this.
Both MariaDB and MySQL have large installed user bases and a large developer
community. Smaller projects, such as HTML Tidy, aren't as successful at
this.

Although HTML Tidy is pervasive, the current developer community is small
and, due to lack of maintenance, has fractured into scores of personal,
private forks. Many of these forkers have made improvements (most good, some
bad) with high value for sharing, but without a leader (a known group or
organization) these changes offer value to no one.

Tidy's past reputation is the best reason not to fork. HTACG intends to see
_Tidy_ thrive, not some offshoot that lacks its history. As distasteful as
the word "branding" is to many of us, Tidy is a brand, and it's a brand that
shouldn't be tarnished by withering away and dying.


## HTACG Actions to Date

To date HTACG has achieved the following:

 - Formed on 2015-January-15 ([initial announcement][10]).
 - Assumed control of the W3C fork. (Yes, we now better understand some of
   the circumstances behind the origin of this fork, and are striving to
   undo the damage that resulted.)
 - Have set up a draft Project Charter.
 - Have set up the framework for a self-running, community workgroup (WIP).
 - Have reached out with our desire to work with the original maintainers
   and to ask them (you) to support and join our cause.
 - Have closed all but one current pull request in our working branch.
 - Have closed approximately 30 issues in our working branch.
 - Have moved to a modern semantic versioning system.
 - Have begun a new branding initiative.
 - Have promoted the HTML5 capabilities added by Björn.
 - Have put together an HTACG [filler website][4].
 - Have made steps towards a proper [HTML Tidy website][11].


## HTACG Tentative Plans

The several subsections below provide high-level details of what HTACG
proposes to do. Our goal is to be community-driven, so some or many of these
are likely to change based on what we collectively decide.

### Branding

"Branding" sounds like MBA nonsense in some people's ears, but branding and
positioning a project are important in order to attract new members to the
team and attract the interest of new developers. Tidy's early reputation was
largely gained through network effects, and while it's possible to leverage
a network effect in the future, Tidy requires a relaunch, and a relaunch
requires some branding.

 - Tidy itself is a brand. It has significant name recognition and is
   regarded as the de facto HTML cleaning tool by a significant user base
   even today.

 - W3C is a brand. HTACG's affiliation with W3C as a Community Group lends
   significant credibility to the project without any of the dangers of the
   past. We are now completely aware of the on-again, off-again relationship
   with W3C. As a Community Group there is no danger of that happening
   again, as the primary affiliation is HTACG. HTACG can exist without the
   W3C if the community decides such.

 - HTACG itself is capable of becoming a brand. "Who writes Tidy these
   days?"

 - Modernized websites and graphics. If we don't want to be perceived as an
   artifact from 2002, we can't present the image of an artifact from 2002.
   Certainly this is superficial, but the population at large is superficial
   and we can't ignore image these days. It's no longer good enough to say,
   "If what we provide is good, then people will come."

 - Modernized communications channels. Similar to the above, there's a large
   element of the population that expects to subscribe to a Twitter feed.

In short, a project that _looks_ alive will attract the attention and
support that Tidy needs in order to _stay_ alive.


### Community Resources


#### Repositories

The current, true HTML Tidy is hosted at [SourceForge][9], while the branch
inherited by HTACG from the W3C is working out of [GitHub][7].

While CVS and git both have their advantages and disadvantages, I propose
that in the interest of community development, combined with responsible
maintainers, we adopt GitHub as the official working repository.

If desired, we should consider maintaining a mirror of the repository on
SourceForge. Although this subjects us to additional administrative burden,
HTML Tidy has a long history on SourceForge and for many users it is still
the go-to destination for anything Tidy-related.

A mirror also affords an opportunity for the original maintainers to
separate from HTACG if they should determine that they are not satisfied
with the progress that HTACG is promising.


#### Issue Trackers

With the assumption that we work from GitHub, we should close the issue
tracker at SourceForge after migrating the issues to GitHub.


#### Websites

We should combine the existing websites. I have procured the domains
htacg.org and html-tidy.org, and they can be pointed to any arbitrary host.
(Please note that these domains will be surrendered to an appropriate,
proper person in line with our work-in-progress [succession plan][6].)

In consideration of the "branding" issues already described, the cohesive,
single website will be in need of an upgrade.

My proposal includes using GitHub hosting for these websites. Just as for
software projects, this provides the ability for HTACG members and the
general public to issue pull requests and post issues.


#### Mailing Lists

GitHub does not offer mailing list support. This still leaves us with three
main mailing systems to support ([W3 HTACG][1], [SourceForge][2], and
[W3 Tidy][3]), which will be burdensome to monitor and support.

I will make the suggestion that we move to the set of HTACG mailing lists.

 - As my suggestion is to move towards GitHub and add distance from
   SourceForge, it is natural not to favor SourceForge's mailing list.

 - The original W3 mailing list has a long history; however, since some
   members have expressed disappointment in W3C's previous behaviors,
   perhaps it is good to distance ourselves.

 - The HTACG list is _also_ hosted at W3C; however, we have more control
   over it, and it provides relevancy to HTACG as an organization.

Clearly we as members must be prepared to monitor all of the existing
mailing lists during a transition period.


### Transparency and Working Documents

While debate about specific issues and implementations is suitable for issue
tracker threads, broader discussion of strategy, leadership, working
documents, standards, etc. should be relegated to the appropriate public
mailing list, which gives HTACG members and non-members the ability to
provide feedback.

HTACG currently supports a set of working documents, many of which are
generously called "work in progress," in our [community repository][6]. As
this is a GitHub repository, these very same working documents are subject
to community comment and modification via pull requests.

It is HTACG's intention (abusing the oft-repeated ISO phrase) "to say what
we do and do what we say."

Current (generously-called) works-in-progress include:

 - Project Charter (the high level principles for HTACG)
 - Contributor agreement (so we aren't burdened by proprietary licenses)
 - Chair succession plan (so no one person can hold HTACG hostage)
 - Guidelines for providing commit access (whom do we trust?)
 - Guidelines for design criteria (code style, compiler specifications, etc.)
 - Guidelines for release criteria (when do we roll to "master"?)
 - Guidelines and instructions for regression testing.
 - Policy for accepting pull requests (for contributors and maintainers).
 - Roadmap, including a description of Tidy's versioning (where do we go?)


### Relaunch Branch

A lot of development has been based on the branch derived from Björn
Höhrmann's original patch for HTML5 and then taken by W3C. Although there
may be some design decisions that the current maintainers disagree with, the
code is much more up to date and several important contributions have been
added based upon Björn's work.

Therefore I suggest:

 - We start with the current HTACG develop-500 branch.

 - We run regression tests for all of the pre-HTML5 test cases. Successful
   tests (or bug fixes) should satisfy everyone that HTACG Tidy is nominally
   at the same level as SourceForge Tidy.

 - All HTACG members are requested to review the code and test cases for the
   new HTML5 functionality, and issues can be posted to the issue tracker if
   they are technical in nature, or posted to the mailing list if they are
   more strategic or fundamental in nature.
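
To make the regression step above a little more concrete, here is a minimal
sketch in Python of comparing two Tidy builds over a set of test files. It
is not HTACG's actual test harness: the function names and the binary and
file paths are hypothetical, and it assumes each binary accepts `-q` and
writes its cleaned markup to standard output.

```python
import subprocess
from typing import Callable, Iterable, List


def run_tidy(binary: str, html_path: str) -> str:
    """Run one Tidy binary on one file and return what it wrote to stdout.

    Assumption: the binary accepts `-q` (quiet) and prints the cleaned
    markup on stdout. Tidy exits non-zero when it reports warnings or
    errors, so the exit status is deliberately not treated as a failure.
    """
    result = subprocess.run(
        [binary, "-q", html_path], capture_output=True, text=True
    )
    return result.stdout


def find_regressions(
    cases: Iterable[str],
    reference: Callable[[str], str],
    candidate: Callable[[str], str],
) -> List[str]:
    """Return the test cases whose output differs between the two builds."""
    return [case for case in cases if reference(case) != candidate(case)]


# Hypothetical usage; both binary paths and the test file are placeholders:
#
#   failures = find_regressions(
#       ["testcases/example.html"],
#       lambda path: run_tidy("./tidy-sourceforge", path),
#       lambda path: run_tidy("./tidy-htacg", path),
#   )
#   print(failures or "no differences")
```

Passing the two builds in as callables keeps the comparison separate from
how each binary is invoked, so the same check works if the command-line
options differ between builds.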


### Revision Control History

Contributor history is an important aspect of FOSS development, and every
effort to recognize contributors should be made.

GitHub offers an automatic version control history that records the
individual who made a push, who accepted a pull request, and who originated
a pull request.

The current development branch at GitHub did not adequately record the
commit history when it was first forked from SourceForge. However, due to
the nature of git, it seems that it might be possible to pull the
SourceForge source while maintaining its history, and then merge the current
branch atop it while maintaining the entire release history.
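
One possible shape for such a history merge, as a minimal sketch in Python.
It assumes the SourceForge CVS history has already been converted to a git
repository by some separate tool; the URL, remote name, and branch name are
hypothetical placeholders, and the merge would very likely need conflicts
resolved by hand.

```python
from typing import List


def history_merge_commands(
    sf_git_url: str,
    sf_branch: str = "master",
    remote: str = "sourceforge",
) -> List[List[str]]:
    """Build the git commands that would join the two histories.

    Intended to be run from a checkout of the current development branch.
    All three arguments are placeholders, not real project settings.
    """
    return [
        # Register the converted SourceForge repository as a second remote.
        ["git", "remote", "add", remote, sf_git_url],
        # Download its commits without touching the working tree.
        ["git", "fetch", remote],
        # Merge the old history into the current branch. Git 2.9 and later
        # refuse to merge histories with no common ancestor unless this
        # flag is given; older versions do not recognize the flag.
        ["git", "merge", "--allow-unrelated-histories", f"{remote}/{sf_branch}"],
    ]


# Print the plan for review rather than executing it:
for command in history_merge_commands("https://example.invalid/tidy.git"):
    print(" ".join(command))
```

The sketch only prints the commands so they can be reviewed first; each list
could be passed to `subprocess.run(command, check=True)` once the plan looks
right.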


### Tidy History

The purpose of HTACG is, among other things, to keep HTML Tidy alive and
well, and that includes honoring its past. HTACG will ensure that all
previous contributors, maintainers, and participants are prominently
recognized on its websites using material sourced from SourceForge and Dave
Raggett's W3C page.


## Summary

As you can see, in the 22 days since establishing HTACG, a lot of thought
and effort have been put into promoting and maintaining HTML Tidy. While
it's true that there is still a lot of work to be done, the framework for
good governance and stewardship has been put into place.

I hope that subscribers to this list can recognize that Tidy needs help in
order to remain relevant, and can grant support for this proposal or a
modified form of this proposal.

Thank you for the significant amount of time you have invested in reading
this.


* * *

 References:
  [4]: http://www.htacg.org/
  [5]: http://www.w3.org/community/htacg/
  [6]: https://github.com/htacg/community/tree/master
  [7]: https://github.com/htacg/tidy-html5
  [8]: http://www.w3.org/People/Raggett/tidy/
  [9]: http://tidy.sourceforge.net
  [10]: https://github.com/htacg/tidy-html5/issues/137
  [11]: http://www.html-tidy.org/


-- 
---
Jim Derry
Clinton Township, MI, USA
Nanjing, Jiangsu, China PRC

Received on Tuesday, 3 February 2015 07:49:14 UTC