- From: Jim Derry <balthisar@gmail.com>
- Date: Tue, 3 Feb 2015 15:48:43 +0800
- To: html-tidy@w3.org
- Message-ID: <CABUm+BeXis4YH-T933CVC6DyTSNfYU7=BAJcGNYOYNjJ2Xm_pA@mail.gmail.com>
Crossposted to

[1]: https://lists.w3.org/Archives/Public/public-htacg/
[2]: https://sourceforge.net/p/tidy/mailman/tidy-develop
[3]: https://lists.w3.org/Archives/Public/html-tidy/

This message is addressed to HTML Tidy users, developers, maintainers, and other interested parties in an effort to spur discussion regarding the present and future of HTML Tidy, including a proposal for the continued maintenance and development of HTML Tidy.

Simply put, my proposal is that responsibility for the current SourceForge repository be turned over to HTACG.

The preceding simple statement necessarily involves a large amount of discussion. This is a big discussion with a lot of text, and some of it will surely please each of you, and some of it will certainly infuriate some of you. I hope that the "big picture" of what I'm presenting will encourage you to support the HTACG project and the opportunities it offers.

(I apologize for the Markdown-like format, but it's very legible and minimizes the risk of reference mistakes.)

## What is HTACG

On 2015-January-15 I created the HTML Tidy Advocacy Community Group ([HTACG][4]), a [W3C Community Group][5], of which I am currently serving as Chair. It "is dedicated to the continued support, development, and evolution of the HTML Tidy command line application and library." More specifically, it "aims to become the canonical release group for HTML Tidy, which has been without a stable, public release since 2008. The Community aspires to achieve the agreement and support of the original and current developers to this end." Certainly the above goals cannot be achieved without the cooperation of the subscribers to this list.

(The above quotes are from our [official description][5]. Although the current SourceForge repository is regarded as stable by the developers, the _intention_ of the statement is to indicate that there have been no _newer_ releases or bug fixes.)

Although HTACG is affiliated with the W3C, it's important to note that W3C does not provide direction over HTACG. The community group belongs to the community. For additional information please see our [HTACG Project Charter][6].

## Meaning of "turned over to HTACG"

The simple proposal "responsibility for the current SourceForge repository be turned over to HTACG" means that the current maintainers grant access to the repository to individuals as specified by HTACG. Certainly the current maintainers are encouraged to affiliate with [HTACG][5] and take part in this decision process.

The result, publicly, is HTML Tidy becoming a community-driven, community-led project. It's even possible that the current maintainers dominate HTACG, and should this happen then at least:

- it's a community decision
- it happens under the auspices of a public-facing organization rather than individuals.

Although the decision process for granting access has yet to be [formally defined][6], it's a high priority for HTACG. In general HTACG members will reach consensus based on public discussion. This discussion should consider past and present contributions to HTACG and the HTML Tidy project. Strong regard should be given to the input of the current Chair or Chairs.

## HTACG Leadership and Succession

As mentioned above, I am the current Chair. This was done for the sake of expediency in kicking off HTACG. I do not imagine myself to be the "owner" of HTACG, and the position of Chair is always available to other HTACG members via the [Community Group Page][5].

The community should expect and desire turnover in the position of Chair. As such, another work in progress is a formal [succession document][6], which will make provisions for turning over access to repository membership/ownership, domain names, and other assets of HTACG. A stable organization should be able to tolerate 100% turnover while remaining functional.

## Current State of Tidy

HTACG was formed specifically to fill the need for an interested steward for HTML Tidy. There have been no bug fixes or improvements to the SourceForge repository in several years, and issues go unresolved. Popular operating systems ship with a `tidy` that's not capable of working with HTML5, and popular software repositories ship with less than capable versions of `tidy`, too. Additionally, a prominent fork of HTML Tidy hosted by W3C, featuring support for HTML5, had also grown stagnant, with no commits and no issues addressed for some years.

In many corners of the Internet there are claims that "Tidy is dead," or "Tidy is outdated," or "Tidy isn't maintained." These are fair assessments, and HTACG hopes to change both the facts and the perception.

HTACG has successfully [taken responsibility][7] for the aforementioned prominent W3C fork. Due to a _perceived_ endorsement from [Dave Raggett][8], HTACG had understood that this fork was the approved, natural successor of the SourceForge project, and has taken steps with this thought in mind. Due to incomplete knowledge of some details of HTML Tidy's history, we were unaware of a fracture between the W3C fork and the current SourceForge home. I sincerely hope that our actions are seen as a sign of motivation and enthusiasm towards HTML Tidy rather than any attempt to usurp the current project. Indeed, the future depends on the current project.

## Why not fork?

Open source encourages forking, and there are successful forks of many popular pieces of software. MariaDB (né MySQL) is a good example of this. Both MariaDB and MySQL have large installed user bases and a large developer community.

Smaller projects, such as HTML Tidy, aren't as successful at this. Although HTML Tidy is pervasive, the current developer community is small and, due to lack of maintenance, has fractured into scores of personal, private forks.
A lot of these forkers have made improvements (most good, some bad) with high value for sharing, but without a leader (a known group or organization) these changes offer value to no one.

Tidy's past reputation is the best reason not to fork. HTACG intends to see _Tidy_ thrive, not some offshoot that lacks its history. As distasteful as the word "branding" is to many of us, Tidy is a brand, and it's a brand that shouldn't be tarnished by withering away and dying.

## HTACG Actions to Date

To date HTACG has achieved the following:

- Formed on 2015-January-15 ([initial announcement][10]).
- Assumed control of the W3C fork. (Yes, we now better understand some of the circumstances behind the origin of this fork, and are striving to undo the damage that resulted.)
- Have set up a draft Project Charter.
- Have set up the framework for a self-running, community workgroup (WIP).
- Have reached out with our desire to work with the original maintainers and to ask them (you) to support and join our cause.
- Have closed all but one current pull request in our working branch.
- Have closed approximately 30 issues in our working branch.
- Have moved to a modern semantic versioning system.
- Have begun a new branding initiative.
- Have promoted the HTML5 capabilities added by Björn.
- Have put together an HTACG [filler website][4].
- Have made steps towards a proper [HTML Tidy website][11].

## HTACG Tentative Plans

The several subsections below provide high-level details of what HTACG proposes to do. Our goal is to be community-driven, so some or many of these are likely to change based on what we collectively decide.

### Branding

"Branding" sounds like MBA nonsense in some people's ears, but branding and positioning a project are important in order to attract new members to the team and attract the interest of new developers.
Tidy's early reputation was largely gained through network effects, and while it's possible to leverage a network effect in the future, Tidy requires a relaunch, and a relaunch requires some branding.

- Tidy itself is a brand. It has significant name recognition and is regarded as the de facto HTML cleaning tool by a significant userbase even today.
- W3C is a brand. HTACG's affiliation with W3C as a Community Group lends significant credibility to the project without any of the dangers of the past. We are now completely aware of the on again, off again relationship with W3C. As a Community Group there is no danger of that happening again, as the primary affiliation is HTACG. HTACG can exist without the W3C if the community decides such.
- HTACG itself is capable of becoming a brand. "Who writes Tidy these days?"
- Modernized websites and graphics. If we don't want to be perceived as an artifact from 2002, we can't present the image of an artifact from 2002. Certainly this is superficial, but the population at large is superficial and we can't ignore image these days. It's no longer good enough to say, "If what we provide is good, then people will come."
- Modernized communications channels. Similar to the above, there's a large element of the population that expects to subscribe to a Twitter feed.

In short, a project that _looks_ alive will attract the attention and support that Tidy needs in order to _stay_ alive.

### Community Resources

#### Repositories

The current, true HTML Tidy is currently hosted at [SourceForge][9], while the branch inherited by HTACG from the W3C is working out of [GitHub][7]. While CVS and git both have their advantages and disadvantages, I propose that in the interest of community development, combined with responsible maintainers, we adopt GitHub as the official working repository.

If desired, we should consider maintaining a mirror of the repository on SourceForge.
Although this subjects us to additional administrative burden, HTML Tidy has a long history on SourceForge and for many users it is still the go-to destination for anything Tidy-related. A mirror also affords an opportunity for the original maintainers to separate from HTACG if they should determine that they are not satisfied with the progress that HTACG is promising.

#### Issue Trackers

With the assumption that we work from GitHub, we should close the issue tracker at SourceForge after migrating the issues to GitHub.

#### Websites

We should combine the existing websites. I have procured the domains htacg.org and html-tidy.org, and they can be pointed to any arbitrary host. (Please note that these domains will be surrendered to an appropriate, proper person in line with our work-in-progress [succession plan][6].)

In consideration of the "branding" issues already described, the cohesive, single website will be in need of an upgrade.

My proposal includes using GitHub hosting for these websites. Just as for software projects, this provides the ability for HTACG members and the general public to issue pull requests and post issues.

#### Mailing Lists

GitHub does not offer mailing list support. This still leaves us with three main mailing systems to support ([W3 HTACG][1], [SourceForge][2], and [W3 Tidy][3]), which will be burdensome to monitor and support. I suggest that we move to the set of HTACG mailing lists.

- As my suggestion is to move towards GitHub and add distance from SourceForge, it is natural not to favor SourceForge's mailing list.
- The original W3 mailing list has a long history; however, since some members have expressed disappointment in W3C's previous behaviors, perhaps it is good to distance ourselves.
- The HTACG list is _also_ hosted at W3C; however, we have more control over it, and it provides relevancy to HTACG as an organization.

Clearly we as members must be prepared to monitor all of the existing mailing lists during a transition period.

### Transparency and Working Documents

While debate about specific issues and implementations is suitable for issue tracker threads, broader discussion of strategy, leadership, working documents, standards, etc. should be relegated to the appropriate public mailing list, which provides HTACG members and non-members the ability to provide feedback.

HTACG currently supports a set of working documents (many of which are generously called "work in progress") in our [community repository][6]. As a GitHub repository, these very same working documents are subject to community comment and modification via pull requests.

It is HTACG's intention (abusing the oft-repeated ISO phrase) "to say what we do and do what we say."

Current (generously-called) works-in-progress include:

- Project Charter (the high level principles for HTACG)
- Contributor agreement (so we aren't burdened by proprietary licenses)
- Chair succession plan (so no one person can hold HTACG hostage)
- Guidelines for providing commit access (whom do we trust?)
- Guidelines for design criteria (code style, compiler specifications, etc.)
- Guidelines for release criteria (when do we roll to "master"?)
- Guidelines and instructions for regression testing.
- Policy for accepting pull requests (for contributors and maintainers).
- Roadmap, including a description of Tidy's versioning (where do we go?)

### Relaunch Branch

A lot of development has been based on the branch derived from Björn Höhrmann's original patch for HTML5 and then taken by W3C. Although there may be some design decisions that the current maintainers disagree with, the code is much more up to date and several important contributions have been added based upon Björn's work. Therefore I suggest:

- We start with the current HTACG develop-500 branch.
- We run regression tests for all of the pre-HTML5 test cases.
  Successful tests (or bug fixes) should satisfy everyone that HTACG Tidy is nominally at the same level as SourceForge Tidy.
- All HTACG members are requested to review the code and test cases for the new HTML5 functionality, and issues can be posted to the issue tracker if they are technical in nature, or posted to the mailing list if they are more strategic or fundamental in nature.

### Revision Control History

Contributor history is an important aspect of FOSS development, and every effort to recognize contributors should be made. GitHub offers an automatic version control history that records the individual who made a push, who accepted a pull request, and who originated a pull request.

The current development branch at GitHub did not adequately record the commit history when it was first forked from SourceForge. However, due to the nature of git, it seems that it might be possible to pull the SourceForge source while maintaining its history, and then merge the current branch atop it while maintaining the entire release history.

### Tidy History

The purpose of HTACG is, among other things, to keep HTML Tidy alive and well, and that includes honoring its past. HTACG will ensure that all previous contributors, maintainers, and participants are prominently recognized on its websites using material sourced from SourceForge and Dave Raggett's W3C page.

## Summary

As you can see, in the 22 days since establishing HTACG, a lot of thought and effort have been put into promoting and maintaining HTML Tidy. While it's true that there is still a lot of work to be done, the framework for good governance and stewardship has been put into place.

I hope that subscribers to this list can recognize that Tidy needs help in order to remain relevant, and can grant support for this proposal or a modified form of this proposal.

Thank you for the significant amount of time you have invested in reading this.

* * *

References:

[4]: http://www.htacg.org/
[5]: http://www.w3.org/community/htacg/
[6]: https://github.com/htacg/community/tree/master
[7]: https://github.com/htacg/tidy-html5
[8]: http://www.w3.org/People/Raggett/tidy/
[9]: http://tidy.sourceforge.net
[10]: https://github.com/htacg/tidy-html5/issues/137
[11]: http://www.html-tidy.org/

--
Jim Derry
Clinton Township, MI, USA
Nanjing, Jiangsu, China PRC
Received on Tuesday, 3 February 2015 07:49:14 UTC