- From: Nick Kew <nick@webthing.com>
- Date: Tue, 9 Sep 2003 21:34:42 +0100 (BST)
- To: public-qa-dev@w3.org

Rationale
=========

Many applications today benefit from an SGML and/or XML Entity Catalogue
to dereference entities referenced by a Public Identifier. For a
validating SGML parser this is an absolute requirement. For any SGML or
XML parser it serves to enable entities such as DTDs and modules to be
resolved locally.

Hitherto, different packages and applications have distributed entity
catalogues. Examples are DocBook, HTML Validators, the OpenSP parser,
and operating system distros. However, there is little coordination
between the distributors of these, and no common package that
distributors can rely on. Even in tightly-controlled environments such
as the Debian packages, the W3C Validator includes its own Entity
Catalogue rather than relying on it being available as a dependency.

This situation should be rationalised to allow for an SGML and XML
catalogue to be a single package on which other packages can depend.
In this note, we propose a framework for managing such a package.

Goals
=====

* To maintain a Universal Catalogue
* To provide an automated process for generating local installations
  of all or part of the Universal Catalogue.
* To minimise the effort and coordination required to ensure that the
  universal catalogues and local installations remain up-to-date.
  In particular, end-users should be offered a self-maintaining default
  installation that eliminates effort on their part altogether.
* To enable control of different parts of the catalogue to be delegated
  to the people/organisations responsible for them.

A loose analogy could be drawn to DNS. But since immediate lookup of
[SG|X]ML entities is dealt with by SYSTEM ids, we only have to deal with
efficient caching of local copies of PUBLIC ids. Entities are in
general long-lived, but by no means immutable (for example, the MathML 2
DTD modules have undergone several minor revisions).

Managing a Universal Catalogue
==============================

In principle, all organisations creating public identifiers should be
registered with ISO. But this is not widely practiced, and the present
chaotic situation indicates that it is not effectively meeting today's
needs. We propose that a distributed architecture for automating
catalogue management is both feasible and preferable.

#### ISO registry: availability???

Our proposal envisages a central registry, cooperating with a set of
recognised repositories each managing its own entity catalogue locally.
For example, the W3C, WapForum and Oasis each manage their own
catalogues independently. Likewise, different groups acting
independently within W3C are responsible for different areas such as
HTML, MathML, SVG and SMIL. We propose that a universal catalogue will
work best if responsibility for each sub-catalogue is explicitly
devolved to the working group responsible for defining it. The central
registry will serve merely to reference the responsible groups, in a
manner somewhat analogous to DNS.

This is broadly in line with the registry already run by the ISO but
not widely used. What our proposal adds is the availability of the
registry online in machine-readable format, and its integration with
catalogues maintained by each participating organisation. It is possible
that tying the registry in to distribution of Markup libraries and
catalogues may in itself be an incentive for organisations to register.

#### Implications for naming conventions?

Implementation
==============

Since the Universal Catalogue serves SGML and XML applications, it is
appropriate that it should itself be capable of implementation as an
SGML or XML application. This is straightforward: all we need is a DTD
for declaring catalogues and catalogue entries, and a list of entities
defining catalogues maintained by the groups entrusted with doing so.
This is then implemented by a program to fetch the data required and
write the catalogues.
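
As an illustration of the "write the catalogues" step only, here is a
minimal Python sketch that renders a list of entries as PUBLIC lines in
the SGML Open (TR9401) catalogue syntax read by parsers such as OpenSP.
It is not taken from CatalogueManager: the Entry type, the
format_catalogue function and the local file paths are assumptions made
for the example. Only the two W3C public identifiers are real.

```python
from typing import Iterable, NamedTuple


class Entry(NamedTuple):
    """One catalogue entry (hypothetical type, for illustration only)."""
    public_id: str    # formal public identifier
    system_path: str  # local file the identifier should resolve to


def format_catalogue(entries: Iterable[Entry]) -> str:
    """Render entries as PUBLIC lines in SGML Open catalogue syntax."""
    lines = []
    for entry in entries:
        # Both fields are written inside double quotes, so a literal
        # double quote in either would corrupt the catalogue line.
        if '"' in entry.public_id or '"' in entry.system_path:
            raise ValueError(f"double quote not allowed in entry: {entry!r}")
        lines.append(f'PUBLIC "{entry.public_id}" "{entry.system_path}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The local paths below are assumed, not a recommended layout.
    demo = [
        Entry("-//W3C//DTD HTML 4.01//EN", "html4/strict.dtd"),
        Entry("-//W3C//DTD XHTML 1.0 Strict//EN", "xhtml1/xhtml1-strict.dtd"),
    ]
    print(format_catalogue(demo), end="")
```

A real tool would build the entry list from the fetched sub-catalogues
rather than from a hard-coded list.
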
Local installations may be customised by selecting which entities to
include, while package maintainers can ship a standard configuration.

An implementation demonstrating the above is available at
<URL:http://valet.webthing.com/catalogue/>. It fetches the master
catalogue, DTD and Entities by HTTP. It updates all entries defined,
but uses the HTTP If-Modified-Since header to avoid the overhead of
re-fetching anything that is already up-to-date in the local
installation. It can therefore be run regularly (e.g. monthly) with
minimal overhead.

CatalogueManager may be used as-is, but is intended as a
proof-of-concept. Non-technical issues such as how to delegate
responsibility for different sub-catalogues need to be addressed, and
the file format used for the demonstrator is likely to be subject to
improvement.

Security
========

A package such as CatalogueManager that updates system files based on
third-party definitions has the potential to introduce malicious files.
It is strongly recommended that standard system security be used to
avoid serious consequences in the event of any of the sub-catalogues
being compromised. CatalogueManager should run as a user with no
privilege to write to the local filesystem except within a designated
SGML/XML library area, such as /usr/local/share/sgmlib. Distributors
creating a package such as an RPM of CatalogueManager should ensure
their users' security.

A more inherently secure architecture would generate all local filenames
internally, and is probably preferable. The current implementation
serves for back-compatibility until the proposal can be considered
stable.
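
As a complement to running under a restricted account, a tool of this
kind could also refuse any filename from a sub-catalogue that would land
outside the library area. The following Python sketch shows one way to
do that check. The function name safe_target is hypothetical and this
is not how CatalogueManager is known to behave; the library path is the
example given above.

```python
from pathlib import Path


def safe_target(library_root: str, relative_name: str) -> Path:
    """Return the path for relative_name under library_root.

    Raises ValueError if the resolved path is not strictly inside
    library_root, e.g. for absolute names or names containing "..".
    """
    root = Path(library_root).resolve()
    # Joining with an absolute name discards root entirely, and
    # resolve() collapses any ".." components, so both kinds of escape
    # are caught by the containment test below.
    candidate = (root / relative_name).resolve()
    if root not in candidate.parents:
        raise ValueError(f"refusing to write outside {root}: {relative_name!r}")
    return candidate


if __name__ == "__main__":
    root = "/usr/local/share/sgmlib"
    print(safe_target(root, "html4/strict.dtd"))
    try:
        safe_target(root, "../../../etc/passwd")
    except ValueError as err:
        print("rejected:", err)
```

This is a defence in depth only; it does not replace the file
permission restrictions recommended above.
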
Received on Tuesday, 9 September 2003 16:34:46 UTC