
RE: Legislation on the web...

From: Appleby, Paul <paul.appleby@tso.co.uk>
Date: Tue, 9 Sep 2008 16:47:40 +0100
Message-ID: <D0163CD87CFBFE4C8A0498F6E9D81A090B4B51@w2exc017020.theso.co.uk>
To: "Peter Krantz" <peter.krantz@gmail.com>, "Sheridan, John" <John.Sheridan@nationalarchives.gov.uk>, <public-egov-ig@w3.org>
Cc: <jeni@jenitennison.com>
TSO, as the government's legislation publishing contractor (appointed by HMSO/OPSI), provide some of the tools and manpower for the production of legislation. I'll try and explain things as far as I can! As you might expect the workflows are somewhat complex and quite convoluted, especially because of the different parliaments/assemblies in the UK.
The XML schema that you have been looking at is one of the components that we have put in place in a move to simplify things and to try and move towards a standard for UK legislation. Originally, legislation was split along the lines of primary and secondary legislation (within TSO at least). The schema brings the two together as many aspects are common to both and this therefore provides portability between the two types of legislation (for instance to aid consolidation).
The schema now forms the master document format for TSO to store legislation. It is from this format that website XHTML, Atom feeds and soon-to-go-live sitemaps are produced, using our workflow tools. We currently produce legislation for the UK on www.opsi.gov.uk <http://www.opsi.gov.uk/> and legislation specific to Scotland on www.opqs.gov.uk <http://www.opqs.gov.uk/>.
Additionally, we are currently working on a project to improve the website output for Explanatory Notes to primary legislation - again using an XML schema and then converting through to XHTML. Much of this schema is the same as the legislation schema, as the schema is heavily modularised and allows easy re-use of components - thereby cutting down on development complexity. Similarly, the Gazettes that John mentioned have an XML schema used for www.gazettes-online.co.uk <http://www.gazettes-online.co.uk/> and this also uses some of these modules.
As far as authoring goes there are several approaches. The most advanced in terms of authoring is the secondary legislation workflow. TSO provide a Word template that legislators can use as an authoring tool. This allows authors to work in a WYSIWYG manner: the drafted document is exactly as it will be printed. To control the process we have developed a validation portal where users upload their documents and get back a comprehensive PDF report to enable them to fix their document. This report is returned as the same document but with annotations, so users can easily see which parts of their document need attention. If a document successfully passes validation we can be reasonably confident that it will convert through to XML without problems. (The validation process is template independent and rules driven, so we use the same approach for the submission of Word documents to the Gazettes, simply using a different rule set.) The Word template also allows users to include content such as graphics, mathematics and user-created forms. Graphics get converted accordingly, mathematics becomes MathML and user-created forms become images.
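To illustrate the rules-driven idea described above, here is a minimal Python sketch. It is not TSO's implementation: the Paragraph and Rule types, the style names and the two rule sets are all hypothetical, chosen only to show how one validation engine can serve both legislation and Gazette submissions by swapping the rule set, and how each failure can be tied back to a location in the document for annotation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Paragraph:
    style: str  # name of the Word paragraph style (hypothetical values below)
    text: str


@dataclass
class Rule:
    name: str
    check: Callable[[Paragraph], bool]  # returns True when the paragraph passes
    message: str


def validate(paragraphs: List[Paragraph], rules: List[Rule]) -> List[Tuple[int, str, str]]:
    """Return one (paragraph index, rule name, message) annotation per failure."""
    annotations = []
    for index, paragraph in enumerate(paragraphs):
        for rule in rules:
            if not rule.check(paragraph):
                annotations.append((index, rule.name, rule.message))
    return annotations


# Two hypothetical rule sets sharing the same engine.
LEGISLATION_RULES = [
    Rule("known-style", lambda p: p.style in {"Title", "Section", "Body"},
         "Paragraph style is not part of the template"),
    Rule("non-empty", lambda p: bool(p.text.strip()),
         "Paragraph is empty"),
]
GAZETTE_RULES = [
    Rule("known-style", lambda p: p.style in {"NoticeHeading", "NoticeBody"},
         "Paragraph style is not part of the template"),
    Rule("non-empty", lambda p: bool(p.text.strip()),
         "Paragraph is empty"),
]

doc = [
    Paragraph("Title", "Example Order 2008"),
    Paragraph("Heading9", "A stray heading"),
    Paragraph("Body", "   "),
]

for index, name, message in validate(doc, LEGISLATION_RULES):
    print(f"paragraph {index}: [{name}] {message}")
```

Because the rules are plain data passed to validate, supporting another document type only means supplying another list of rules; the engine and the annotation format stay the same.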
The majority of secondary legislation now uses this workflow. For those documents that do not follow this route yet, legacy typesetting-system based approaches are still used, but I think that we are somewhere around 98% of UK and Northern Ireland legislation going through the template. Scottish legislation does not yet fully use this workflow (although not for technical reasons). Welsh legislation has additional complexities due to the nature of dual language and the existing typographical layout (two-column balanced) - so the template is currently only used for authoring content and not for printing or XML conversion. However, with changes to the layout of Welsh legislation in the pipeline it should be possible to include Welsh too. For non-template documents, components are in place to convert the content to the XML schema.
So, basically, for authoring of most secondary legislation, authors need only have a reasonable grasp of Word. TSO provide a support desk specifically for the template as well as workshops for users.
For primary legislation, different workflows are used throughout the country and things are more complicated. UK Acts are produced through a structured Framemaker-based system that uses a different XML format from the schema. Other areas use Word-based workflows. We have workflows in place to convert the supplied formats through to our XML schema, at which point the same workflow as for secondary is used. TSO have done some prototyping of using the XML schema, an XML editor and XSL-FO to typeset primary legislation for the different regions with a good level of success. Obviously this would entail users learning the XML schema, but initial discussions have found that there is little resistance to this.
Explanatory Notes to primary legislation also have a Word template, but this is very old and primitive and, in terms of content, basically anything goes for authors, so conversion through to XML is fairly complex and will probably still involve some manual clean-up of certain Word features if they are used (e.g. WordArt). The goal of the Explanatory Note project is to provide an interwoven version of the document with the corresponding Act, with relevant sections of the EN displayed at the correct location within the Act output.
In terms of numbers secondary legislation is, I believe, somewhere in the region of 3000-4000 documents per year for all regions, whilst primary is 100-200, so improvements to secondary legislation workflows provide good productivity gains.
Currently consolidation in the UK (http://www.statutelaw.gov.uk/) uses a different XML mark-up system, but the medium-term hope is to bring enacted and consolidated legislation together built around a common schema.
If you'd like any more details please just ask.

	-----Original Message----- 
	From: Peter Krantz [mailto:peter.krantz@gmail.com] 
	Sent: Tue 09/09/2008 09:34 
	To: Sheridan, John; public-egov-ig@w3.org 
	Cc: jeni@jenitennison.com; Appleby, Paul 
	Subject: Re: Legislation on the web...

	Thank you for your reply! Having looked through the schema and
	documentation for your XML format I am very interested in knowing more
	about your production process. How are the XML documents created? What
	tools do you provide for legislators that need to create documents
	without knowing the innards of your XML schema?
	Over here, we decided to use a generic document format to express the
	document structure. We are looking into XHTML (2) for this. To express
	specific legal information (e.g. document types, metadata and
	relations to other documents etc.) we have created a vocabulary (with
	RDFS and OWL) that we use through RDFa in the actual documents. This
	makes it possible to do exciting stuff with the dataset while being
	simple to implement for publishers. This method also enables
	government agencies to create their own domain specific vocabularies
	to encode their own data.
	We also have to deal with a lot of PDF documents (e.g. old laws) and
	for these we require the same triples expressed in an additional
	metadata file. Documents are created at various government agencies,
	published on their website, collected (by reading an Atom feed) and
	assigned a permanent URI through a central system. Documents will
	eventually be available through the government legal information
	portal lagrummet.se (see http://www.lagrummet.se/english/).
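The collection step described above (reading an agency's Atom feed and assigning each document a permanent URI) could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the feed content, the agency.example and central.example hosts, and the sequential numbering scheme are all hypothetical, and a real central system would need a stable, persistent way of minting URIs.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace (RFC 4287)

# Hypothetical feed as an agency might publish it.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example agency publications</title>
  <entry>
    <id>tag:agency.example,2008:doc-1</id>
    <title>Regulation 2008:1</title>
    <link rel="alternate" href="http://agency.example/docs/2008-1.xhtml"/>
  </entry>
  <entry>
    <id>tag:agency.example,2008:doc-2</id>
    <title>Regulation 2008:2 (scanned)</title>
    <link rel="alternate" href="http://agency.example/docs/2008-2.pdf"/>
  </entry>
</feed>"""


def collect(feed_xml, base="http://central.example/publ/"):
    """Parse an Atom feed and pair each entry with a newly assigned URI.

    The URI scheme (base plus a running number) is a placeholder, not the
    scheme used by any real system.
    """
    root = ET.fromstring(feed_xml)
    records = []
    for number, entry in enumerate(root.findall(ATOM + "entry"), start=1):
        link = entry.find(ATOM + "link")
        records.append({
            "permanent_uri": f"{base}{number}",
            "source_id": entry.findtext(ATOM + "id"),
            "title": entry.findtext(ATOM + "title"),
            "source_url": link.get("href") if link is not None else None,
        })
    return records


for record in collect(FEED):
    print(record["permanent_uri"], "<-", record["source_url"])
```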
	Kind regards,
	Peter Krantz
	Strategic Development Officer
	Verva, Swedish Administrative Development Agency
	Postal address: Box 214, SE-101 24 Stockholm, Sweden
	Telephone: 08-55 05 57 74
	Fax: 08-23 02 10

	On Sun, Sep 7, 2008 at 8:02 PM, Sheridan, John
	<John.Sheridan@nationalarchives.gov.uk> wrote:
	> Peter,
	> I'm responsible for the development of the UK Government's "official
	> legislation website" - and also for official Gazettes  on the web (the
	> London, Belfast and Edinburgh Gazettes).
	> See: http://www.opsi.gov.uk/acts/acts2008/ukpga_20080003_en_1 for an example
	> of how we publish legislation (it is semantic HTML and we use a bit of GRDDL
	> too). We've published our XML Schema here:
	> http://www.opsi.gov.uk/legislation/schema/

	> You might also be interested in "Sem Webbing the London Gazette"
	> http://2008.xtech.org/public/schedule/detail/528

	> We're really keen to share ideas!
	> I've cc'ed some of the others involved.
	> John Sheridan
	> Head of e-Services
	> Office of Public Sector Information
	> Admiralty Arch
	> North Side
	> The Mall
	> London
	> SW1A 2WH


Received on Tuesday, 9 September 2008 15:48:21 UTC
