
CSV on the Web Working Group | XBRL

From: <eric.e.cohen@us.pwc.com>
Date: Wed, 29 Jan 2014 09:53:21 -0500
To: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <OFCC591FBB.FD9A0038-ON85257C6F.004D01B2-85257C6F.0051C9CA@pwc.com>
Please note, the opinions expressed here are my own and do not represent 
my employer or XBRL International. 

I am very pleased to see a new W3C effort in the area of CSV and the Web.

I see a goal of "provid[ing] technologies whereby data dependent 
applications on the Web can provide higher interoperability when working 
with datasets using the CSV (Comma-Separated Values) or similar formats. 
As well as single CSV files, the group will define mechanisms for 
interpreting a set of CSVs as relational data. This will include the 
definition of a vocabulary for describing tables expressed as CSV and 
locatable on the web, and the relationships between them."

As a co-founder of XBRL (Extensible Business Reporting Language) and the 
primary architect of a specification called XBRL Global Ledger Taxonomy 
Framework (XBRL GL), I have done a lot of thinking about that very issue 
as it relates to the domain of the Business Reporting Supply Chain, and 
CSV/text-to-XML/XBRL. Many ERP systems have native export to CSV 
capabilities; a CSV to Web interface will integrate these into the Web 
environment with a minimum of effort.

We have been developing internal drafts on a descriptive language for 
describing the CSV to XML/XBRL connection and facilitating the conversion, 
and have vendors who have prototyped applications that demonstrate the 

It may provide some helpful background for the group, and if I can help 
hone it into a business case suitable for your use, I would be pleased to 
work with you. If what I present here is completely irrelevant, I will not 
be hurt if you ignore it completely. However, I would covet your 
consideration - for our purposes, if not yours - and would be glad to help 
with yours if I can.

XBRL GL: detailed business master files and transaction data expressed 
with XML - some people prefer CSV

As I noted, my particular focus in the XBRL community is something called 
XBRL GL (Global Ledger Taxonomy Framework) (0), the use of XBRL to 
represent detailed business transactional data (the setup, master, 
transactional and historical data files of ERP systems). XBRL GL is being 
used for purposes such as consolidating data across disparate corporate 
ERP systems (1) or bringing together government agency data in Brazil (2) 
or being used as the electronic bookkeeping archiving format for tax 
purposes in Turkey (3). It is highly hierarchical; it is representing 
content normally found in relational databases by instead using XML/XML 

Detailed accounting data files can be very large in their native database 
format. Turn that into XML, and things, unzipped, may balloon. As those 
involved in XML are likely aware, XML was not designed to be terse; it is, 
in fact, "verbose by design". (4) (Of course, it zips well.) That leads to 
large ERP extracts when expressed in XML/XBRL GL. Such extracts are used 
in system integration, consolidation, data migration and archival 

A highly visible community of users of ERP extracts is auditors - in 
particular internal auditors, financial auditors (CPAs, CAs), and tax 
auditors. For that reason, some of the auditor community that needed to 
transport extracts of ERP data from corporates wanted the benefits of 
standardized ERP metadata such as XBRL GL (e.g., robust data 
representations, easier interpretability of data, validation, multiple 
language labels for data fields, applicability of standardized business 
rules with RIF, ISO Schematron or XBRL Formula) with the terseness of 
delimited or fixed length text, to ease transport and 
memory-consumption/processing issues.

How to get the best of CSV and the best of XBRL GL?

To that end, we (the XBRL GL Working Group) began considering conventions 
for describing delimited and fixed-length text files using the semantics of 
XBRL GL. Does the second grouping of characters in the comma delimited 
file represent the main account number (gl-cor:accountMainID)? Does the 
text starting 31 characters in and going 40 characters represent the 
inventory description (gl-bus:measurableDescription)? (As we know, the "C" 
of CSV doesn't account for regional separators, such as the semi-colon or 
pipe ("|"), nor fixed length alternatives; we wanted to take all that into 

The first thought was just to use fully qualified XBRL GL concepts as a 
header row in the text file. It had to be better than nothing - instead of 
variations of "account#", or "accountNo", or "Account Number", or 
"Identificador de la Cuenta", or "勘定科目番号" for a column representing 
the account number, just use the standardized gl-cor:accountMainID. That 
was certainly better than nothing, at least for human interpretation!
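
As a minimal sketch of that header-row idea: the sample data below is hypothetical, and only gl-cor:accountMainID is taken from the text above; the other column names are illustrative assumptions. A consumer can then address columns by concept name rather than by a local label.

```python
import csv
import io

# Hypothetical sample: a semicolon-delimited export whose header row uses
# qualified XBRL GL concept names instead of local column labels.
# Only gl-cor:accountMainID comes from the message; the rest are assumed.
SAMPLE = (
    "gl-cor:accountMainID;gl-cor:accountMainDescription;gl-cor:amount\n"
    "1000;Cash;2500.00\n"
    "2000;Accounts Payable;-1200.50\n"
)

def read_rows(text, delimiter=";"):
    """Return each data row as a dict keyed by the header-row concept names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

rows = read_rows(SAMPLE)
for row in rows:
    # The same lookup works whatever the source system called the column.
    print(row["gl-cor:accountMainID"], row["gl-cor:amount"])
```

The delimiter is passed in explicitly because, as noted above, regional exports may use a semi-colon or pipe rather than a comma.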

The next step was thinking about a reusable configuration file that would 
describe a text output format(s) and be used without any modification to 
the source data. The embedded header approach, for example, doesn't let 
you span CSV files easily. It doesn't permit mapping between source 
content and enumerations, or facilitate provisional/calculated fields.
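
To illustrate keeping the description separate from the data, here is a rough sketch that is not the draft DDF format: the dictionary layout and key names are invented for this example. It assumes "the second grouping" means zero-based column 1, and that "starting 31 characters in" means the field begins at the 31st character (zero-based offset 30).

```python
# Hypothetical field descriptions held outside the data file. Neither the
# dictionary layout nor the key names come from the draft DDF schema.
DELIMITED_LAYOUT = {
    "delimiter": "|",
    # concept name -> zero-based column index (1 = second grouping)
    "fields": {"gl-cor:accountMainID": 1},
}
FIXED_LAYOUT = {
    # concept name -> (zero-based start offset, length in characters)
    "fields": {"gl-bus:measurableDescription": (30, 40)},
}

def extract_delimited(line, layout):
    """Pick fields out of one delimited record by column position."""
    parts = line.rstrip("\n").split(layout["delimiter"])
    return {name: parts[index] for name, index in layout["fields"].items()}

def extract_fixed(line, layout):
    """Pick fields out of one fixed-length record by offset and length."""
    return {
        name: line[start:start + length].strip()
        for name, (start, length) in layout["fields"].items()
    }
```

The source records are read as they are; only the layout description changes from one export format to the next. The naive split shown here does not handle quoted delimiters, which a real reader would need to.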

XBRL GL Data Definition File: Mapping from CSV to XBRL GL

With a standardized configuration file (we used XML Schema to define and 
XML to instantiate) that lets you identify each field in the text by a 
(standardized) XBRL GL description, you could keep your text file for 
transport and handling while beginning to gain the benefits of the XBRL 
GL, and transform the content into at least minimal valid XBRL GL (if 
desired) for validation and consumption. Even without transforming the 
source data into XBRL GL, a savvy application could look at XBRL GL's 
definitions, find that a certain field should represent a date, or amount, 
or a valid ISO 4217 code and check the text content directly. We 
unimaginatively call that standardized XML file an XBRL GL Data Definition 
File, or XBRL GL DDF, and created an XML Schema file to define it 
(provided below).
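
A sketch of that kind of direct check, under stated assumptions: the column-to-concept mapping is hard-coded here rather than read from a DDF, the concept names are illustrative, dates are assumed to be in YYYY-MM-DD form, and the currency set is a small sample rather than the full ISO 4217 list.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Small illustrative subset; a real check would use the full ISO 4217 list.
KNOWN_CURRENCIES = {"USD", "EUR", "JPY", "BRL", "TRY"}

def is_date(value):
    """True if the value parses as a YYYY-MM-DD date (assumed format)."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

def is_amount(value):
    """True if the value parses as a finite decimal number."""
    try:
        return Decimal(value).is_finite()
    except InvalidOperation:
        return False

def is_currency(value):
    """True if the value is one of the known three-letter currency codes."""
    return value in KNOWN_CURRENCIES

# Hypothetical mapping, in column order, from concept name to its check.
COLUMN_CHECKS = [
    ("gl-cor:postingDate", is_date),
    ("gl-cor:amount", is_amount),
    ("gl-muc:amountCurrency", is_currency),
]

def check_line(line, delimiter="|"):
    """Return the (concept, value) pairs in one record that fail their check."""
    values = line.rstrip("\n").split(delimiter)
    return [
        (concept, value)
        for (concept, check), value in zip(COLUMN_CHECKS, values)
        if not check(value)
    ]
```

For instance, `check_line("2014-01-29|1200.50|USD")` reports no failures, while a record with a regionally formatted date or amount is flagged without ever being converted to XML.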

As with the goal of this group, we had to consider not just a single 
table/CSV but multiple tables, representing, as an example, headers and line 
items for invoices or orders.
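
As a small sketch of the multi-table case: the two pipe-delimited samples, their column names and the invoice_id key below are all hypothetical. Line items from one file are nested under their headers from another, which is the hierarchical shape an XML representation would want.

```python
import csv
import io
from collections import defaultdict

# Hypothetical exports: one file of invoice headers, one of line items.
HEADERS = "invoice_id|customer\nINV-1|Acme\nINV-2|Globex\n"
LINES = (
    "invoice_id|line_no|amount\n"
    "INV-1|1|100.00\n"
    "INV-1|2|50.00\n"
    "INV-2|1|75.25\n"
)

def load(text):
    """Parse pipe-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))

def nest(headers, lines, key="invoice_id"):
    """Attach each header's line items under a 'lines' entry, matched on key."""
    by_key = defaultdict(list)
    for line in lines:
        by_key[line[key]].append(line)
    return [dict(header, lines=by_key[header[key]]) for header in headers]

invoices = nest(load(HEADERS), load(LINES))
```

The relationship between the two files (which column links them) is exactly the kind of information that has to live in a description outside the CSV files themselves.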

The need for a standardized configuration file for this purpose was 
renewed urgently a few months back; the American Institute of Certified 
Public Accountants (AICPA) published a new specification for describing 
accounting information to be shared between an audit client and their 
auditors. The series of specifications, called the Audit Data Standards 
(5), defines specific groupings of data important for doing a traditional 
financial audit, and will grow to meet the needs of a broader audit 
community (internal audit, tax audit, etc.). The ADS specifies the desired 
content and rules for its formatting, and defines two syntaxes for its 
representation: pipe-delimited text and XBRL GL.

Developing a standards-based approach to be able to losslessly transform 
files between the two formats seemed important. For those environments 
where the PipeDF was the primary option, such as older report writer 
systems that can produce text but not XML, opening up the world of XBRL GL 
(for standards-based validation of content, applicability of standard 
business rules, greater reusability and scalability, etc.) seemed 
important. To that end, I redoubled my effort to update the earliest 
purely conceptual DDF design to something we could begin to test in a 
working environment. 

Similarly and more historically, the OECD published syntax-independent 
guidance documents for a series of Standard Audit Files (Taxation, 
Payroll). While recommending XML and especially XBRL GL, it recognized 
that tax administrations may wish to use any format; a standardized 
definition of the relationship between a text version in one country and 
an XML version in another could break down some barriers.

What we have works on simple test cases. We know more is necessary. But as 
you are beginning to explore the same area, I thought it was important to 
share our thoughts.

Broader applicability?

I have attached some files:

i. A Word document (woefully in need of update) giving background on 
the attached XSD. If this topic is of interest to the group, I will 
accelerate its update. For example, the Schema was updated as we 
recognized the need for additional constraints and calculated fields, and 
so the Schema shows an additional structure not documented in the Word 
file that began to lay this area out.

ii. The aforementioned XSD, which provides the structure of the XBRL GL 
Data Definition file. The file is not an official public working draft, 
and has no official status other than internal draft. However, I am 
sharing it here for educational purposes. (There is no reason the XBRL GL 
DDF has to be limited to transforming from text to XBRL GL; it has 
applicability to transforming from text to other XML. I just can't promise 
it will work with other schema-based designs, and we have barely 
stress-tested it with XBRL GL itself.)

iii. A PowerPoint presentation describing the effort.

iv. An example, not perfected, for transforming between the Pipe-delimited 
and XBRL GL formats of one of the tables in the AR Audit Data Standard 
from (5) below.

So the primary goal is to be able to use the semantics of XBRL GL without 
being bound to its syntax in transport and handling. (6) The thinking 
behind another XBRL publication, something called Inline XBRL, also brings 
additional tools that can be leveraged in these transformations: tools for 
the regionalization and the varieties of date and numeric formatting that 
will be found in the CSV files and need to be transformed into what XML 
requires. (7)
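
To make the formatting problem concrete, here is a minimal sketch. It does not implement Inline XBRL's own transformation rules; the function names, the default separators and the default date format are assumptions chosen for illustration. It converts a regionally formatted number and date into the forms XML Schema expects.

```python
from datetime import datetime
from decimal import Decimal

def normalize_number(text, decimal_sep=",", group_sep="."):
    """Convert a regionally formatted number (e.g. '1.234,56') to a Decimal."""
    # Drop grouping separators first, then switch the decimal mark to '.'.
    cleaned = text.strip().replace(group_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)

def normalize_date(text, source_format="%d/%m/%Y"):
    """Convert a date in the given source format to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text.strip(), source_format).date().isoformat()
```

The separators and the source date format are parameters because they vary by file and region; in a configuration-driven approach they would be declared per field rather than guessed from the data.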

We can then leverage the fact that virtually every accounting software 
product can create a standard text-export; with a series of DDF files, 
those exports can automatically be transformed into XBRL GL, and 
automatically be turned into XBRL GL profiled to represent the AICPA Audit 
Data Standard (or any other profile of data, such as a tax audit data 
standard file). 

Our work is purely prototype and our cases (from single table Audit Data 
Standard in pipe-delimited format to XBRL GL) simple. We look forward to 
learning from your group's effort, contributing as appropriate, and being 
able to incorporate it into the XBRL GL environment as possible.


(-1) http://www.w3.org/2013/csvw/wiki/Main_Page
(0) http://www.xbrl.org/GLTaxonomy

(2) http://raw.rutgers.edu/28wcars and especially 
(3) http://www.edefter.gov.tr/web/guest/2
(4) http://www.w3.org/XML/1999/XML-in-10-points.html.en
(6) http://www.omg.org/news/meetings/tc/agendas/va/FDTF_pdf/Cohen_XBRL.pdf

Eric E Cohen
PwC | XBRL Global Technical Leader
Office: 1-585-271-4070 | Mobile: 1-585-317-4799
Email: eric.e.cohen@us.pwc.com
PricewaterhouseCoopers LLP
Rochester, NY USA



Received on Wednesday, 29 January 2014 16:42:49 UTC
