Re: CSVW and fixed record length, multiple record length

> On May 26, 2016, at 1:25 AM, Wackerow, Joachim <Joachim.Wackerow@gesis.org> wrote:
> 
> Hello,
>  
> I’m wondering whether, during the development of CSVW, there was any discussion of ways to describe data with fixed record length and data with multiple records per case/unit.

The use cases [1] include examples of fixed-length records, but I wasn’t personally involved in the discussions about incorporating them; others in the group likely were.

However, note that the Tabular Data Model [2] allows for other source formats, and only non-normatively describes how to parse CSV into an annotated tabular data model. For example, Embedding Tabular Metadata in HTML [3] describes extracting tabular data from HTML tables. Ultimately, it’s up to other standards to describe specific media types, which can then be mapped to the tabular data model in a separate document, as [3] does for HTML.

> The DDI Alliance developed a draft vocabulary (PHDD) for the physical data description of tabular data. We have now compared CSVW and PHDD. Our understanding is that CSVW is very powerful for everything within the original scope of CSVW, and it looks like CSVW could be interesting for users of the main DDI specifications. We are therefore hesitant to work further on PHDD and to publish a final version.
>  
> The only area where PHDD has additional features is the description of data with fixed record length and data with multiple records per case/unit. I understand that this is beyond the original scope of CSVW. Nevertheless I’m wondering if it would make sense to add these features to CSVW.

A document describing how to convert fixed-length record files into tabular data would be the minimum needed to describe how such files relate to the Tabular Data Model.
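As a rough illustration (a sketch only, not anything CSVW defines), such a conversion amounts to slicing each record at known offsets and emitting rows that ordinary CSVW metadata could then describe. The `LAYOUT` structure, the column names and the sample records below are all hypothetical:

```python
import csv
import io

# Hypothetical column layout: (name, 1-based start position, width).
# CSVW has no fixed-width dialect; this structure is an assumption for illustration.
LAYOUT = [
    ("case_id", 1, 4),
    ("age", 5, 3),
    ("sex", 8, 1),
]

def parse_fixed_width(lines, layout):
    """Split each fixed-length record into named, whitespace-stripped cells."""
    rows = []
    for line in lines:
        row = {}
        for name, start, width in layout:
            row[name] = line[start - 1:start - 1 + width].strip()
        rows.append(row)
    return rows

def to_csv(rows, layout):
    """Serialize parsed rows as CSV so they can be described with CSVW metadata."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=[name for name, _, _ in layout], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical sample records: 4-char case ID, 3-char age, 1-char sex code.
records = ["0001 34F", "0002 51M"]
rows = parse_fixed_width(records, LAYOUT)
print(to_csv(rows, LAYOUT))
```

The layout table plays the role that column offsets would play in a fixed-width description; once the rows are in CSV form, the existing CSVW vocabulary applies unchanged.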

I’m unclear on how multiple records per case/unit are used, and what the implications of mapping them onto the tabular data model might be. Some examples would be useful for discussion.
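If the usual pattern is that each record carries a case ID plus a record-type code, and each record type has its own layout, one possible mapping would be to split the file into one table per record type. The sketch below assumes exactly that; the household/person layouts and sample lines are hypothetical, not taken from PHDD:

```python
from collections import defaultdict

# Hypothetical file: every record starts with a 4-character case ID and a
# 1-character record-type code; the rest of the layout depends on the type.
# Layouts are (name, 1-based start position, width) and are assumptions.
LAYOUTS = {
    "H": [("case_id", 1, 4), ("region", 6, 2), ("size", 8, 2)],   # household record
    "P": [("case_id", 1, 4), ("person_no", 6, 2), ("age", 8, 3)],  # person record
}

def split_by_record_type(lines, layouts):
    """Group records into one table per record type, each keeping the case ID."""
    tables = defaultdict(list)
    for line in lines:
        rec_type = line[4]  # 1-based position 5 holds the record-type code
        layout = layouts[rec_type]
        row = {
            name: line[start - 1:start - 1 + width].strip()
            for name, start, width in layout
        }
        tables[rec_type].append(row)
    return dict(tables)

# Hypothetical sample: two cases, each a household record followed by person records.
lines = ["0001H0103", "0001P01 34", "0001P02 31", "0002H0201", "0002P01 58"]
tables = split_by_record_type(lines, LAYOUTS)
for rec_type, rows in tables.items():
    print(rec_type, rows)
```

Each resulting table could then be described as a separate table in CSVW metadata, with `case_id` linking person rows to their household through a foreign key; whether that captures everything PHDD expresses is an open question.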

> Both features, data with fixed record length and data with multiple records per case/unit, are used heavily in legacy data from earlier days, when limited storage space played a major role. The DDI Alliance has published several specifications for data resulting from observational methods in the social, behavioral, economic, and health sciences. DDI is used by social science data archives, research data producers in the social sciences, and national statistical institutes (NSIs).
> Archives and NSIs still have a large amount of data with fixed record length and data with multiple records per case/unit.
>  
> I’m hoping this is the right forum to raise these questions. I copied the message to the discussion forum on DDI RDF vocabularies.

Certainly, that’s the purpose of this forum and the Community Group.

Gregg Kellogg

[1] http://www.w3.org/TR/csvw-ucr/
[2] http://www.w3.org/TR/tabular-data-model/

> Cheers,
> Achim
>  
>  
> References
>  
> PHDD
> http://rdf-vocabulary.ddialliance.org/phdd.html
> http://ddi-alliance.org/Specification/RDF/PHDD
>  
> DDI main specifications
> http://ddi-alliance.org/Specification/
>  
> DDI Alliance
> http://ddi-alliance.org/
>  
> List of main DDI Adopters
> http://ddi-alliance.org/ddi-adopters
>  
>  
> --
> GESIS - Leibniz Institute for the Social Sciences
> Department: Monitoring Society and Social Change
> Team: Social Science Metadata Standards
> Visiting address: B2 1, 68159 Mannheim, Germany
> Postal address: P.O. Box 122155, 68072 Mannheim, Germany
> Phone: +49 (0)621 1246 262
> Fax: +49 (0)621 1246 100
> E-mail: joachim.wackerow@gesis.org
> www.gesis.org

Received on Thursday, 26 May 2016 17:11:52 UTC