Re: [DDI RDF Vocabulary] RE: CSVW and fixed record length, multiple record length from Wendy Thomas on 2016-06-02 (public-csvw@w3.org from June 2016)

From: Wendy Thomas <wlt@umn.edu>
Date: Thu, 2 Jun 2016 03:39:17 -0500
To: "Wackerow, Joachim" <Joachim.Wackerow@gesis.org>
Cc: "public-csvw@w3.org" <public-csvw@w3.org>, "ddi-rdf-vocabulary@googlegroups.com" <ddi-rdf-vocabulary@googlegroups.com>
Message-ID: <CAOrpSNq9mwtd5NUsrR9FL2K9D-ym6ttuQW262r=meoqZuEQ4OQ@mail.gmail.com>
Both files are of data aggregated to geographic units. The units, and thus
the identifiers, which are combinations of fields from the geographic
identification string, vary by the type of geographic unit. Each case has a
Summary level which defines the geographic unit type (State, County, Place,
etc.). The documentation provides information on the geographic hierarchies
which dictate which geographic fields are needed to uniquely identify a
case. For example, Counties are unique within States, so it is the State
Code plus the County Code that is used to locate a record in that summary
level. The 1990 data have record identifiers and part numbers which can be
used to identify unique records; you can then use the SUMLEV and the
appropriate geographic codes to determine the area.
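
To illustrate how the identifier depends on the summary level, here is a
minimal Python sketch. The summary level codes, the field names (STATE,
COUNTY, PLACE) and the sample values are assumptions made for illustration;
the actual hierarchies are the ones given in the documentation for each file.

```python
# Hypothetical mapping: summary level code -> geographic fields that together
# identify a case. Codes and field names are illustrative assumptions, not
# taken from the example files.
KEY_FIELDS = {
    "040": ("STATE",),            # State
    "050": ("STATE", "COUNTY"),   # County: unique only within a State
    "160": ("STATE", "PLACE"),    # Place
}


def unit_key(record):
    """Return the compound identifier for one record (a dict of strings)."""
    sumlev = record["SUMLEV"]
    fields = KEY_FIELDS[sumlev]  # raises KeyError for an unmapped level
    return (sumlev,) + tuple(record[name] for name in fields)


# Made-up sample records
county = {"SUMLEV": "050", "STATE": "27", "COUNTY": "053", "PLACE": ""}
state = {"SUMLEV": "040", "STATE": "27", "COUNTY": "", "PLACE": ""}
```

The summary level is kept as the first part of the key so that a State record
and a County record carrying the same State Code do not collide.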

Wendy

On Thu, Jun 2, 2016 at 1:27 AM, Wackerow, Joachim <
Joachim.Wackerow@gesis.org> wrote:

> Hi Gregg,
>
>
>
> Sorry for the late response. I’m currently travelling.
>
>
>
> As I now understand it, one approach could be a separate specification
> that describes data with fixed record length and data with multiple
> records per unit, and how such data can be mapped to the tabular data model.
> This would also support a related data transformation if desired.
>
> Is this correct?
>
>
>
> Regarding data with multiple records per unit:
>
> The data usually has a fixed record length. Two variants seem to be common.
>
> 1.      Fixed number of records per unit and fixed logical record length.
> An identifier per unit in each record is not required.
>
> 2.      An identifier per unit in each record and possibly a record
> sequence number per unit. The number of records per unit may vary.
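
A rough Python sketch of how the two variants might be grouped into units
before mapping them to the tabular data model. The function names, the
column positions and the toy lines are all hypothetical; this is not taken
from PHDD, CSVW, or the example files.

```python
from itertools import groupby


def slice_fields(line, layout):
    """Cut a fixed-width line into named fields.

    layout maps a field name to its (start, end) character positions.
    """
    return {name: line[start:end].strip() for name, (start, end) in layout.items()}


def units_by_count(lines, records_per_unit):
    """Variant 1: every unit occupies the same number of consecutive records."""
    if len(lines) % records_per_unit:
        raise ValueError("incomplete unit at end of file")
    return [lines[i:i + records_per_unit]
            for i in range(0, len(lines), records_per_unit)]


def units_by_id(lines, id_span):
    """Variant 2: consecutive records sharing an identifier form one unit."""
    start, end = id_span
    return [(key, list(group))
            for key, group in groupby(lines, key=lambda line: line[start:end])]


# Toy data (made up): v1 has two records per unit and no identifier,
# v2 has a 3-character identifier followed by a sequence number.
v1 = ["A1  10", "A2  20", "B1  30", "B2  40"]
v2 = ["0011aa", "0012bb", "0021cc"]
```

Note that units_by_id only groups adjacent records, so it assumes the file
is already ordered by identifier.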
>
>
>
> Wendy Thomas from the Minnesota Population Center provided some examples
> which are publicly available. Wendy is happy to answer any questions
> regarding these examples. She is subscribed to the list
> ddi-rdf-vocabulary@googlegroups.com (in CC).
>
> http://users.pop.umn.edu/~wlt/MultiRecordCases/
>
>
>
> The first example has two physical records per unit. The physical record
> lengths are 1800 and 1696 characters.
>
>
>
> The second example has a compound identifier per record.
>
> Wendy: What is the unit identifier here?
>
>
>
> Achim
>
>
>
>
>
> *From:* Gregg Kellogg [mailto:gregg@greggkellogg.net]
> *Sent:* Donnerstag, 26. Mai 2016 19:11
> *To:* Wackerow, Joachim
> *Cc:* public-csvw@w3.org; ddi-rdf-vocabulary@googlegroups.com
> *Subject:* Re: CSVW and fixed record length, multiple record length
>
>
>
> On May 26, 2016, at 1:25 AM, Wackerow, Joachim <Joachim.Wackerow@gesis.org>
> wrote:
>
>
>
> Hello,
>
>
>
> I’m wondering whether possibilities for describing data with fixed record
> length and data with multiple records per case/unit were discussed during
> the development of CSVW.
>
>
>
> The use cases [1] have examples of fixed-length records, but I wasn’t
> personally involved in discussions about incorporating this; others in the
> group were likely involved in these discussions.
>
>
>
> However, note that the Tabular Data Model [2] allows for other formats,
> and only non-normatively describes parsing CSV to create an Annotated Data
> Model. See Embedding Tabular Metadata in HTML [3] which describes
> extracting tabular data from HTML tables, for example. Ultimately, it’s up
> to other standards to describe specific media types, which can be mapped to
> the tabular data model using a separate document, such as [3].
>
>
>
> The DDI Alliance developed a draft vocabulary (PHDD) on physical data
> description of tabular data. We have now compared CSVW and PHDD. Our
> understanding is that CSVW is very powerful for all things described in the
> original scope of CSVW. It looks like CSVW could be interesting for users
> of the main DDI specifications. We are now hesitant to work further on the
> development of PHDD and to publish a final version.
>
>
>
> The only area where PHDD has additional features is the description of
> data with fixed record length and data with multiple records per case/unit.
> I understand that this is beyond the original scope of CSVW. Nevertheless
> I’m wondering if it would make sense to add these features to CSVW.
>
>
>
> Describing a process for converting fixed-length record files into tabular
> data would allow you to minimally describe how to work with the Tabular
> Data Model.
>
>
>
> I’m unclear on the use of multiple records per case/unit, and what the
> implications for mapping that over might be. Some examples for discussion
> might be useful.
>
>
>
> Both features, data with fixed record length and data with multiple
> records per case/unit, are used heavily in legacy data from earlier days,
> when storage space limitations played a major role. The DDI Alliance
> published a couple of specifications for data that result from
> observational methods in the social, behavioral, economic, and health
> sciences. DDI is used by social science data archives, research data
> producers in the social sciences, and national statistical institutes
> (NSIs).
>
> Archives and NSIs still have a large amount of data with fixed record
> length and data with multiple records per case/unit.
>
>
>
> I’m hoping this is the right forum to raise these questions. I copied the
> message to the discussion forum on DDI RDF vocabularies.
>
>
>
> Certainly, that’s the purpose of this forum and the Community Group.
>
>
>
> Gregg Kellogg
>
>
>
> [1] http://www.w3.org/TR/csvw-ucr/
>
> [2] http://www.w3.org/TR/tabular-data-model/
>
>
>
> Cheers,
>
> Achim
>
>
>
>
>
> References
>
>
>
> PHDD
>
> http://rdf-vocabulary.ddialliance.org/phdd.html
>
> http://ddi-alliance.org/Specification/RDF/PHDD
>
>
>
> DDI main specifications
>
> http://ddi-alliance.org/Specification/
>
>
>
> DDI Alliance
>
> http://ddi-alliance.org/
>
>
>
> List of main DDI Adopters
>
> http://ddi-alliance.org/ddi-adopters
>
>
>
>
>
> --
>
> GESIS - Leibniz Institute for the Social Sciences
>
> Department: Monitoring Society and Social Change
>
> Team: Social Science Metadata Standards
>
> Visiting address: B2 1, 68159 Mannheim, Germany
>
> Postal address: P.O. Box 122155, 68072 Mannheim, Germany
>
> Phone: +49 (0)621 1246 262
>
> Fax: +49 (0)621 1246 100
>
> E-mail: joachim.wackerow@gesis.org
>
> www.gesis.org
>
>
>
>



-- 
Wendy L. Thomas                              Phone: +1 612.624.4389
Data Access Core Director                 Fax:   +1 612.626.8375
Minnesota Population Center             Email: wlt@umn.edu
University of Minnesota
50 Willey Hall
225 19th Avenue South
Minneapolis, MN 55455
Received on Thursday, 2 June 2016 12:27:23 UTC