CSV-LD Use Cases/Requirements analysis (ACTION-4)

I took on an action to consider how well the CSV-LD proposal [2] covers the use cases and requirements [1]. This resulted in some updates to the proposal, based on an examination of the requirements and previous discussions.

R-WellFormedCsvCheck: N/A

R-MultipleHeadingRows: CSV-LD expects the first row after one or more rows containing only empty fields to be the row of field headings, which are mapped directly to terms within the CSV-LD mapping frame. File headings can be handled if they are followed by one or more empty rows. There is no provision for subheadings at present. Some provision could potentially be made in the CSV-LD mapping frame to describe multiple heading rows, in which case the term could be the newline-separated values of consecutive column headings.

R-HeadingColumns: CSV-LD expects the first row after one or more rows containing only empty fields to be the heading row, with the individual fields of that heading row used as terms within the CSV-LD mapping frame.
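
The heading-row rule described for R-MultipleHeadingRows and R-HeadingColumns could be sketched in Python roughly as follows. This is only an illustration, not part of the CSV-LD proposal: the function names and the sample data are invented, and treating the first row as the heading when the file contains no empty rows is an assumption.

```python
import csv
import io


def is_empty(row):
    """True when a row has no fields or only blank fields."""
    return all(field.strip() == "" for field in row)


def split_heading_and_records(text, delimiter=","):
    """Return (file_headings, heading_row, records) for one CSV text.

    The heading row is the first non-empty row that follows one or more
    empty rows. Non-empty rows before it are treated as file headings.
    Assumption: with no empty rows at all, the first row is the heading.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    for i in range(1, len(rows)):
        if is_empty(rows[i - 1]) and not is_empty(rows[i]):
            file_headings = [r for r in rows[:i] if not is_empty(r)]
            records = [r for r in rows[i + 1:] if not is_empty(r)]
            return file_headings, rows[i], records
    non_empty = [r for r in rows if not is_empty(r)]
    if not non_empty:
        return [], [], []
    return [], non_empty[0], non_empty[1:]


# Invented sample: one file heading, an empty row, then the heading row.
sample = (
    "Staff list (invented sample)\n"
    ",,\n"
    "Post Unique Reference,Name,Grade\n"
    "90115,Jane Doe,SCS1\n"
)
file_headings, heading, records = split_heading_and_records(sample)
```

Here heading holds the three column names, which would then be looked up as terms in the mapping frame.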

R-TableNormalization: the @type of a term could be used to perform basic normalization of field values in the associated column. If the term type is @id or @vocab, field values are either matched to other terms defined within the CSV-LD @context (when they are appropriate for use with the header term) or coerced to be compatible with an IRI. If the term type maps to a set of XSD datatypes, this could be used to parse the value to match the specified datatype, perhaps by using a relevant XSLT parsing mechanism.
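
As a rough illustration of type-driven normalization, the sketch below parses field values according to the term type. It uses Python's own parsers rather than an XSLT mechanism, covers only a handful of XSD datatypes, and approximates "coerced to be compatible with an IRI" by percent-encoding; the function name, the parser table and the sample values are all assumptions made for the example.

```python
from datetime import date
from decimal import Decimal
from urllib.parse import quote

# Parsers for a few XSD datatypes; the selection is illustrative only.
XSD_PARSERS = {
    "xsd:integer": int,
    "xsd:decimal": Decimal,
    "xsd:date": date.fromisoformat,
    "xsd:boolean": lambda s: {"true": True, "1": True,
                              "false": False, "0": False}[s],
}


def normalize(value, term_type, context=None):
    """Normalize one field value according to the @type of its term."""
    value = value.strip()
    # @vocab: prefer a term defined in the @context, if one matches.
    if term_type == "@vocab" and context and value in context:
        return context[value]
    # @id, or @vocab without a match: make the value IRI-compatible
    # (assumption: percent-encode characters not allowed in an IRI).
    if term_type in ("@id", "@vocab"):
        return quote(value, safe=":/?#[]@!$&'()*+,;=%")
    # XSD datatypes: parse when a parser is known, else keep the string.
    parser = XSD_PARSERS.get(term_type)
    return parser(value) if parser else value
```

For example, normalize(" 42 ", "xsd:integer") yields the integer 42, and normalize("Senior Grade", "@id") yields "Senior%20Grade".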

R-CellValueMicroSyntax: Other than cell normalization as discussed above, a term having a @container of @set or @list can be used to infer that a matching field value is intended to encode multiple values, presumed to be comma-separated. An additional extension might be used to define a separator specification.
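
A minimal sketch of the multi-value case, assuming the separator defaults to a comma and that surrounding whitespace and empty items are dropped (neither detail is specified above). The function name and the separator parameter are hypothetical; the parameter stands in for the possible separator extension.

```python
def split_field(value, container=None, separator=","):
    """Split a field into several values when its term is a set or list.

    Returns a list for @set and @list containers and the unchanged string
    otherwise. In JSON-LD, the term's @container decides whether the
    resulting array is read as an unordered set or an ordered list.
    """
    if container not in ("@set", "@list"):
        return value
    return [part.strip() for part in value.split(separator) if part.strip()]


tags = split_field("red, green,blue", container="@set")
untouched = split_field("red, green,blue")
```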

R-NonStandardFieldDelimiter: CSV-LD is currently assumed to work with CSV or TSV. An extension declaring alternative parsing, such as a different delimiter or fixed-width fields, could be added to the CSV-LD @context.

R-PrimaryKey: this is the requirement to be able to determine the primary key for an entity described within a CSV file. In CSV-LD parlance, a primary key equates to the @id member of a node. In Use Case #4 example 7, this would be the "Post Unique Reference" column values. The partial mapping for this example might be the following:

{
  "@context": {
    "@extension": "http://www.w3.org/ns/csv-ld",
    "Post Unique Reference": "@id",
    "Name": "schema:name",
    "Grade": "_:grade",
    ...
  },
  "Post Unique Reference": null,
  "Name": null,
  "Grade": null,
  ...
}
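
To make the role of the null placeholders concrete, here is a rough Python sketch of filling the frame from one record. It is an assumption about how a processor might behave, not the proposal's algorithm: apply_frame is a made-up name, the record values are invented, and the frame is limited to the three terms shown above (the entries elided with "..." are not reproduced).

```python
import json

# Mapping frame from the example above, limited to the three terms shown.
FRAME = {
    "@context": {
        "@extension": "http://www.w3.org/ns/csv-ld",
        "Post Unique Reference": "@id",
        "Name": "schema:name",
        "Grade": "_:grade",
    },
    "Post Unique Reference": None,
    "Name": None,
    "Grade": None,
}


def apply_frame(frame, heading, row):
    """Fill each null-valued term of the frame from the matching column."""
    record = dict(zip(heading, row))
    node = {}
    for term, default in frame.items():
        if term != "@context" and default is None:
            # Terms without a matching column stay null.
            node[term] = record.get(term)
        else:
            # @context and any non-null default data are copied as-is.
            node[term] = default
    return node


heading = ["Post Unique Reference", "Name", "Grade"]
row = ["90115", "Jane Doe", "SCS1"]  # invented record
node = apply_frame(FRAME, heading, row)
print(json.dumps(node, indent=2))
```

Because "Post Unique Reference" is aliased to @id in the @context, the value filled in for that term becomes the identifier of the resulting node.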

R-ForeignKeyReferences: Foreign keys are handled natively in JSON-LD; a field representing a foreign key reference is used as the value of a member with a term having a type of @id.
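
The effect of giving a term a type of @id can be illustrated with a much simplified version of JSON-LD value expansion. This is a sketch only: it ignores @base resolution and every other part of the real expansion algorithm, and the column name "Reports to Senior Post", the example.org IRI and the sample values are hypothetical.

```python
# Hypothetical context: one foreign-key column typed as @id, one plain column.
CONTEXT = {
    "Reports to Senior Post": {
        "@id": "http://example.org/vocab#reportsTo",
        "@type": "@id",
    },
    "Name": "http://schema.org/name",
}


def expand_value(context, term, value):
    """Expand a string value the way type coercion would, in simplified form.

    A term whose definition has "@type": "@id" yields a node reference;
    any other term yields a plain value object.
    """
    definition = context.get(term)
    if isinstance(definition, dict) and definition.get("@type") == "@id":
        return {"@id": value}
    return {"@value": value}


reference = expand_value(CONTEXT, "Reports to Senior Post", "90115")
literal = expand_value(CONTEXT, "Name", "Jane Doe")
```

The reference form is what lets the foreign key point at the node whose @id was taken from the primary key column.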

R-ExternalDataDefinitionResource: The CSV-LD mapping frame allows additional metadata to be added to each record, such as a predicate relating one node to another within the same record, or between records. Example 20 indicates that different fields would be treated differently if they were in string or symbol representation; this is not currently anticipated by CSV-LD.

R-AnnotationAndSupplementaryInfo: No particular provision for this within CSV-LD. Certain cases may fall out of general capabilities.

R-AssociationOfCodeValuesWithExternalDefinitions: CSV-LD allows values of type @vocab to be matched with terms defined within the CSV-LD @context. Values of type @vocab that match no term, and values of type @id, are coerced to be compatible with an IRI.

R-CsvAsSubsetOfLargerDataset: A given mapping may create a dataset (JSON-LD or RDF) which, due to the principles of Linked Data, can be used in concert with other datasets.

R-LinksToExternallyManagedDefinitions: CSV-LD allows values of type @vocab to be matched with terms defined within the CSV-LD @context. Values of type @vocab that match no term, and values of type @id, are coerced to be compatible with an IRI.

R-SyntacticTypeDefinition: See R-TableNormalization.

R-SemanticTypeDefinition: Unsure.

R-MissingValueDefinition: No specific support for non-empty values that denote a missing value (e.g. -999). Provisions for micro-syntaxes or normalization may allow mapping based on the @type definition. Values of type @vocab may match term definitions, which can match any string value. Values of type @id are coerced to an IRI.

R-URIMapping: Supported by specifying @base or @vocab within CSV-LD mapping frame.
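
A simplified sketch of how @base and @vocab turn field values into IRIs, following the general JSON-LD approach: vocabulary-relative values are concatenated onto @vocab, and document-relative values are resolved against @base. The function name and the example.org IRIs are invented, and the check for an already-absolute value is deliberately crude.

```python
from urllib.parse import urljoin


def expand_iri(value, context, vocab=False):
    """Map a relative value to an absolute IRI using @vocab or @base."""
    if ":" in value:
        # Crude assumption: anything containing a colon is already absolute.
        return value
    if vocab and "@vocab" in context:
        # Vocabulary-relative: simple string concatenation.
        return context["@vocab"] + value
    if "@base" in context:
        # Document-relative: standard relative reference resolution.
        return urljoin(context["@base"], value)
    return value


context = {
    "@base": "http://example.org/post/",
    "@vocab": "http://example.org/vocab#",
}
subject = expand_iri("90115", context)
predicate = expand_iri("grade", context, vocab=True)
```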

R-UnitMeasureDefinition: Units can be inferred by enclosing the value in an appropriate JSON-LD syntactic structure to represent the desired RDF.

R-GroupingOfMultipleTables: A single CSV-LD mapping frame may represent information from multiple datasets, separated in a file by a row with empty values, or potentially as part of a multipart MIME message, separate worksheets or multiple files. Terms used within a CSV-LD mapping frame that do not map to a column name in the current dataset are left as null, which causes them to be removed on expansion.
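
The single-file case might look roughly like the following. The sketch assumes each dataset begins with its own heading row and is separated from the next by a row of empty fields; the frame is reduced to a flat set of null-valued terms, and dropping the nulls stands in for their removal on JSON-LD expansion. All names and sample data are invented.

```python
import csv
import io


def split_tables(text):
    """Split CSV text into datasets separated by rows of empty fields."""
    tables, current = [], []
    for row in csv.reader(io.StringIO(text)):
        if all(field.strip() == "" for field in row):
            if current:
                tables.append(current)
                current = []
        else:
            current.append(row)
    if current:
        tables.append(current)
    return tables


def apply_frame_to_tables(frame, text):
    """Apply one frame of null-valued terms to every dataset in the text."""
    nodes = []
    for table in split_tables(text):
        heading, records = table[0], table[1:]
        for row in records:
            record = dict(zip(heading, row))
            # Terms with no matching column in this dataset stay null ...
            node = {term: record.get(term) for term in frame}
            # ... and are dropped, mimicking removal on expansion.
            nodes.append({k: v for k, v in node.items() if v is not None})
    return nodes


frame = {"Name": None, "Grade": None, "Office": None}
text = "Name,Grade\nJane Doe,SCS1\n,\nName,Office\nJohn Roe,Leeds\n"
nodes = apply_frame_to_tables(frame, text)
```

The first node carries only Name and Grade and the second only Name and Office, even though both were produced from the same frame.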

R-CsvValidation: Out of scope.

R-CsvToRdfTransformation: Native capability of the JSON-LD generated by applying the CSV-LD mapping frames to the datasets and using the JSON-LD ToRdf algorithm.

R-CsvToJsonTransformation: Native capability of the JSON-LD generated by applying the CSV-LD mapping frames to the datasets.

R-CanonicalMappingInLieuOfAnnotation: Can be done using IRI templates within the CSV-LD @context.

R-RandomAccess: No provision in CSV-LD.

R-ZeroEditCompatibility: N/A

R-ZeroEditAdditionOfSupplementaryMetadata: Can be specified as default data in CSV-LD mapping frame.

Gregg Kellogg
gregg@greggkellogg.net

[1] http://w3c.github.io/csvw/use-cases-and-requirements/
[2] https://www.w3.org/2013/csvw/wiki/CSV-LD

Received on Monday, 17 March 2014 00:13:44 UTC