
Re: dwbp-ISSUE-134 (BernadetteLoscio): About Formats, schemas, vocabularies and data models [Best practices document(s)]

From: Bernadette Farias Lóscio <bfl@cin.ufpe.br>
Date: Tue, 3 Feb 2015 13:02:58 -0300
Message-ID: <CANx1PzyBsa+6GXPM=24yeCqEhk3a7yPcgr8kaLXjeR6e_BP7_Q@mail.gmail.com>
To: Data on the Web Best Practices Working Group <public-dwbp-wg@w3.org>
Hi all,

I'd like to discuss with you the difference between vocabulary, data
schema, data model and data format. João Paulo started this discussion
earlier in this message:

It is worth reading the whole message to better understand the
definitions. In the following, I show just parts of the message with
some definitions:
- About data representation and data format

"By "data representation" we mean any convention for the arrangement of
symbols in such a way as to enable information to be encoded by a data
producer and later decoded by data consumers.

A particular convention for data representation is often referred to as a
"data format"."


- About schemas

For example, an XML-based format can be
specified with a "schema document" in the XML Schema Definition language,
enabling XML documents to be checked for conformance to the format defined
in the schema document [XML-SCHEMA].

"schemas" are often used as a means to anchor natural language
descriptions to guide humans in the interpretation of data produced using
the format. Often, labels are used in these schemas to convey intuitive
meaning and guide interpretation, in which case these labels serve the role
of "terms" in communication. The collection of terms as used in the schema
is then referred to as a "vocabulary".


The notion of schema presented above is similar to the one of
relational schema in the database world. A relational database schema
describes the set of relation schemas of a given database. A relation
schema is composed of the name of the relation together with its
attributes. This specifies how to interpret instances of a given
relation (or table). In the database world, a data model consists of a
set of constructs to build databases. For example, in the relational
model, databases are represented as a collection of relations (or
tables).

IMO vocabularies may be used to describe data schemas even when the
RDF model is not being used. Vocabularies should be used to help tasks
like data integration and to improve data interoperability.

In this case, I suggest:

- the structure of the data should be referred to as the data schema
- the collection of terms used in the schema to describe how to
interpret data values should be referred to as the vocabulary
- the abstract syntax to define schemas should be referred to as the data model

Example (relational schema defined according to the relational data model):

Person(name, age, sex, id) --> this is the schema
terms name, age, sex and id --> this is the vocabulary
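
The same example can be sketched in Python to make the three notions
concrete. This is only an illustration under stated assumptions: SQLite
(via the standard sqlite3 module) stands in for the relational data
model, and the column types and the sample row are invented here, since
the example above gives only the relation and attribute names.

```python
import sqlite3

# Data model: the relational model, provided here by SQLite
# (an assumption for illustration; any relational engine would do).
conn = sqlite3.connect(":memory:")

# Schema: the relation name together with its attributes.
# The column types are assumed; the example above gives only the names.
conn.execute(
    "CREATE TABLE Person (name TEXT, age INTEGER, sex TEXT, id INTEGER)"
)

# Vocabulary: the collection of terms used in the schema.
# PRAGMA table_info returns one row per column; index 1 is the column name.
vocabulary = [col[1] for col in conn.execute("PRAGMA table_info(Person)")]
print(vocabulary)

# An instance of the relation (made-up values), interpreted through
# the terms of the vocabulary.
conn.execute("INSERT INTO Person VALUES (?, ?, ?, ?)", ("Ana", 30, "F", 1))
row = conn.execute("SELECT name, age, sex, id FROM Person").fetchone()
record = dict(zip(vocabulary, row))
print(record)
```

Here the CREATE TABLE statement plays the role of the schema, the list
of column names is the vocabulary, and the relational constructs
offered by the engine are the data model.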


2015-01-22 13:46 GMT-03:00 Data on the Web Best Practices Working
Group Issue Tracker <sysbot+tracker@w3.org>:
> dwbp-ISSUE-134 (BernadetteLoscio): About Formats, schemas, vocabularies and data models  [Best practices document(s)]
> http://www.w3.org/2013/dwbp/track/issues/134
> Raised by: Joao Paulo Almeida
> On product: Best practices document(s)
> The group needs to settle on some concepts (and ultimately terms) that should help us to structure our discussions,  give us a basis to communicate and help our audience to understand us.

Bernadette Farias Lóscio
Centro de Informática
Universidade Federal de Pernambuco - UFPE, Brazil
Received on Tuesday, 3 February 2015 16:03:46 UTC
