RE: Datatypes with no (cool) URI

I am a researcher working on demographic social simulation models. In the simple models, I distinguish people classed male at birth from people classed female at birth; gender ambiguity, reassignment (sex change) and gender reclassification are not modelled. More complicated models might model these things, and if I were doing that I would consider storing a list of changes and having more classes, or somehow quantifying maleness and femaleness. The point I am making is that the assignment of gender (or sex, depending on which word you prefer) could be time dependent.
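
A minimal sketch of the "list of changes" idea, assuming a simulation clock measured as a float (for example age in years) and classes labelled "F" and "M". `GenderHistory` and its methods are hypothetical names for illustration, not taken from any existing model.

```python
from bisect import bisect_right, insort
from dataclasses import dataclass, field


@dataclass
class GenderHistory:
    """Hypothetical record: class at birth plus dated reclassifications."""

    at_birth: str  # e.g. "F" or "M"
    # (time, new class) pairs, kept sorted by time
    changes: list[tuple[float, str]] = field(default_factory=list)

    def record_change(self, time: float, new_class: str) -> None:
        # insort keeps the list ordered even if changes arrive out of order
        insort(self.changes, (time, new_class))

    def at(self, time: float) -> str:
        """Class in effect at `time`: the latest change at or before it, else the class at birth."""
        times = [t for t, _ in self.changes]
        i = bisect_right(times, time)
        return self.at_birth if i == 0 else self.changes[i - 1][1]


history = GenderHistory(at_birth="M")
history.record_change(30.0, "F")
print(history.at(10.0), history.at(30.0), history.at(45.0))  # M F F
```

Keeping the class at birth separate from the dated changes means it remains available as a stable key however often the current class changes.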

In an attempt to make my data storage and retrieval work better, I implemented two main data stores for people: one for those classed female at birth and one for those classed male at birth. In my models, even if an individual's current gender were reassigned, their data would still be stored in the same data store.
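
The two-store layout can be sketched as follows, assuming one in-memory dictionary per store and a per-person `current_gender` field. `Person`, `Population` and their fields are hypothetical names. The store is chosen once, from the class at birth, so a later reassignment edits the record without moving it.

```python
from dataclasses import dataclass


@dataclass
class Person:
    person_id: int
    sex_at_birth: str    # "F" or "M"; fixes which store the record lives in
    current_gender: str  # may change during a simulation run


class Population:
    """Two stores keyed on the class assigned at birth."""

    def __init__(self) -> None:
        self.female_at_birth: dict[int, Person] = {}
        self.male_at_birth: dict[int, Person] = {}

    def _store_for(self, sex_at_birth: str) -> dict[int, Person]:
        if sex_at_birth == "F":
            return self.female_at_birth
        if sex_at_birth == "M":
            return self.male_at_birth
        raise ValueError(f"unexpected class at birth: {sex_at_birth!r}")

    def add(self, person: Person) -> None:
        self._store_for(person.sex_at_birth)[person.person_id] = person

    def reassign(self, person_id: int, new_gender: str) -> None:
        # Update in place: the record stays in the store chosen at birth.
        for store in (self.female_at_birth, self.male_at_birth):
            if person_id in store:
                store[person_id].current_gender = new_gender
                return
        raise KeyError(person_id)


population = Population()
population.add(Person(1, "F", "F"))
population.reassign(1, "M")
print(1 in population.female_at_birth, population.female_at_birth[1].current_gender)  # True M
```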

I suspect that, in real ambiguous cases, how gender is classified differs from country to country.

BTW: gender ambiguity was topical in the mainstream media in the Autumn in the UK [1]. It is not as uncommon as you might think...

So, gender is a fuzzy thing. Maybe we all belong to male and female classes to a degree, and for most of us this distinction is binary. In terms of encoding, in my implementations I've used 0 for female and 1 for male, as I find that easy to remember and it makes sense computationally.

Andy

[1] http://www.bbc.co.uk/news/health-14459843

________________________________________
From: Phil Archer [phila@w3.org]
Sent: 03 April 2012 14:33
To: public-lod@w3.org
Subject: Datatypes with no (cool) URI

I'm hoping for a bit of advice and rather than talk in the usual generic
terms I'll use the actual example I'm working on.

I want to define the best way to record a person's sex (this is related
to the W3C GLD WG's forthcoming spec on describing a Person [1]). To
encourage interoperability, we want people to use a controlled
vocabulary and there are several that cover this topic.

ISO 5218 has:
0 = not known;
1 = male;
2 = female;
9 = not applicable.

and Eurostat offers
F = female
M = male
OTH = other
UNK = unknown
NAP = not applicable
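
The two lists overlap but are not equivalent. A crosswalk sketch, assuming that ISO 5218 "not known" and Eurostat "unknown" mean the same thing (a judgement call, not something either publisher states here). Eurostat's OTH has no ISO 5218 counterpart, so converting in that direction can lose information.

```python
from typing import Optional

# Pairs codes with matching labels in the two lists above,
# treating "not known" as "unknown" (an assumption).
ISO_TO_EUROSTAT = {0: "UNK", 1: "M", 2: "F", 9: "NAP"}
EUROSTAT_TO_ISO = {eu: iso for iso, eu in ISO_TO_EUROSTAT.items()}
EUROSTAT_CODES = set(EUROSTAT_TO_ISO) | {"OTH"}


def iso_to_eurostat(code: int) -> str:
    return ISO_TO_EUROSTAT[code]  # KeyError for codes outside the four listed


def eurostat_to_iso(code: str) -> Optional[int]:
    """ISO 5218 code, or None when the Eurostat code has no counterpart (OTH)."""
    if code not in EUROSTAT_CODES:
        raise ValueError(f"not a Eurostat sex code: {code!r}")
    return EUROSTAT_TO_ISO.get(code)


print(eurostat_to_iso("F"), eurostat_to_iso("OTH"))  # 2 None
```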

IMO, the spec should not dictate which one to use (there are others too
of course). What I *do* want to do though is to encourage publishers to
state which vocabulary they're using. Sounds like a job for a datatype -
and for that you need a URI for the vocabulary. Something like:

schema:gender "1"^^<http://iso.org/5218/> .

Except I made that iso.org URI up. The actual URI for it is
http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=36266
(or rather, that's the page about the spec but that's a side issue for
now).

That URI is just horrible and certainly not a 'cool URI'. The Eurostat
one is no better.
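
What the datatype gives a consumer is a way to pick the right code list before interpreting a value: the lexical form "1" means male under ISO 5218 but is not a Eurostat code at all. A sketch of that dispatch, assuming plain Python strings for the literal and its datatype IRI. `http://iso.org/5218/` is the made-up IRI from the example above, and the `example.org` IRI is a hypothetical placeholder, not a real Eurostat identifier.

```python
# Datatype IRI -> code list. Both IRIs are placeholders (see above).
CODE_LISTS = {
    "http://iso.org/5218/": {
        "0": "not known", "1": "male", "2": "female", "9": "not applicable",
    },
    "http://example.org/eurostat/scl-sex": {
        "F": "female", "M": "male", "OTH": "other",
        "UNK": "unknown", "NAP": "not applicable",
    },
}


def decode(lexical: str, datatype: str) -> str:
    """Label for a typed literal such as "1"^^<http://iso.org/5218/>."""
    if datatype not in CODE_LISTS:
        raise ValueError(f"unrecognised vocabulary: {datatype}")
    codes = CODE_LISTS[datatype]
    if lexical not in codes:
        raise ValueError(f"{lexical!r} is not a code in {datatype}")
    return codes[lexical]


print(decode("1", "http://iso.org/5218/"))                 # male
print(decode("M", "http://example.org/eurostat/scl-sex"))  # male
```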

Does the datatype URI have to resolve to anything (in theory no, but in
practice?)? Would a URN be appropriate?

Given that the identifier for the ISO standard is "ISO/IEC 5218:2004",
how about urn:iso/iec:5218:2004?

For Eurostat, the internal identifier for the vocabulary is "SCL - Sex"
(standard code list) so would urn:eurostat:scl:sex be appropriate?
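
Whatever identifiers are chosen, it is worth checking them against URN syntax. RFC 8141 (and RFC 2141 before it, current when this thread was written) restricts the namespace identifier (NID), the part between the first two colons, to letters, digits and hyphens, so a "/" as in "iso/iec" is not allowed there. A minimal syntax check, assuming only the basic urn:<NID>:<NSS> form (the optional ?+, ?= and # components are not handled); it says nothing about whether a namespace is registered with IANA. The year in the ISO candidate does not affect the result.

```python
import re

# Basic RFC 8141 form only: urn:<NID>:<NSS>, no r-, q- or f-components.
_NID = r"[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]"
_PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
_URN = re.compile(rf"urn:({_NID}):({_PCHAR}(?:{_PCHAR}|/)*)", re.IGNORECASE)


def urn_syntax_ok(candidate: str) -> bool:
    """True if candidate matches urn:<NID>:<NSS>; registration is not checked."""
    return _URN.fullmatch(candidate) is not None


for candidate in ["urn:iso/iec:5218:2004", "urn:eurostat:scl:sex"]:
    print(candidate, urn_syntax_ok(candidate))
# urn:iso/iec:5218:2004 False
# urn:eurostat:scl:sex True
```

For ISO specifically, the "iso" NID is already registered (RFC 5141), with its own rules for naming standards, which may be a better starting point than a home-made form.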

Anyone done anything like this in the real world?

All advice gratefully received.

Thank you

Phil.


[1] https://dvcs.w3.org/hg/gld/raw-file/default/people/index.html

--


Phil Archer
W3C eGovernment
http://www.w3.org/egov/

http://philarcher.org
@philarcher1

Received on Tuesday, 3 April 2012 15:09:23 UTC