Fwd: Generating test data sets

Hi all, 

I'm forwarding a potential use case for SAWSDL that I received from Mr.
Tavis Reddick. It seems like a big thing involving technologies out of
our scope, and I'm not yet sure how to break it down into smaller, more
tractable parts, but maybe others will have more ideas. 8-)

The context is that using real data for testing may be unsound on
privacy, security or even legal grounds, so one may want to generate
fake data instead. Semantic annotation of web services might help with
this task.

Best regards,
Jacek

-------- Forwarded Message --------
From: Tavis Reddick <tavisreddick@adamsmith.ac.uk>
To: jacek.kopecky@deri.org
Subject: Use Case for Semantic Annotations for WSDL
Date: Fri, 26 May 2006 00:01:09 +0100



Hi Jacek

I attended your presentation "Semantic Annotations" at WWW2006 today,
and since you asked for use cases, I've sketched this requirement, which
I've been wondering about for a while.

Description
===========
A developer would like to generate a dataset in order to work on a
student record system. This dataset would need to include a lot of
personal details like addresses and telephone numbers, so he can't reuse
data of real people he might have access to.

He would like to choose the information model (perhaps from a set of
relevant schemas) and then have data generated automatically from
relevant sources to fill it, ready to be imported into his test database.

From a simple web interface to underlying web services, he might be able
to choose entities and relationships (or their semantic web equivalents)
using existing ontologies. For example, he chooses
Person/LivesAt/Address and the application searches the semantic web for
Person and Address datasets, perhaps returning census and geographic
results. He could then select statistical parameters and randomization
options, which the web service would use to fetch and combine the
properties of Person and Address (random selections from firstName,
lastName, age, streetname, town/city and so forth) into fictional but
realistic instances.
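
As a rough sketch of that combination step, assuming the source values
have already been fetched into plain lists: the value lists, the age
range and the function names below are all made up for illustration and
do not come from any real service. Each property is sampled
independently and the results are joined into one Person/LivesAt/Address
record.

```python
import random

# Hypothetical source values; in the use case these would come from
# census and geographic web services found via semantic annotations.
FIRST_NAMES = ["Alice", "Bram", "Chen", "Dara"]
LAST_NAMES = ["Baker", "Okafor", "Lindqvist", "Moreau"]
STREET_NAMES = ["High Street", "Mill Lane", "Station Road"]
TOWNS = ["Northtown", "Eastfield", "Westbridge"]


def make_person(rng, min_age=16, max_age=70):
    """Combine independently sampled properties into one fictional record."""
    return {
        "firstName": rng.choice(FIRST_NAMES),
        "lastName": rng.choice(LAST_NAMES),
        "age": rng.randint(min_age, max_age),  # inclusive on both ends
        "livesAt": {
            "streetname": f"{rng.randint(1, 200)} {rng.choice(STREET_NAMES)}",
            "town": rng.choice(TOWNS),
        },
    }


def generate_people(count, seed=None):
    """Return `count` fictional people; a fixed seed makes the set repeatable."""
    rng = random.Random(seed)
    return [make_person(rng) for _ in range(count)]


if __name__ == "__main__":
    for person in generate_people(3, seed=2006):
        print(person)
```

Uniform sampling is the simplest possible choice here; the "statistical
parameters" mentioned above would presumably replace it with weighted
selection (for example name frequencies from census data).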

These would be combined in an XML dataset based on the initial model of
Person and Address, readily mapped to and imported into his test
database.
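
One possible shape for such a dataset, sketched with Python's standard
xml.etree.ElementTree module. The element names (dataset, Person,
LivesAt, Address and the property names) simply mirror the example model
above and are assumptions, not an agreed schema; the sample record is
invented.

```python
import xml.etree.ElementTree as ET

# Invented sample record, in the Person/LivesAt/Address shape used above.
SAMPLE_PEOPLE = [
    {
        "firstName": "Alice",
        "lastName": "Baker",
        "age": 34,
        "livesAt": {"streetname": "12 Mill Lane", "town": "Northtown"},
    },
]


def people_to_xml(people):
    """Serialize person records to an XML string following the chosen model."""
    root = ET.Element("dataset")
    for record in people:
        person = ET.SubElement(root, "Person")
        ET.SubElement(person, "firstName").text = record["firstName"]
        ET.SubElement(person, "lastName").text = record["lastName"]
        ET.SubElement(person, "age").text = str(record["age"])
        # The relationship becomes a wrapper element around the related entity.
        lives_at = ET.SubElement(person, "LivesAt")
        address = ET.SubElement(lives_at, "Address")
        ET.SubElement(address, "streetname").text = record["livesAt"]["streetname"]
        ET.SubElement(address, "town").text = record["livesAt"]["town"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(people_to_xml(SAMPLE_PEOPLE))
```

Keeping the relationship (LivesAt) as its own element is only one
option; a flat layout with foreign keys might map more directly onto a
relational test database.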

In this case, the developer might be especially interested in supporting
internationalization issues, so might choose web services supplying data
from multiple language sources. Or he might be keen that the data
doesn't look too real, so he could choose a fictional source of
geographical details (someone is bound to produce a Middle-Earth
location web service).


Anyway, apologies if this isn't what you're looking for or if it isn't
technically sound, but, although it could seem trivial, I think it has
many applications and could be a real time-saver. I imagine that a real,
workable example would need more web services connected in the pipeline
to allow non-technical people to generate such datasets.


Tavis Reddick
Web Developer
The Adam Smith College, Fife


Received on Tuesday, 6 June 2006 18:37:14 UTC