Mark H. Butler
Digital Media Systems Project, HP Labs Bristol
2 August 2005
The W3C has recently set up a new working group looking at the problem of Device Description for the Mobile Web. The author of this document worked on the problem of device description for mobile devices for a number of years. During that time he produced a number of technical reports [BUTLER] and the open source DELI API and validator for UAProf and CC/PP [DELI], which is now widely used for profile validation.
This document presents the author’s view about what problems the CC/PP [CC/PP] and UAProf standards [UAPROF 1.1], [UAPROF 2] currently face and how these standards should advance. It outlines a number of actions that can be taken in order to ensure these standards can solve the problem of device description more effectively.
Here is a summary of the major findings of this document, which will be discussed in more depth in subsequent sections.
Currently there are several approaches to adapting content for mobile devices: proprietary databases based on user agent strings; WURFL [WURFL], an open source XML database based on user agent strings; UAProf; CC/PP; media queries [MEDIA-QUERIES], proposed as part of the W3C CSS work; and DPF [DPF], proposed as part of the W3C DI-WG / MMI work. Only proprietary databases, WURFL and UAProf are widely used in production systems.
We have a number of comments about these approaches, mainly about CC/PP and UAProf. Several of these are refinements of issues we raised about CC/PP at last call:
1. Many devices now provide UAProf information, and the quality of that information has improved considerably since UAProf was first released. Therefore rather than replacing UAProf with a new standard, we think it would be better to enhance and simplify this existing standard.
2. CC/PP creates unnecessary limitations on what can be encoded that are not present in RDF, so we believe CC/PP offers little benefit over RDF and in the future profiles should be written in RDF directly [ISSUE-155], [ISSUE-159].
3. Experience with UAProf has clearly demonstrated the need for validating profiles. We believe this is a general problem for semantic web applications, so there needs to be a standard, generic way of validating RDF [ISSUE-161].
4. There need to be tools for validating device description vocabularies, i.e. schemas and ontologies, as well as profiles. Some tools exist for OWL [OWL-VALIDATOR] but no tools exist for RDF Schema.
5. When new device description vocabularies are created, the schemas describing those vocabularies must be published on the Web in a machine readable format from the same URL used in the schema namespace so they can be automatically discovered [ISSUE-162].
6. Where possible, device description vocabularies should define property values as well as properties [ISSUE-170].
7. There needs to be a quality assurance process to ensure errors identified by third parties in both profiles and schemas are corrected.
8. UAProf contains resolution rules that determine how a number of profiles should be combined. The resolution rules do not work in all situations so rather than fix this by adding additional resolution rules, resulting in increased complexity, we think this process would be better left up to the server.
10. Most UAProf device profiles are created using text editors and this process is more complicated than it needs to be due to the complexity of RDF/XML. There are a number of steps that could be taken to simplify this process, such as creating editors that can use device description vocabulary schemas or by having a best practice document that gives some tips on other approaches such as entity declarations, XSLT or N3.
11. Some UAProf profiles contain copyright statements so the rights of end users are not clear. This should be clarified.
12. Future device description architectures should allow device capability data to come from a variety of sources, not just device manufacturers, in a dynamic way, without requiring centralized management. Many other semantic web applications require similar architectures, so there should be a standard, generic architecture for this type of problem.
13. The quality of data in WURFL is better than in UAProf, mainly because it is created by reviewing the UAProf data [UAPROF-FUN]. WURFL has three main technical advantages over UAProf. First, it uses inheritance to simplify the description of mobile devices. Second, it is slightly easier to author as it is written in XML rather than RDF/XML, although we think the complexity of authoring RDF/XML can be addressed in other ways as discussed in point 10. Third, it uses boolean property values wherever possible, which partly overcomes some of the problems due to inconsistencies in property value encoding in UAProf. Others have noted that the WURFL data contains errors similar to UAProf [WURFL-USAGE] because, like UAProf, it does not adequately solve the problem of formalization and validation.
CC/PP [CC/PP] is a W3C recommendation of a structure for representing device profile information in RDF/XML. The author was involved in the later stages of development of this specification.
A CC/PP profile consists of a set of components, each of which contains a number of properties, which are either single or multi-valued. It is possible to have default components in the profile, which are overridden by non-default components.
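The relationship between default and non-default components can be sketched in a few lines of Python. This is only an illustration of the override behaviour described above, not an implementation of a CC/PP processor; the component contents and the function name `resolve_component` are hypothetical.

```python
def resolve_component(defaults, overrides):
    """Return the effective properties of one component.

    Values given in the non-default part of the component replace
    the values supplied by its default block.
    """
    effective = dict(defaults)
    effective.update(overrides)
    return effective


# Hypothetical hardware component: values are made up for illustration.
hardware_defaults = {"ColorCapable": "Yes", "BitsPerPixel": "8"}
hardware_overrides = {"BitsPerPixel": "16"}

hardware = resolve_component(hardware_defaults, hardware_overrides)
print(hardware)
```

Properties that appear only in the default block survive unchanged, while any property repeated outside the default block takes the non-default value.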
This structure is much more restrictive than a general RDF model, and there have already been a number of occasions where people have struggled to describe some aspect of a device in CC/PP [MMI-INK]. For example, it is hard to represent a number of alternative sets of property values: on some UAProf devices manufacturers have resorted to using composite properties to do this [GD-67] e.g.
In the example above the property indicates this particular device will accept media objects of mime type application/x-mmc.wallpaper as long as they are no bigger than 25600 bytes, have a color depth of 8 bits and are 120 pixels high by 136 pixels wide, and that a GIF encoding is to be used for these wallpaper media objects. In the XML community, combining a number of values into a single composite value in this way is generally thought to be a bad idea because it requires additional microparsing of property values [ISSUE-203].
In XML, when you want to use data in an application you create a serialization which has a particular syntax or structure. In RDF you do not need to define a structure: you just have a vocabulary / ontology. CC/PP defines a small vocabulary, but principally it defines a structure which seems largely unnecessary and only serves to reduce the expressive power of RDF. Therefore in our view it would be better to use RDF directly. All that is needed to use RDF to describe devices is one or more vocabularies, for example a UAProf vocabulary or the DIWG CPC work, and a protocol similar to the UAProf protocol. Because CC/PP and UAProf profiles are themselves RDF, RDF processors would still be able to consume existing CC/PP and UAProf profiles.
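To make the cost of microparsing concrete, here is a minimal Python sketch. The composite string is a hypothetical encoding invented for this illustration (it is not copied from the profile cited above), and the key names `size`, `color`, `h`, `w` and `type` are assumptions.

```python
def parse_composite(value):
    """Split a 'mime;key=value;...' composite string into its parts."""
    mime_type, *pairs = [part.strip() for part in value.split(";")]
    fields = dict(pair.split("=", 1) for pair in pairs)
    return mime_type, fields


# Hypothetical composite value carrying the constraints described above.
composite = "application/x-mmc.wallpaper;size=25600;color=8;h=120;w=136;type=image/gif"

mime_type, fields = parse_composite(composite)
print(mime_type, fields)
```

Every consumer of the profile has to agree on the separator, the key names and the units, none of which is visible to an RDF or XML processor. Separate properties would expose the same information without a second, ad hoc parser.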
UAProf is an OMA standard for representing device profile information in RDF/XML. Unlike CC/PP, it provides three things:
UAProf is much more widely used than CC/PP, partly because unlike CC/PP it specifies a vocabulary and a protocol which are essential to solving the problem. UAProf suffers from the same structural limitations as CC/PP described in the previous section.
In the early days of UAProf, the quality of profiles was poor, because generally profiles were written by hand and were not validated. Now a number of validators exist, such as the DELI validator [DELI], [VALIDATION] produced by HP or the Openwave validator [OPENWAVE-VALID], so generally the quality of UAProf profiles has improved. Unfortunately, as DELI was originally only intended as a proof of concept prototype, it is a command line tool that requires some familiarity with Java, which presents an obstacle for some UAProf developers. This problem was recently solved by a web-based validator based on DELI [OMA-VALID] that is hosted by the OMA. The development of this validator was made possible by the fact that DELI is open source.
We believe open source is an important way for companies to collaborate to create a mobile web, and we encourage companies to become more actively involved in projects such as WURFL or DELI. We understand that a number of mobile phone vendors have taken DELI and enhanced it internally. Ideally such enhancements should be contributed back as this would benefit the entire community, as well as reducing the maintenance costs for the companies involved.
Currently there is no RDF Schema validator, so consequently some device description vocabularies, such as some of the OMA UAProf schemas, contain errors. Typical errors include attributes that are not namespace qualified, the incorrect association of Bag, Seq and Property with the RDF Schema namespace, confusion about when to use rdfs:range and when to use rdf:type, misspelling of component names and the use of incorrect namespaces for RDF and RDF Schema [UAPROF-2000]. Therefore just as web based tools are needed to validate profiles, there is also a need for web based tools to validate vocabulary descriptions.
When errors are identified in schemas, they need to be corrected. Because schemas are often defined in formal specifications, we have encountered unwillingness to resolve problems in published schemas. We would like to emphasise that for advanced validation and processing tools to work, they need to be able to obtain correct schema information. Currently the errors in the public schemas mean that DELI has to ship with its own corrected versions of the schemas. It would be much better if the public versions of the schemas were corrected, as then it would be possible for validators such as DELI to consume the official OMA schemas to perform validation. In addition if public versions of schemas contain errors, then the chances of future schemas also containing errors are increased, because existing schemas are often used as a starting point for creating new ones.
We would like to point out that the idea of validating profiles is actually rather heretical for some people working on the Semantic Web, because it involves breaking the “open world assumption” [TWO-TOWERS]. When it was first suggested back in the days of the W3C CC/PP working group, it was ruled out for this very reason.
The device description community may not be familiar with the open world assumption so here is a brief explanation about how it applies to validation. One common mistake in UAProf is to use a property called “PixelsAspectRatio” i.e. have an extra “s” on “Pixel”. In fact the correct spelling of this property is “PixelAspectRatio” with no extra “s”. However if we strictly follow the open world assumption, then we say that “The OMA in their schema has defined a property called PixelAspectRatio associated with a particular UAProf namespace. This profile contains a property called PixelsAspectRatio associated with the same namespace. However due to the open world assumption, we cannot assume that someone else, somewhere on the Internet, has not created a vocabulary using the same namespace that uses PixelsAspectRatio. Therefore this profile may be valid”.
Although we agree there are situations where the open world assumption is helpful, for example when combining information from different sources, we do not believe it is compatible with validation. As UAProf clearly demonstrates, there are times when validation is essential to ensure data quality in distributed systems, particularly when the data is typically being created by one organization and being consumed by another. This is a difficult problem, even with validation [METACRAP].
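The closed-world check that a validator performs for the misspelling discussed above can be sketched as follows. This is a simplified illustration in Python rather than a description of how DELI works: the list `SCHEMA_PROPERTIES` is a small illustrative subset standing in for the properties declared in the schema, and the function name is hypothetical.

```python
import difflib

# Illustrative subset of property names declared by the schema.
SCHEMA_PROPERTIES = ["BitsPerPixel", "PixelAspectRatio", "ScreenSize"]


def check_property_names(profile_properties, schema_properties=SCHEMA_PROPERTIES):
    """Report properties the schema does not declare, with a likely correction.

    Treating "not declared in the schema" as an error is exactly the
    closed-world step that the open world assumption forbids.
    """
    problems = {}
    for name in profile_properties:
        if name in schema_properties:
            continue
        suggestion = difflib.get_close_matches(name, schema_properties, n=1)
        problems[name] = suggestion[0] if suggestion else None
    return problems


report = check_property_names(["ScreenSize", "PixelsAspectRatio"])
print(report)
```

Under a strict open-world reading the function could never report anything, because an undeclared property is merely one that has not been described yet.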
Although it is possible to use the information in RDF Schemas and OWL ontologies to do validation [VALIDATION], this means breaking the open world assumption and using the schemas or ontologies in a very application specific way. In addition it is not possible to express all the constraints that you typically want to express for validation purposes in these languages [VALID-RDF], [CHECKRDF]. We note that the Semantic Web Best Practice (SWBP) WG is considering some aspects of validation [VALUE-SETS] but we think in order for the semantic web to be a success, there needs to be a generic, non-application specific approach to validation. This work is most appropriately done within the Semantic Web activity.
The problem with the schemas highlights a common problem with UAProf: when people are publishing information on the web, there needs to be a process that everyone follows so that errors are corrected. Often errors in profiles or schemas are discovered by people in different organisations to those who created the data, as the creators may not themselves be users of the data. Consequently there may be no compelling need for the creators to fix the data so it is left broken. Worse still, with schemas, people may decide the publication process of a standards body actually prohibits them from making changes, meaning schemas are left broken.
There are several ways of addressing this problem: one way might be to have a registry for profiles and schemas and some kind of QA governance process. Another way might be for schemas and profiles to contain an email address so that it is possible to identify who needs to be contacted if there is an error with the resource. Regardless of the process, all developers who potentially create web content for these devices need to be able to participate. The process may possibly require arbitration from a third party, in order to determine whether the requestor has identified a valid error in the profile or schema that needs correcting. The process also needs to ensure the resource is corrected in a timely manner. Having a procedure that everyone conforms to is essential in order to ensure data accuracy.
Another problem impeding profile correctness is that although the schemas contain example values for properties, they do not contain a vocabulary of correct values for properties. This means different manufacturers, or sometimes even different people at the same manufacturer, encode the same state in different ways. For example one profile might encode a SecuritySupport value as a Bag containing “WTLS-1”, “WTLS-2”, “WTLS-3” and “signText” [T610] whereas another might use “WTLS class 1/2/3/signText” [T68] and another might use “WTLS Class 2”. These different property encodings mean it is hard for an application to use the values from these profiles.
One way of avoiding some of these problems is to have a controlled vocabulary for property values i.e. we say that only “WTLS-1”, “WTLS-2”, “WTLS-3” and “signText” are allowable values, excluding the other variants. However constraining property values in this way in RDFS or OWL, although a common problem, is not a trivial matter: for possible solutions in OWL see [VALUE-SETS].
In other cases it is not possible to use vocabularies of property values: for example there have been cases where one manufacturer encodes the ScreenSize property as the total area of a screen, whereas another encodes the total renderable area, excluding the area of the screen taken up by status information. These are clearly different measurements. The only way to solve this problem is to tightly define the property in a human readable way so that different manufacturers provide the same measurement, or perhaps to use some kind of conformance testing procedure on handsets.
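Outside RDFS or OWL, the check itself is simple. The following Python sketch assumes the four allowable SecuritySupport values named above have been agreed as the controlled vocabulary; the function name `invalid_values` is hypothetical.

```python
# Controlled vocabulary for SecuritySupport, assuming the four values
# named in the text are the only ones permitted.
ALLOWED_SECURITY_SUPPORT = {"WTLS-1", "WTLS-2", "WTLS-3", "signText"}


def invalid_values(values, allowed=ALLOWED_SECURITY_SUPPORT):
    """Return the values that are not part of the controlled vocabulary."""
    return [value for value in values if value not in allowed]


bag_encoding = ["WTLS-1", "WTLS-2", "WTLS-3", "signText"]
string_encoding = ["WTLS class 1/2/3/signText"]

print(invalid_values(bag_encoding))
print(invalid_values(string_encoding))
```

The difficulty discussed in the text is not the check but where the list of allowed values lives: it needs to be published with the schema in a standard form so that every validator applies the same list.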
Once the SWBP WG has agreed a way to solve some of these issues, standards like UAProf need to define sets of property values where appropriate, and then use the approach outlined by the SWBP WG to add this information to schemas. Existing validation tools need to be adapted so they can process this information when validating profiles.
One important difference between UAProf 1.1 and UAProf 2 is that UAProf 2 mandates the use of RDF datatyping in the profile. For example in UAProf 1.1 the OutputCharSet property would be written like this:
whereas in UAProf 2 it would be written like this:
<rdf:li rdf:datatype="http://www.openmobilealliance.org/tech/profiles/UAPROF/xmlschema-20030226#Literal">US-ASCII</rdf:li>
<rdf:li rdf:datatype="http://www.openmobilealliance.org/tech/profiles/UAPROF/xmlschema-20030226#Literal">UTF-8</rdf:li>
<rdf:li rdf:datatype="http://www.openmobilealliance.org/tech/profiles/UAPROF/xmlschema-20030226#Literal">ISO-10646-UCS-2</rdf:li>
By using entity definitions it is possible to abbreviate this to some extent e.g.
although this relies on prf-dt being defined appropriately in a DTD section.
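As a rough illustration of the entity approach, the Python sketch below parses a small fragment in which `prf-dt` is declared in an internal DTD subset. The fragment is a hypothetical example built for this sketch, not a complete UAProf profile; only the datatype namespace is taken from the text above.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DATATYPE_NS = "http://www.openmobilealliance.org/tech/profiles/UAPROF/xmlschema-20030226#"

# Hypothetical fragment: the entity prf-dt stands for the datatype namespace.
document = f"""<?xml version="1.0"?>
<!DOCTYPE rdf:Bag [
  <!ENTITY prf-dt "{DATATYPE_NS}">
]>
<rdf:Bag xmlns:rdf="{RDF_NS}">
  <rdf:li rdf:datatype="&prf-dt;Literal">US-ASCII</rdf:li>
  <rdf:li rdf:datatype="&prf-dt;Literal">UTF-8</rdf:li>
</rdf:Bag>"""

root = ET.fromstring(document)
# The parser expands the entity, so consumers see the full datatype URI.
datatypes = [li.get(f"{{{RDF_NS}}}datatype") for li in root]
print(datatypes)
```

The abbreviation exists only in the source text: after parsing, the profile is indistinguishable from one written with the full URI, which is why the entity must be declared correctly for the profile to remain valid.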
This change arose because the OMA wanted UAProf to be close to RDF, so when RDF was revised to use this approach for datatyping, UAProf was revised accordingly. However we think this has had a negative impact on the ease of authoring UAProf profiles. At the time of writing, we have not yet encountered a UAProf 2 profile that uses datatyping correctly. We also note that not all CC/PP and UAProf processors may be compatible with UAProf 2 profiles. DELI supports both UAProf 1.1 and UAProf 2 and can validate profiles that use RDF data typing.
These problems occur because UAProf device descriptions are generally written in text editors. When RDF/XML was created, there was an expectation that RDF/XML would be created by tools, so users would not have to deal with the RDF/XML syntax directly. However although a few tools exist for authoring RDF/XML such as Protege [PROTEGE], they do not seem directly applicable to device description.
There are a number of approaches to simplify this problem: first authors can use ENTITY definitions to simplify the use of namespaces in XML attributes as shown in the example above. Second it is possible to create RDF/XML from XML via XSLT to hide some of the complexities of RDF/XML from profile authors. Third profile authors could use more concise and human readable syntaxes for RDF such as N3 or Turtle [TURTLE] and then automatically convert the profiles to RDF/XML. These approaches are well known in the semantic web community, but are rarely used in the device description community. A document that demonstrated these approaches could simplify the task of profile creation for profile authors.
A tool that automated the migration of UAProf 1.1 profiles to UAProf 2 profiles by extracting data typing information from schemas and automatically adding data type information to profiles would also be helpful, as data type information is largely redundant and adding it by hand is potentially error prone.
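A minimal sketch of such a migration step, in Python, might look like the following. It makes several simplifying assumptions: the property-to-datatype table is hard-coded rather than extracted from a schema, the `PRF` namespace is a placeholder, and the change of vocabulary namespace between UAProf versions is ignored.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PRF = "http://example.org/prf#"  # placeholder vocabulary namespace
DT = "http://www.openmobilealliance.org/tech/profiles/UAPROF/xmlschema-20030226#"

# Hypothetical table; a real tool would derive it from the schema.
PROPERTY_DATATYPES = {"OutputCharSet": DT + "Literal"}

PROFILE_1_1 = f"""<prf:OutputCharSet xmlns:prf="{PRF}" xmlns:rdf="{RDF}">
  <rdf:Bag>
    <rdf:li>US-ASCII</rdf:li>
    <rdf:li>UTF-8</rdf:li>
  </rdf:Bag>
</prf:OutputCharSet>"""


def add_datatypes(profile_xml):
    """Add rdf:datatype to the values of every property listed in the table."""
    root = ET.fromstring(profile_xml)
    for prop in root.iter():
        if not prop.tag.startswith("{" + PRF + "}"):
            continue
        datatype = PROPERTY_DATATYPES.get(prop.tag.split("}", 1)[1])
        if datatype is None:
            continue
        # Multi-valued properties carry values in rdf:li; otherwise type the property itself.
        items = list(prop.iter("{" + RDF + "}li")) or [prop]
        for item in items:
            item.set("{" + RDF + "}datatype", datatype)
    return root


migrated = add_datatypes(PROFILE_1_1)
print(ET.tostring(migrated, encoding="unicode"))
```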
When you use software, there is normally a license that describes your rights. This is important because just having access to a copy of software is not sufficient to give you the right to use it. Therefore one problematic issue with UAProf is that phone vendors often assert copyright over the profile without giving any license. This means it is not clear what the user is entitled to do with the data: for example, can they use it, can they change it internally, or can they redistribute it? Some profiles contain a statement similar to the following:
Copyright (c) 2004 XXXX. All rights reserved. The contents of this document are provided "as is". No warranties of any kind, either express or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose, are made in relation to the accuracy, reliability or contents of this document. XXXX reserves the right to revise this document or withdraw it at any time without prior notice.
We believe that currently the licensing of UAProf profiles is uncertain due to the use of these copyright statements. This situation should be clarified and profiles should ideally be licensed under some kind of open source license.
In UAProf, when a server receives an HTTP request, the UAProf processor performs "resolution" i.e. it combines the profiles received in a particular way. UAProf properties are assigned specific resolution rules: locked means take the first value encountered, override means take the last value encountered and append means append all the values together.
These rules do not work in all situations: for example CcppAccept-Language (preferred human readable language) is an Append. Consider a situation where a device says it prefers English in the reference profile created by the phone manufacturer, and the device supports dynamic profiles for user preferences called profile-diffs (this is admittedly unusual as we have only encountered one device that has tried to support profile-diffs). Then if the user specifies a preference of German, they will still get English content, because the German preference gets appended after the English preference. By combining defaults and resolution rules it is possible to overcome this, by placing the preference for English in a default block, but this is rarely done.
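The three rules and the language example can be sketched in Python as follows. This is a simplified illustration rather than the algorithm from the UAProf specification; it assumes values are presented in the order they are encountered, reference profile first and profile-diff second, and uses hypothetical language tags.

```python
def resolve(rule, values):
    """Combine the values seen for one property, in the order encountered."""
    if rule == "locked":
        return values[0]          # first value wins
    if rule == "override":
        return values[-1]         # last value wins
    if rule == "append":
        merged = []
        for value in values:      # each value is itself a list
            merged.extend(value)
        return merged
    raise ValueError(f"unknown resolution rule: {rule}")


reference_profile = ["en"]  # manufacturer's default preference
profile_diff = ["de"]       # user's preference sent as a profile-diff

languages = resolve("append", [reference_profile, profile_diff])
print(languages)
```

A server that serves the first language in the list still picks English, because Append places the user's German preference after the manufacturer's value. Had the property been resolved with override, the user's preference would have replaced it.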
People have suggested to the UAProf group that there need to be more resolution rules to cope with these problems. In the UAProf vocabulary, there are several cases where it is not clear whether the correct resolution rules were assigned to certain properties. Therefore we think it would be better to get rid of resolution rules, and leave it up to the server or content author to decide how it wishes to perform resolution, as it is their responsibility to present the device with the correct content.
Another question for the DDWG is how should people extend vocabularies or create new vocabularies? This is made easy by the open world assumption in RDF as they just add new properties to a profile. However, as noted before, this open world assumption is not compatible with the need for validation. Therefore in order to reconcile these two needs, we think that vocabulary authors or profile authors should always choose a namespace for their extension that has a URL where they could publish an RDFS or an OWL schema and publish the schema from this URL. Then when a validator or processor encounters a profile that uses a non standard schema, it can automatically load the schema and continue with its task. For example we suspect that Vodafone has been asking handset manufacturers to extend UAProf, because we have encountered profiles with a Vodafone namespace [VODAFONE-EXT]:
However Vodafone have not published a schema there, so standard validation techniques fail for these profiles. If they had published a schema, applications like DELI would be able to automatically retrieve schemas referenced in a profile and automatically validate profiles that use these vocabulary extensions. In the future, when editing tools are available that can consume schemas, they will be able to reuse the schemas in a similar way.
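The discovery step could be sketched in Python as below. It assumes namespaces end in `#`, so that dropping the fragment from a property URI gives the URL where the schema should be published. The `fetch` callable, the function names and the `example.org` URIs are hypothetical; a real validator would fetch over HTTP and handle failures.

```python
from urllib.parse import urldefrag


def schema_url(property_uri):
    """Derive the schema location from a property URI by dropping the fragment."""
    url, _fragment = urldefrag(property_uri)
    return url


def load_schemas(property_uris, fetch, cache=None):
    """Fetch each distinct schema once, keyed by its URL."""
    cache = {} if cache is None else cache
    for uri in property_uris:
        url = schema_url(uri)
        if url not in cache:
            cache[url] = fetch(url)
    return cache


# Stand-in for an HTTP fetch, so the sketch runs without network access.
requested = []


def fake_fetch(url):
    requested.append(url)
    return "<rdf:RDF/>"


schemas = load_schemas(
    ["http://example.org/ext#PropertyA", "http://example.org/ext#PropertyB"],
    fake_fetch,
)
print(requested)
```

The approach only works if the schema is actually served from the namespace URL, which is the requirement argued for above.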
In the past, some people have said that it is optional to publish schemas from the same URL as used by the vocabulary namespace. We disagree with this and note in the past there has been considerable confusion when a specification says to use a particular namespace but the schema is published elsewhere, as this results in both URLs being used in profiles.
We expect the Java JSR-188 [JSR-188] API for processing CC/PP and UAProf will be replaced by SPARQL [SPARQL], as this is a general query language for RDF. SPARQL will make it much easier to deal with device profile information in RDF, whether it is legacy CC/PP or UAProf, or CPC, and unlike JSR-188, it is not tied to the arbitrary CC/PP structure, or a particular host language.
Apart from UAProf, there is one other widely used approach for device description called WURFL [WURFL]. We think it is instructive to compare and contrast WURFL and UAProf. WURFL, unlike UAProf, is an open source device description file written in XML. As it is open source, it solves the problem of allowing users to correct data. Because it is XML there are more tools available for editing and processing WURFL, and developers find it easier because they are more familiar with XML than RDF/XML. WURFL uses the concept of inheritance, which is very useful in this domain because devices often evolve from other devices and so share many of their characteristics.
It is also important to consider the advantages UAProf has over WURFL. UAProf is based on RDF so has a clear data model inherited from RDF. UAProf, unlike WURFL, supports dynamic discovery of device profiles and dynamic updating of device profiles, although only the device manufacturers can do this. With WURFL there is generally a delay between devices being introduced and their inclusion in the WURFL file, and developers need to update their WURFL file to get the latest fixes and devices, whereas with UAProf this happens automatically.
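The inheritance idea can be sketched in Python as a lookup that walks a chain of fallback devices. The data structure is a simplification of the WURFL file, and the device identifiers and capability names are hypothetical.

```python
# Simplified, hypothetical device table: each device names the device it falls back to.
DEVICES = {
    "generic": {"fall_back": None, "capabilities": {"colors": 2, "gif": False}},
    "vendor_a100": {"fall_back": "generic", "capabilities": {"colors": 256, "gif": True}},
    "vendor_a110": {"fall_back": "vendor_a100", "capabilities": {"colors": 4096}},
}


def lookup(devices, device_id, capability):
    """Return a capability value, inheriting from fallback devices when absent."""
    seen = set()
    while device_id is not None and device_id not in seen:
        seen.add(device_id)  # guards against accidental cycles
        device = devices[device_id]
        if capability in device["capabilities"]:
            return device["capabilities"][capability]
        device_id = device.get("fall_back")
    raise KeyError(capability)


print(lookup(DEVICES, "vendor_a110", "colors"))
print(lookup(DEVICES, "vendor_a110", "gif"))
```

A new device only needs to list the capabilities that differ from its predecessor, which keeps descriptions short and reduces the opportunity for inconsistent values.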
In our opinion neither UAProf nor WURFL has adequately formalized how to describe devices, specifically property values, which makes it hard to validate profiles, although WURFL, unlike UAProf, tries to solve some of the property value problems by using booleans rather than strings wherever possible. WURFL already suffers from problems similar to UAProf [WURFL-USAGE] although to a lesser extent. Currently the number of WURFL developers is small, so data quality is better than UAProf without validation because the data is extracted from UAProf and then reviewed by a small number of knowledgeable developers who identify errors by hand [UAPROF-FUN]. We believe this approach is not scalable: if the number of authors increased, for example if it was adopted by mobile vendors, it would suffer from very similar problems to UAProf due to the lack of formalization.
We have already discussed procedures to ensure that profiles or schemas containing errors can be corrected. However this may not be sufficient: the WURFL creators note that often web developers need information about what functions do not work on particular devices, but the manufacturers of those devices are very unlikely to put this information into a profile. WURFL, because it comes from a third party, can contain this kind of information. Therefore it seems like there is a need to discover information about a particular device, some of which may come from the manufacturer, the rest of which may come from other sources. Being able to use both sources, and discover them dynamically, is highly advantageous.
Already companies or network operators are combining sources in this way, but this involves reviewing the different data sources, combining them, cleaning them, and perhaps testing the accuracy of the information.
We note that this type of automatic discovery of information from heterogeneous sources, not all of which are trusted, in order to create a uniform information source is exactly the type of scenario that it has been proposed can be addressed by the semantic web. We think there needs to be some kind of protocol or architecture such that, given an identifier for a device, we can retrieve information about that device from a number of sources, rather than just from the manufacturer.
We also note this is a generic problem facing other groups, so we propose the DDWG is not the most appropriate group to propose a solution. For example Life Sciences have very similar use cases where a user wants to obtain information about a particular gene or protein from a number of different data sources. The Life Science community has been working on LSIDs [LSID] to solve similar problems, although they have received some criticism from some people at the W3C who see LSIDs as unnecessary competitors for URIs [LIFE-SCI].
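One possible shape for the combining step is sketched below in Python. It is an assumption-laden illustration, not a proposed architecture: the source names, the property values and the policy that later sources take priority are all hypothetical. The one design point it is meant to show is that each value keeps a record of where it came from.

```python
def merge_sources(sources):
    """Merge property sets from several sources, lowest priority first.

    Each merged value is stored with the name of the source that supplied
    it, so a consumer can decide how far to trust it.
    """
    merged = {}
    for name, properties in sources:
        for prop, value in properties.items():
            merged[prop] = (value, name)
    return merged


# Hypothetical data for one device.
manufacturer = {"ScreenSize": "120x136", "JavaEnabled": "Yes"}
third_party = {"JavaEnabled": "No"}  # e.g. a feature reported as not working

merged = merge_sources([("manufacturer", manufacturer), ("third-party", third_party)])
print(merged)
```

Keeping the provenance alongside each value matters because not all sources are trusted equally, and a server may want to apply its own policy instead of the fixed priority order used here.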
This problem may be solvable using standards such as SPARQL, but we think it is highly desirable a common approach is used, regardless of whether the application domain is device description or life sciences.
In conclusion, we refer the reader back to the executive summary, which summarises the proposals in this document.
[BUTLER] Mark Butler, HP Labs Personal Web Page http://www.hpl.hp.com/people/marbut
[GD-67] Panasonic GD67 UAProf profile, http://mobileinternet.panasonicbox.com/UAprof/GD67/R2.xml
[ISSUE-155] Issue 155, CC/PP Disposition of comments,
[ISSUE-159] Issue 159, CC/PP Disposition of comments,
[ISSUE-161] Issue 161, CC/PP Disposition of comments,
[ISSUE-162] Issue 162, CC/PP Disposition of comments,
[ISSUE-170] Issue 170, CC/PP Disposition of comments,
[ISSUE-203] Issue 203, CC/PP Disposition of comments,
[LIFE-SCI] Tim Berners-Lee, Presentation from Semantic Web for Lifesciences workshop http://www.w3.org/2004/Talks/1016-sweb-lifesci-tbl/slide28-0.html
[MMI-INK] Discussion between W3C DI-WG and W3C MMI Ink Working Groups http://lists.w3.org/Archives/Member/w3c-di-wg/2005Apr/0062.html
[T68] Sony Ericsson T68 UAProf profile http://wap.sonyericssonmobile.com/UAprof/T68R501.xml
[TWO-TOWERS] Ian Horrocks, Bijan Parsia, Peter Patel-Schneider, and James Hendler, Semantic Web Architecture: Stack or Two Towers? (see page 3 for a similar, non UAProf example) http://www.cs.man.ac.uk/~horrocks/Publications/download/2005/HPPH05.pdf
[UAPROF 1.1] UAProf 1.1 Specification and Schema, Open Mobile Alliance, http://www.openmobilealliance.org/release_program/uap_v11.html
[UAPROF 2] UAProf 2 Specification and Schema, Open Mobile Alliance, http://www.openmobilealliance.org/release_program/uap_v20.html
[UAPROF-2000] UAProf 2000/04/05 RDF Schema, Open Mobile Alliance, http://www.wapforum.org/profiles/UAPROF/ccppschema-20000405
[UAPROF-FUN] Andrea Trasatti’s blog entry about converting UAProf profiles to WURFL,
[VALIDATION] Charles Smith, Mark Butler, Validating CC/PP and UAProf Profiles, http://www.hpl.hp.com/techreports/2002/HPL-2002-268.html
[VALUE-SETS] Alan Rector, Representing Specified Values in OWL: "value partitions" and "value sets" http://www.w3.org/2001/sw/BestPractices/OEP/SpecifiedValues-20050405/
[VODAFONE-EXT] Example of a UAProf profile with additional Vodafone specific information http://wap.sonyericsson.com/UAprof/Vodafone_802SER101.xml
[WURFL-USAGE] Robert Jones, Creating Web Content for Mobile Phones,