A data schema is a description of a set of data. P3P includes a way to describe data schemas so that services can communicate to user agents about the data they collect. A data schema is built from a number of data elements, which are specific items of data a service might collect.
Data elements in a data schema can have the following properties:
Data element name. The name of the data element is used when a P3P policy includes this data element in a <DATA> element. This is required on all data elements.
Descriptive name or short name. A data element's short name provides a short, human-understandable name for the data element. The short name is not required, but it is strongly recommended.
Long description. The long description of a data element provides a more detailed, human-understandable definition of the data element. Like the short name, the long description is not required, but it is strongly recommended.
Category or categories. Most data elements have categories assigned to them when they are defined in a data schema. See Categories for more information on categories.
Data elements are organized into a hierarchy. A data element automatically includes all of the data elements below it in the hierarchy. For example, the data element representing "the user's name" includes the data elements representing "the user's given name," "the user's family name," and so on. The hierarchy is based on the data element name. Thus the data elements user.name.given, user.name.family, and user.name.nickname are all children of the data element user.name, which is in turn a child of the data element user.
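The hierarchy is derived purely from the dotted names. Checking whether one data element includes another therefore reduces to a prefix test on whole name components. Below is a minimal Python sketch, assuming exact (case-sensitive) comparison. `includes` is a hypothetical helper, not part of P3P:

```python
def includes(ancestor: str, element: str) -> bool:
    """Return True if `element` is `ancestor` itself or lies below it.

    The match must end on a dot boundary, so "user.name" does not
    cover a sibling such as "user.namex". Comparison is case-sensitive.
    """
    return element == ancestor or element.startswith(ancestor + ".")


print(includes("user.name", "user.name.given"))   # True
print(includes("user", "user.name.family"))       # True
print(includes("user.name", "user.namex"))        # False
print(includes("user.name.given", "user.name"))   # False
```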
P3P has defined a data schema called the P3P base data schema that includes a large number of data elements commonly used by services.
Services may declare new data elements by creating and publishing their own data schemas. This is done with the <DATASCHEMA> element. A data schema can either be published in a standalone XML file, which is then referenced by the policies that use it, or be embedded in the policy file that references it. The <DATASCHEMA> element is defined as follows:
A standalone data schema has the <DATASCHEMA> element as the first XML element in the file. It must have the appropriate namespace defined in the xmlns attribute to identify it as a P3P data schema, as follows:
```xml
<DATASCHEMA xmlns="http://www.w3.org/2001/09/P3Pv1">
  <DATA-STRUCT ... />
  ...
  <DATA-DEF ... />
</DATASCHEMA>
```
When a data schema is declared inside a policy file, the <DATASCHEMA> element is still used (as described in Section 3.2.1, "The <POLICIES> Element"), but no namespace attribute is given.
Data schemas contain a number of fields in natural language. Services publishing a data schema MAY wish to translate these fields into multiple languages. The data element short and long names MAY be translated, but the data element name MUST NOT be translated: this field needs to stay constant across translations of a data schema.
If a service is going to provide a data schema in multiple natural languages, then it SHOULD examine the Accept-Language HTTP request-header on requests for that data schema to pick the best available alternative.
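P3P does not prescribe how the best alternative is chosen. The Python sketch below shows one possible approach. It orders the header's language ranges by their q weights, then matches them against the available translations using simple prefix matching in the style of RFC 4647 basic filtering. The function names, the parsing of only a single `q` parameter, and the `default` fallback are all assumptions of this sketch:

```python
from typing import List, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language-range, q) pairs, highest q first; ties keep header order."""
    ranges = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):  # only a lone q parameter is understood
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((position, tag.strip().lower(), q))
    ranges.sort(key=lambda r: (-r[2], r[0]))
    return [(tag, q) for _, tag, q in ranges]


def pick_language(header: str, available: List[str], default: str) -> str:
    """Choose the best available translation of a data schema."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            break  # sorted by q, so no acceptable range remains
        for lang in available:
            lowered = lang.lower()
            # "fr" matches "fr" and "fr-CA"; "*" matches anything.
            if tag == "*" or lowered == tag or lowered.startswith(tag + "-"):
                return lang
    return default


print(pick_language("fr-CH, fr;q=0.9, en;q=0.8", ["en", "fr"], "en"))  # fr
```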
Data schemas often need to reuse a common group of data elements. P3P data schemas support this through data structures. A data structure is a named, abstract definition of a group of data elements. When a data element is defined, it can be defined as being of an unstructured type, in which case it has no child elements. The data element can also be defined as being of a specific structured type, in which case the data element will be automatically expanded to include as sub-elements all of the elements defined in the data structure. For example, the following structure is used to represent a date and time:
```xml
<!-- "date" Data Structure -->
<DATA-STRUCT name="date.ymd.year" short-description="Year"/>
<DATA-STRUCT name="date.ymd.month" short-description="Month"/>
<DATA-STRUCT name="date.ymd.day" short-description="Day"/>
<DATA-STRUCT name="date.hms.hour" short-description="Hour"/>
<DATA-STRUCT name="date.hms.minute" short-description="Minute"/>
<DATA-STRUCT name="date.hms.second" short-description="Second"/>
```
Now we shall define a "meeting" data element, which has a time and place for the meeting:
```xml
<DATA-DEF name="meeting.time" short-description="Meeting time" structref="#date"/>
<DATA-DEF name="meeting.place" short-description="Meeting place"/>
```
Since meeting.place does not reference a structure, it is of an unstructured type, and has no child elements. The meeting.time element uses the date structure. By declaring this, the following sub-elements are created:
meeting.time.ymd.year
meeting.time.ymd.month
meeting.time.ymd.day
meeting.time.hms.hour
meeting.time.hms.minute
meeting.time.hms.second
A P3P policy can now declare that it collects the meeting data element, which implies that it collects all of the sub-elements of meeting. Alternatively, it can use data elements lower down the hierarchy: meeting.time, for example, or meeting.time.ymd.day.
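The expansion can be read as a prefix rewrite. Every <DATA-STRUCT> name that begins with the structure's name has that first component replaced by the name of the data element. Below is a minimal Python sketch under these assumptions:

- The structure is supplied as its list of flat <DATA-STRUCT> names.
- The structref is a same-document reference such as "#date".
- Nested structrefs are not followed.

`expand` and `DATE_STRUCT` are hypothetical names:

```python
from typing import List

# The "date" structure from the example above, as flat DATA-STRUCT names.
DATE_STRUCT = [
    "date.ymd.year", "date.ymd.month", "date.ymd.day",
    "date.hms.hour", "date.hms.minute", "date.hms.second",
]


def expand(def_name: str, structref: str, struct_names: List[str]) -> List[str]:
    """List the sub-elements created by a DATA-DEF that uses a structure."""
    if not structref.startswith("#"):
        raise ValueError("this sketch only resolves same-document structrefs")
    prefix = structref[1:] + "."  # "#date" -> "date."
    # Swap the structure name for the data element name, keeping the rest.
    return [def_name + "." + name[len(prefix):]
            for name in struct_names if name.startswith(prefix)]


for element in expand("meeting.time", "#date", DATE_STRUCT):
    print(element)
```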
<DATA-DEF> and <DATA-STRUCT>
Define a data element or a data structure, respectively. Data structures are reusable structured type definitions that can be used to build data elements. Data elements are declared within a <STATEMENT> in a P3P policy to describe data covered by that statement.
The following attributes are common to these two elements:
name (mandatory attribute)
Indicates the name of the data element or data structure. Remember that names of data elements and data structures are case-sensitive, so, for example, user.gender is different from USER.GENDER or User.Gender. Furthermore, in names of data elements and structures, no digit may appear immediately after a dot.
structref
URI reference ([URI]), where the fragment identifier part denotes the structure, and the URI part denotes the corresponding data schema where it is defined. The default base URI is a same-document reference ([URI]). Data elements or data structures without a structref attribute (and thus without an associated structure) are called unstructured.
short-description
A string denoting the short display name of the data element or structure, no more than 255 characters.
The DATA-DEF and DATA-STRUCT elements can also contain a long description of the data element or structure, using the LONG-DESCRIPTION element.
Here, URI-reference is defined as in [URI].
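Two of these attribute rules lend themselves to a quick check. The Python sketch below does two things:

- It tests only the digit-after-dot restriction on names. It is not a full validator.
- It splits a structref into the schema URI and the structure name. It does this by resolving the structref against the URI of the containing schema with the standard `urllib.parse.urljoin` and `urldefrag` functions.

The helper names are hypothetical. Treating `schema_uri` as the base URI is an assumption consistent with the same-document default:

```python
import re
from typing import Tuple
from urllib.parse import urldefrag, urljoin

# A dot followed directly by a digit is not allowed in a name.
_DIGIT_AFTER_DOT = re.compile(r"\.[0-9]")


def passes_digit_rule(name: str) -> bool:
    """Check only the digit-after-dot restriction; not a full name validator."""
    return _DIGIT_AFTER_DOT.search(name) is None


def split_structref(structref: str, schema_uri: str) -> Tuple[str, str]:
    """Return (data schema URI, structure name) for a structref value.

    Relative references, including the bare "#name" same-document form,
    are resolved against the URI of the schema that contains them.
    """
    schema, structure = urldefrag(urljoin(schema_uri, structref))
    return schema, structure


base = "http://www.HyperSpeed.example.com/models-schema"
print(split_structref("#vehicle", base))
print(split_structref("http://www.w3.org/TR/P3P/base#postal", base))
```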
Data elements can be structured, much as in common programming languages. Structures are hierarchical (tree-like) descriptions of data elements, and the hierarchy is expressed in the name attribute using a full stop (".") as the separator.
P3P provides the P3P base data schema, which has built-in definitions of a number of widely used structures and data elements. All P3P implementations are required to understand the P3P base data schema, so the structures and elements it defines are always available to P3P implementers.
Categories can be assigned to data structures or data elements. The following rules define how those category definitions are meant to be used:
<DATA-STRUCT> elements MAY include category definitions with them. If a structure definition includes categories, then all uses of those structures in data definitions and data structures pick up those categories. If a structure contains no categories, then the categories for that structure MAY be defined when it is used in another structure or data element. Otherwise, a data element using this structure is a variable-category element. Any uses of a variable-category data element in a policy require that its categories be listed in the policy.
A <DATA-DEF> with an unstructured type is a variable-category data element if no categories are defined in the <DATA-DEF>, and has exactly those categories listed in the <DATA-DEF> if any categories are included.
A <DATA-DEF> or <DATA-STRUCT> with a structured type which has no categories defined on that structure produces a variable-category data element/structure if no categories are defined in the <DATA-DEF> or <DATA-STRUCT>. If the <DATA-DEF> or <DATA-STRUCT> does have categories listed, then those categories are applied to that data element, and all of its sub-elements. In other words, categories are pushed down into sub-elements when defining a data element to be of a structured type, and the structured type does not define any categories.
A <DATA-DEF> using a structured type which has categories defined on that structure picks up all the categories listed on the structure. In addition, categories may be listed in the <DATA-DEF>, and these are added to the categories defined in the structure. These categories are defined only at the level of that data element, and are not "pushed down" to any sub-elements.
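The three <DATA-DEF> rules above can be restated as a small function. Below is a minimal Python sketch under these assumptions:

- Categories are represented as sets of category names, for example "preference" for <preference/>.
- `None` stands for a variable-category element.

`resolve_data_def` and `Resolved` are hypothetical names. The <DATA-STRUCT> rules are not covered here:

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Resolved:
    # Categories of the data element itself; None means variable-category,
    # so any policy that uses the element must list its categories.
    element: Optional[FrozenSet[str]]
    # Categories forced onto every sub-element, or None when the sub-elements
    # keep whatever the referenced structure defines for them.
    pushed_down: Optional[FrozenSet[str]]


def resolve_data_def(def_categories: Iterable[str],
                     struct_categories: Optional[Iterable[str]] = None) -> Resolved:
    """Apply the <DATA-DEF> category rules.

    struct_categories is None for an unstructured <DATA-DEF>; otherwise it is
    the (possibly empty) set of categories declared on the referenced structure.
    """
    own = frozenset(def_categories)
    if struct_categories is None:
        # Unstructured: exactly the listed categories, or variable-category.
        return Resolved(own or None, None)
    inherited = frozenset(struct_categories)
    if not inherited:
        # Structure without categories: the DATA-DEF's categories are pushed down.
        return Resolved(own or None, own or None)
    # Structure with categories: add the DATA-DEF's own at this level only.
    return Resolved(inherited | own, None)
```

Consider a credit card expiration date defined on the variable-category date structure with its own purchase category (the category name is illustrative). It would resolve as `resolve_data_def(["purchase"], [])`, which pushes that category down to every member of the date.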
A <DATA-STRUCT> that has no categories assigned to it, and which is using a structured subtype which has categories defined on the subtype picks up all the categories listed on the subtype.
A <DATA-STRUCT> that has categories assigned to it, and which is using a structured subtype replaces all of the categories listed on the subtype.
There is a "bubble-up" rule for categories when referencing data elements: a data element must, at a minimum, include all categories defined by any of its children. This rule applies recursively, so, for example, all categories defined by data elements foo.a.w, foo.a.y, and foo.b.z MUST be considered to apply to data element foo.
A <DATA-STRUCT> cannot be defined with some variable-category elements and some fixed-category elements. Either all of the sub-elements of a structure must be in the variable category, or else all of them must have one or more assigned categories.
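The bubble-up rule and the no-mixing constraint can be checked mechanically. Below is a minimal Python sketch, assuming a hypothetical mapping from fully expanded element names to their declared categories, with `None` marking a variable-category member. The category names are illustrative:

```python
from typing import Dict, FrozenSet, Iterable, Optional


def bubble_up(declared: Dict[str, FrozenSet[str]], element: str) -> FrozenSet[str]:
    """Minimum categories of `element`: its own plus those of all descendants.

    Scanning every name below `element` (not just direct children) applies
    the rule recursively in a single pass.
    """
    below = element + "."
    result = set(declared.get(element, frozenset()))
    for name, categories in declared.items():
        if name.startswith(below):
            result |= categories
    return frozenset(result)


def mixes_variable_and_fixed(members: Iterable[Optional[FrozenSet[str]]]) -> bool:
    """True if a structure's members mix variable-category (None) and fixed ones."""
    return len({categories is None for categories in members}) > 1


declared = {
    "foo.a.w": frozenset({"physical"}),
    "foo.a.y": frozenset({"online"}),
    "foo.b.z": frozenset({"demographic"}),
}
print(sorted(bubble_up(declared, "foo")))  # ['demographic', 'online', 'physical']
```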
Consider the case where the company HyperSpeedExample wishes to describe the features of a vehicle, using a structure called vehicle. This structure includes:
The vehicle's model type (vehicle.model),
The vehicle's color (vehicle.color),
The vehicle's year of manufacture (vehicle.built.year), and
The vehicle's price (vehicle.price).
If HyperSpeedExample also wants to include the location of manufacture in the definition of a vehicle, it could add further fields to the structure with all the relevant data, such as country, street address, postal code, and so on. However, each part of a structure can itself use other structures: structures can be composed. In this case, the P3P base data schema already provides a structure, postal, describing all the postal information of a location. So, the final definition of the structure vehicle is:
vehicle.model (unstructured)
vehicle.color (unstructured)
vehicle.price (unstructured)
vehicle.built.year (unstructured)
vehicle.built.where (with structure postal from the base data schema)
The structure postal has fields postal.street, postal.city, and so on. Since we have applied the structure postal to vehicle.built.where, it means that we can access the street and city of a vehicle using the descriptions vehicle.built.where.street and vehicle.built.where.city respectively. So, by applying a structure (in this case, postal) we can build very complex descriptions in a modular way.
HyperSpeedExample wants to declare that all of the vehicle information will be in the <preference/> category. The vehicle.model, vehicle.color, vehicle.price, and vehicle.built.year fields are all unstructured types, so assigning them to the <preference/> category accomplishes this for those fields. Since vehicle is a structure definition, assigning the <preference/> category to vehicle.built.where will override (replace) the categories defined on all of the sub-elements of vehicle.built.where, placing all of them in the <preference/> category, even though the postal structure was originally defined as being in other categories.
As noted above, structures do not contain data elements; they are just abstract data types. We can use them to rapidly build structured collections of data elements. Continuing the example, HyperSpeedExample needs this abstract description of the features of a vehicle because it wants to actually exchange data about cars and motorcycles. So, it could define two data elements called car and motorcycle, both with the above structure vehicle.
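Composition and the category override described above can be sketched as a recursive expansion. The Python below is a minimal illustration under these assumptions:

- Structures are held in a hypothetical `STRUCTS` table.
- Only three members of postal are included: street, city, and country. Their categories come from the postal table of the base data schema, with "physical" and "demographic" standing for Physical Contact Information and Demographic and Socioeconomic Data.
- A member that declares categories while using another structure replaces the categories of everything below it, as in the vehicle.built.where case.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# (member path, structure it uses or None, categories declared on the member)
Member = Tuple[str, Optional[str], FrozenSet[str]]

STRUCTS: Dict[str, List[Member]] = {
    "postal": [
        ("street", None, frozenset({"physical"})),
        ("city", None, frozenset({"demographic"})),
        ("country", None, frozenset({"demographic"})),
    ],
    "vehicle": [
        ("model", None, frozenset({"preference"})),
        ("color", None, frozenset({"preference"})),
        ("price", None, frozenset({"preference"})),
        ("built.year", None, frozenset({"preference"})),
        ("built.where", "postal", frozenset({"preference"})),
    ],
}


def expand_struct(prefix: str, struct: str,
                  override: Optional[FrozenSet[str]] = None
                  ) -> Dict[str, Optional[FrozenSet[str]]]:
    """Map every leaf element under `prefix` to its categories (None = variable)."""
    leaves: Dict[str, Optional[FrozenSet[str]]] = {}
    for path, ref, categories in STRUCTS[struct]:
        name = f"{prefix}.{path}"
        # An override from an enclosing member wins; otherwise use the member's own.
        effective = override if override is not None else (categories or None)
        if ref is None:
            leaves[name] = effective
        else:
            # Recurse; a member without categories lets the substructure's through.
            leaves.update(expand_struct(name, ref, effective))
    return leaves


for element, cats in sorted(expand_struct("car", "vehicle").items()):
    print(element, sorted(cats or ()))
```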
This description of the data elements and data structures is encoded in XML using a data schema. In the HyperSpeedExample case, it would be something like:
```xml
<DATASCHEMA xmlns="http://www.w3.org/2001/09/P3Pv1">
  <DATA-STRUCT name="vehicle.model" short-description="Model">
    <CATEGORIES><preference/></CATEGORIES>
  </DATA-STRUCT>
  <DATA-STRUCT name="vehicle.color" short-description="Color">
    <CATEGORIES><preference/></CATEGORIES>
  </DATA-STRUCT>
  <DATA-STRUCT name="vehicle.built.year" short-description="Construction Year">
    <CATEGORIES><preference/></CATEGORIES>
  </DATA-STRUCT>
  <DATA-STRUCT name="vehicle.built.where"
               structref="http://www.w3.org/TR/P3P/base#postal"
               short-description="Construction Place">
    <CATEGORIES><preference/></CATEGORIES>
  </DATA-STRUCT>
  <DATA-DEF name="car" structref="#vehicle"/>
  <DATA-DEF name="motorcycle" structref="#vehicle"/>
</DATASCHEMA>
```
Continuing with the example, in order to reference a car model and construction year, HyperSpeedExample or any other service could send the following references inside a P3P policy:
```xml
<DATA-GROUP>
  <!-- First, the "car.model" data element, whose definition is in the
       data schema at http://www.HyperSpeed.example.com/models-schema -->
  <DATA ref="http://www.HyperSpeed.example.com/models-schema#car.model"/>
  <!-- And second, the "car.built.year" data element, whose definition is in
       the data schema at http://www.HyperSpeed.example.com/models-schema -->
  <DATA ref="http://www.HyperSpeed.example.com/models-schema#car.built.year"/>
</DATA-GROUP>
```
Using the base attribute, the above references can be written in an even more compact way:
```xml
<DATA-GROUP base="http://www.HyperSpeed.example.com/models-schema">
  <DATA ref="#car.model"/>
  <DATA ref="#car.built.year"/>
</DATA-GROUP>
```
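The base attribute acts as a base URI for the ref values in the group. The Python sketch below expands the compact form back into the absolute references of the first example. It assumes two things:

- References are resolved the standard way, with `urllib.parse.urljoin`.
- A missing or empty base means the document's own URI, as in the embedded-schema example, where base="".

`absolute_refs` and the document URI are hypothetical. The fragment is parsed without the P3P namespace:

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urljoin


def absolute_refs(data_group_xml: str, document_uri: str) -> List[str]:
    """Resolve each <DATA ref> in a <DATA-GROUP> against the group's base."""
    group = ET.fromstring(data_group_xml.strip())
    # A missing or empty base falls back to the URI of the document itself.
    base = urljoin(document_uri, group.get("base", ""))
    return [urljoin(base, data.get("ref", "")) for data in group.iter("DATA")]


compact = """
<DATA-GROUP base="http://www.HyperSpeed.example.com/models-schema">
  <DATA ref="#car.model"/>
  <DATA ref="#car.built.year"/>
</DATA-GROUP>
"""
for ref in absolute_refs(compact, "http://www.example.com/p3p.xml"):
    print(ref)
```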
Alternatively, the data schema could be embedded directly into a policy file. In this case, the policy file could look like:
```xml
<POLICIES xmlns="http://www.w3.org/2001/09/P3Pv1">
  <!-- Embedded data schema -->
  <DATASCHEMA>
    <DATA-STRUCT name="vehicle.model" short-description="Model">
      <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-STRUCT name="vehicle.color" short-description="Color">
      <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-STRUCT name="vehicle.built.year" short-description="Construction Year">
      <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-STRUCT name="vehicle.built.where"
                 structref="http://www.w3.org/TR/P3P/base#postal"
                 short-description="Construction Place">
      <CATEGORIES><preference/></CATEGORIES>
    </DATA-STRUCT>
    <DATA-DEF name="car" structref="#vehicle"/>
    <DATA-DEF name="motorcycle" structref="#vehicle"/>
  </DATASCHEMA>
  <!-- end of embedded data schema -->
  <POLICY name="policy1" discuri="http://www.example.com/disc1">
    ...
    <DATA-GROUP base="">
      <DATA ref="#car.model"/>
      <DATA ref="#car.built.year"/>
    </DATA-GROUP>
    ...
  </POLICY>
  <POLICY name="policy2" discuri="http://www.example.com/disc2">
    ....
  </POLICY>
  <POLICY name="policy3" discuri="http://www.example.com/disc3">
    ....
  </POLICY>
</POLICIES>
```
Note that in any case there MUST NOT be more than one data schema per file.
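A consumer reading a file can locate its data schema from the two placements described above. The schema is either a namespaced <DATASCHEMA> root element or a <DATASCHEMA> child of <POLICIES>. Below is a minimal Python sketch using the standard `xml.etree.ElementTree` module, assuming the namespace URI shown in this section. `find_data_schema` is a hypothetical helper that also enforces the one-schema-per-file rule:

```python
import xml.etree.ElementTree as ET
from typing import Optional

P3P_NS = "http://www.w3.org/2001/09/P3Pv1"  # namespace URI used in this section
SCHEMA_TAG = f"{{{P3P_NS}}}DATASCHEMA"
POLICIES_TAG = f"{{{P3P_NS}}}POLICIES"


def find_data_schema(xml_text: str) -> Optional[ET.Element]:
    """Return the file's <DATASCHEMA> element, or None if a policy file has none."""
    root = ET.fromstring(xml_text)
    if root.tag == SCHEMA_TAG:
        return root  # standalone schema: namespaced root element
    if root.tag == POLICIES_TAG:
        # Embedded schema: it inherits the namespace declared on <POLICIES>.
        schemas = root.findall(SCHEMA_TAG)
        if len(schemas) > 1:
            raise ValueError("a file MUST NOT contain more than one data schema")
        return schemas[0] if schemas else None
    # This also rejects a <DATASCHEMA> root that lacks the P3P namespace.
    raise ValueError(f"not a P3P data schema or policy file: {root.tag}")
```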
Note that the data element names specified in the base data schema or in extension data schemas may be used for purposes other than P3P policies. For example, Web sites may use these names to label HTML form fields. By referring to data the same way in P3P policies and forms, automated form-filling tools can be better integrated with P3P user agents.
An essential requirement on data schemas is their persistence. A data schema that can be fetched at a certain URI can only be changed by extending it in a backward-compatible way (that is to say, changing the data schema does not change the meaning of any policy using that schema). This way, the URI of a data schema acts, in a sense, as a unique identifier for the data elements and structures contained therein: any data schema that is not backward-compatible must therefore use a new URI.
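A publisher could guard this requirement with an automated check before replacing a schema at its URI. The Python sketch below is deliberately narrow. It assumes a hypothetical summary of each schema version that maps element and structure names to their structref and categories. It flags any name that was removed or whose summary changed, while allowing additions. A real compatibility review would also need to consider descriptions and other meaning-bearing content:

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical summary of one schema version:
# element or structure name -> (structref or None, sorted category names)
SchemaSummary = Dict[str, Tuple[Optional[str], Tuple[str, ...]]]


def breaking_changes(old: SchemaSummary, new: SchemaSummary) -> List[str]:
    """Names whose definition was removed or altered; pure additions pass."""
    return sorted(name for name, definition in old.items()
                  if new.get(name) != definition)


v1: SchemaSummary = {
    "vehicle.model": (None, ("preference",)),
    "car": ("#vehicle", ()),
}
v2: SchemaSummary = {**v1, "vehicle.price": (None, ("preference",))}
print(breaking_changes(v1, v2))  # []
```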
A useful application of data schema persistence arises, for example, on multilingual sites. The server can offer multiple language versions (translations) of the same data schema, using the HTTP "Content-Language" response header field to indicate which language has been used for the data schema.
The Basic Data Structures are structures used by the P3P base data schema. Due to their basic nature, they should be reused as much as possible by other data schemas. All P3P-compliant user agent implementations MUST be aware of the Basic Data Structures. Each table below specifies the elements of a basic data structure, their associated categories, their structures, and the display names shown to users. More than one category may be associated with a fixed data element. However, each base data element is assigned to only one category whenever possible. Data schema designers are encouraged to do the same.
The date structure specifies a date. Since date information can be used in different ways, depending on the context, all date information is tagged as being of "variable" category (see Section 5.7.2). Schema definitions can then explicitly set the corresponding category in the element referencing this data structure. For example, a user's birthday might be "Demographic and Socioeconomic Data," while the expiration date of a credit card might belong to the "Purchase Information" category.
date | Category | Structure | Short Display Name
---|---|---|---
ymd.year | (variable-category) | unstructured | Year
ymd.month | (variable-category) | unstructured | Month
ymd.day | (variable-category) | unstructured | Day
hms.hour | (variable-category) | unstructured | Hour
hms.minute | (variable-category) | unstructured | Minute
hms.second | (variable-category) | unstructured | Second
fractionsecond | (variable-category) | unstructured | Fraction of Second
timezone | (variable-category) | unstructured | Time Zone
Time zone information is described, for example, in the time standard [ISO8601]. Note that "date.ymd" and "date.hms" can be used as shorthand to reference the year/month/day and hour/minute/second blocks, respectively.
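To make the structure concrete, the Python sketch below fills the members of the date structure from a timezone-aware `datetime`. It then selects the ymd block, as the "date.ymd" shorthand would. This section does not define value formats, so the string formats are assumptions:

- Numbers are zero-padded.
- fractionsecond holds microseconds.
- timezone is a "+HHMM" offset, the ISO 8601 basic form.

`date_fields` and `block` are hypothetical names:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def date_fields(moment: datetime) -> Dict[str, str]:
    """Fill the date structure's members from a timezone-aware datetime."""
    return {
        "ymd.year": f"{moment.year:04d}",
        "ymd.month": f"{moment.month:02d}",
        "ymd.day": f"{moment.day:02d}",
        "hms.hour": f"{moment.hour:02d}",
        "hms.minute": f"{moment.minute:02d}",
        "hms.second": f"{moment.second:02d}",
        "fractionsecond": f"{moment.microsecond:06d}",  # assumed: microseconds
        "timezone": moment.strftime("%z"),              # assumed: +HHMM offset
    }


def block(fields: Dict[str, str], prefix: str) -> Dict[str, str]:
    """Select a block such as "ymd" or "hms", mirroring date.ymd / date.hms."""
    return {k: v for k, v in fields.items() if k.startswith(prefix + ".")}


stamp = datetime(2002, 4, 16, 9, 30, 5, 250000, tzinfo=timezone(timedelta(hours=2)))
print(block(date_fields(stamp), "ymd"))  # {'ymd.year': '2002', 'ymd.month': '04', 'ymd.day': '16'}
```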
The personname structure specifies information about the naming of a person.
personname | Category | Structure | Short Display Name
---|---|---|---
prefix | Demographic and Socioeconomic Data | unstructured | Name Prefix
given | Physical Contact Information | unstructured | Given Name (First Name)
family | Physical Contact Information | unstructured | Family Name (Last Name)
middle | Physical Contact Information | unstructured | Middle Name
suffix | Demographic and Socioeconomic Data | unstructured | Name Suffix
nickname | Demographic and Socioeconomic Data | unstructured | Nickname
The login structure specifies information (IDs and passwords) for computer systems and Web sites which require authentication. Note that this structure should not be used for computer systems or Web sites which use digital certificates for authentication: in those cases, the certificate structure should be used.
login | Category | Structure | Short Display Name
---|---|---|---
id | Unique Identifiers | unstructured | Login ID
password | Unique Identifiers | unstructured | Login Password
The "id" field represents the ID portion of the login information for a computer system. Often, user IDs are made public, while passwords are kept secret. This does not include any type of biometric authentication mechanisms.
The "password" field represents the password portion of the login information for a computer system. This is a secret data value, usually a character string, that is used in authenticating a user. Passwords are typically kept secret, and are generally considered to be sensitive information.
The certificate structure is used to specify identity certificates (for example, X.509).
certificate | Category | Structure | Short Display Name
---|---|---|---
key | Unique Identifiers | unstructured | Certificate Key
format | Unique Identifiers | unstructured | Certificate Format
The "format" field is used to represent the information of an IANA-registered public key or authentication certificate format, while the "key" field is used to represent the corresponding certificate key.
The telephonenum structure specifies the characteristics of a telephone number.
telephonenum | Category | Structure | Short Display Name
---|---|---|---
intcode | Physical Contact Information | unstructured | International Telephone Code
loccode | Physical Contact Information | unstructured | Local Telephone Area Code
number | Physical Contact Information | unstructured | Telephone Number
ext | Physical Contact Information | unstructured | Telephone Extension
comment | Physical Contact Information | unstructured | Telephone Optional Comments
The contact structure is used to specify contact information. Services can specify precisely which set of data they need: postal, telecommunication, or online address information.
contact | Category | Structure | Short Display Name
---|---|---|---
postal | Physical Contact Information, Demographic and Socioeconomic Data | postal | Postal Address Information
telecom | Physical Contact Information | telecom | Telecommunications Information
online | Online Contact Information | online | Online Address Information
The postal structure specifies a postal mailing address.
postal | Category | Structure | Short Display Name
---|---|---|---
name | Physical Contact Information, Demographic and Socioeconomic Data | personname | Name
street | Physical Contact Information | unstructured | Street Address
city | Demographic and Socioeconomic Data | unstructured | City
stateprov | Demographic and Socioeconomic Data | unstructured | State or Province
postalcode | Demographic and Socioeconomic Data | unstructured | Postal Code
country | Demographic and Socioeconomic Data | unstructured | Country Name
organization | Demographic and Socioeconomic Data | unstructured | Organization Name
The "country" field represents the name of the country (for example, one of the countries listed in [ISO3166]).
The telecom structure specifies telecommunication information about a person.
telecom | Category | Structure | Short Display Name
---|---|---|---
telephone | Physical Contact Information | telephonenum | Telephone Number
fax | Physical Contact Information | telephonenum | Fax Number
mobile | Physical Contact Information | telephonenum | Mobile Telephone Number
pager | Physical Contact Information | telephonenum | Pager Number
The online structure specifies online information about a person.
online | Category | Structure | Short Display Name
---|---|---|---
email | Online Contact Information | unstructured | Email Address
uri | Online Contact Information | unstructured | Home Page Address
Two structures are provided for representing forms of Internet addresses. The uri structure covers Uniform Resource Identifiers (URIs), which are defined in more detail in [URI]. The ipaddr structure represents IP addresses and Domain Name System (DNS) hostnames.
uri | Category | Structure | Short Display Name
---|---|---|---
authority | (variable-category) | unstructured | URI Authority
stem | (variable-category) | unstructured | URI Stem
querystring | (variable-category) | unstructured | Query-string Portion of URI
The authority of a URI is defined as the authority component in [URI]. The stem of a URI is defined as the information contained in the portion of the URI after the authority and up to (and including) the first "?" character in the URI. The querystring is the information contained in the portion of the URI after the first "?" character. For URIs which do not contain a "?" character, the stem is the entire URI, and the querystring is empty.
Since URI information can be used in different ways, depending on the context, all the fields in the uri structure are tagged as being of "variable" category. Schema definitions MUST explicitly set the corresponding category in the element referencing this data structure.
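The split defined above can be sketched in Python. The authority comes from the standard `urllib.parse.urlsplit`. The stem and querystring are cut by hand at the first "?". For a URI without "?", this sketch reads "the entire URI" as the entire portion after the authority, which is an interpretation. Fragments are not treated specially. `split_uri` is a hypothetical name:

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_uri(uri: str) -> Tuple[str, str, str]:
    """Split a URI into the (authority, stem, querystring) fields of the uri structure."""
    authority = urlsplit(uri).netloc
    marker = "//" + authority
    start = uri.find(marker) if authority else -1
    # Everything after "scheme://authority"; the whole string if no authority.
    rest = uri[start + len(marker):] if start != -1 else uri
    qmark = rest.find("?")
    if qmark == -1:
        return authority, rest, ""
    # The stem runs up to and including the first "?"; the rest is the querystring.
    return authority, rest[:qmark + 1], rest[qmark + 1:]


print(split_uri("http://www.example.com/search?q=p3p&lang=en"))
# ('www.example.com', '/search?', 'q=p3p&lang=en')
```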
The ipaddr structure represents the hostname and IP address of a system.
ipaddr | Category | Structure | Short Display Name
---|---|---|---
hostname | Computer Information | unstructured | Complete Host and Domain Name
partialhostname | Demographic | unstructured | Partial Hostname
fullip | Computer Information | unstructured | Full IP Address
partialip | Demographic | unstructured | Partial IP Address
The hostname element represents either the simple hostname of a system or the full hostname including the domain name. The partialhostname element represents a fully-qualified hostname which has had at least the host portion removed. In other words, everything up to the first "." in the fully-qualified hostname MUST be removed for an address to qualify as a "partial hostname."
The fullip element represents a full IP version 4 or IP version 6 address. The partialip element represents an IP version 4 address (version 4 only, not a version 6 address) which has had at least the last 7 bits of information removed. This removal MUST be done by replacing those bits with a fixed pattern for all visitors (for example, all 0s or all 1s).
Some Web sites use not the visitor's entire IP address or hostname, but a reduced form of that information. By collecting only a subset of the address information, the site gives the visitor some measure of anonymity. This specification does not claim that such "stripped" IP addresses or hostnames are impossible to associate with an individual user, only that doing so is significantly more difficult. Sites which perform this data reduction MAY wish to declare this practice in order to reflect their practices more accurately.
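The two reductions can be sketched with Python's standard `ipaddress` module. This is one way to meet the stated minimums, not a prescribed method:

- `partial_ip` zeroes the low bits and rejects IPv6 input. It defaults to 8 bits, an assumption; the requirement is at least 7.
- `partial_hostname` drops everything up to and including the first ".".

Both names are hypothetical:

```python
import ipaddress


def partial_ip(address: str, dropped_bits: int = 8) -> str:
    """Zero the low `dropped_bits` bits of an IPv4 address (at least 7)."""
    if not 7 <= dropped_bits <= 32:
        raise ValueError("partialip requires removing at least 7 bits")
    ip = ipaddress.IPv4Address(address)  # IPv6 input raises AddressValueError
    mask = (0xFFFFFFFF << dropped_bits) & 0xFFFFFFFF
    # The same fixed pattern (all 0s) is used for every visitor.
    return str(ipaddress.IPv4Address(int(ip) & mask))


def partial_hostname(hostname: str) -> str:
    """Drop everything up to and including the first "." of a hostname."""
    _, dot, rest = hostname.partition(".")
    if not dot or not rest:
        raise ValueError("hostname has no domain part to keep")
    return rest


print(partial_ip("192.0.2.137"))            # 192.0.2.0
print(partial_hostname("www.example.com"))  # example.com
```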
The loginfo structure is used to represent information typically stored in Web-server access logs.
loginfo | Category | Structure | Short Display Name
---|---|---|---
uri | Navigation and click-stream data | uri | URI of Requested Resource
timestamp | Navigation and click-stream data | date | Request Timestamp
clientip | Computer Information, Demographic and Socioeconomic Data | ipaddr | Client's IP Address or Hostname
other.httpmethod | Navigation and click-stream data | unstructured | HTTP Request Method
other.bytes | Navigation and click-stream data | unstructured | Data Bytes in Response
other.statuscode | Navigation and click-stream data | unstructured | Response Status Code
The resource in the HTTP request is captured by the uri field. The time at which the server processes the request is represented by the timestamp field. Server implementations are free to define this field as the time the request was received, the time that the server began sending the response, the time that sending the response was complete, or some other convenient representation of the time the request was processed. The IP address of the client system making the request is given by the clientip field.
The other data fields represent other information commonly stored in Web server access logs. other.httpmethod is the HTTP method (such as GET, POST, etc.) in the client's request. other.bytes indicates the number of bytes in the response-body sent by the server. other.statuscode is the HTTP status code on the request, such as 200, 302, or 404 (see section 6.1.1 of [HTTP1.1] for details).
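To illustrate where these values typically come from, the Python sketch below maps one line in the Common Log Format onto the loginfo member names. Using that format is an assumption; P3P does not mandate any log format. The regular expression, the choice to report a "-" byte count as 0, and `loginfo_from_clf` are all hypothetical. Note that the logged URI is usually just the request path:

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "method uri protocol" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\d+|-)$')


def loginfo_from_clf(line: str) -> dict:
    """Map one access-log line onto loginfo member names."""
    match = CLF.match(line.strip())
    if match is None:
        raise ValueError("not a Common Log Format line")
    host, stamp, method, uri, status, size = match.groups()
    return {
        "uri": uri,
        "timestamp": datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"),
        "clientip": host,
        "other.httpmethod": method,
        "other.bytes": 0 if size == "-" else int(size),  # "-" means no body sent
        "other.statuscode": int(status),
    }


record = loginfo_from_clf(
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326')
print(record["other.httpmethod"], record["uri"], record["other.statuscode"])
```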
The httpinfo structure represents information carried by the HTTP protocol which is not covered by the loginfo structure.
httpinfo | Category | Structure | Short Display Name
---|---|---|---
referer | Navigation and click-stream data | uri | Last URI Requested by the User
useragent | Computer Information | unstructured | User Agent Information
The useragent field represents the information in the HTTP User-Agent header (which gives information about the type and version of the user's Web browser), and/or the HTTP accept* headers.
The referer field represents the information in the HTTP Referer header, which gives information about the previous page visited by the user. Note that this field is misspelled in exactly the same way as the corresponding HTTP header.
All P3P-compliant user agent implementations MUST be aware of the data elements in the P3P base data schema. The P3P base data schema includes the definition of the basic data structures, and four data element sets: user, thirdparty, business and dynamic. The user, thirdparty and business sets include elements that users and/or businesses might provide values for, while the dynamic set includes elements that are dynamically generated in the course of a user's browsing session. User agents may support a variety of mechanisms that allow users to provide values for the elements in the user set and store them in a data repository, including mechanisms that support multiple personae. Users may choose not to provide values for these data elements.
The formal XML definition of the P3P base data schema is given in Appendix 3. In the following sections, the base data elements and sets are explained one by one. In the future, there will in all likelihood be demand for the creation of other data sets and elements. Obvious applications include catalogue, payment, and agent/system attribute schemas (an extensive set of system elements is provided, for example, in http://www.w3.org/TR/NOTE-agent-attributes).
Each table below specifies a set, the elements within the set, the category associated with the element, its structure, and the display name shown to users. More than one category may be associated with a fixed data element. However, each base data element is assigned to only one category whenever possible. It is recommended that data schema designers do the same.
The user data set includes general information about the user.