11.4 Protecting Data on the Web

The Web isn't a secure environment. The open nature of the networking and the web protocols TCP, IP, and HTTP has allowed the development of many tools that can listen in on data transmitted between web browsers and servers. It is possible to snoop on passing traffic and read the contents of HTTP requests and responses. With a little extra effort, a hacker can manipulate traffic and even masquerade as another user.

If an application transmits sensitive information over the Web, an encrypted connection should be provided between the web browser and server. For example, an encrypted connection is warranted when:

  • Sensitive information is held on the server such as commercial-in-confidence documents or bank account balances.

  • User credentials are used to gain access to sensitive services such as online banking or the administration of an application.

  • Personal details are collected from the user, such as credit card numbers.

  • Session IDs are used by the server to link HTTP requests to session variables, and the session needs to be secure from hijacking.

Even if none of these of reasons apply to your application, sometimes it's a good idea to use encryption anyway for a commercial application. Bad publicity from a security breach can be equally bad when private or public data is compromised.

In this section, we focus on encrypting data sent over the Web using the Secure Sockets Layer. We discuss the basic mechanics of SSL in this section. An installation and configuration guide for SSL and the Apache web server for Unix and Mac OS X platforms is part of Appendix A through Appendix C. It's possible to set up a secure web server under Microsoft Windows, but we don't cover it in this book.

This section isn't designed to completely cover the topic of encryption. We limit our brief discussion to the features of SSL, and how SSL can protect web traffic. More details about cryptographic systems can be found in the references listed in Appendix G.

11.4.1 The Secure Sockets Layer Protocol

The data sent between web servers and browsers can be protected using the encryption services of the Secure Sockets Layer protocol, SSL. The SSL protocol addresses three goals:

Privacy or confidentiality

The content of a message transmitted over the Internet is protected from observers.


The contents of a message received are correct and have not been tampered with.


Both the sender and receiver of a message can be sure of each other's identity.

SSL was originally developed by Netscape, and there are two versions: SSL v2.0 and SSL v3.0. We don't detail the differences here, but Version 3.0 supports more security features than 2.0. The SSL protocol isn't a standard as such, and the Internet Engineering Task Force (IETF) has proposed the Transport Layer Security 1.0 (TLS) protocol as an SSL v3.0 replacement; at the time of writing SSL v3.0 and TLS are almost the same. See http://ietf.org/rfc/rfc2246.txt?number=2246 for more information on TLS. SSL architecture

To understand how SSL works, you need to understand how browsers and web servers send and receive HTTP messages.

Browsers send HTTP requests by calling on the host systems' TCP/IP networking software, which does the work of sending and receiving data over the Internet. When a request is to be sent (for example, when a user clicks on a hypertext link) the browser formulates the HTTP request and uses the host's TCP/IP network service to send the request to the server. TCP/IP doesn't care that the message is HTTP; it is responsible only for getting the complete message to the destination. When a web server receives a message, data is read from its host's TCP/IP service and then interpreted as HTTP. We discuss the relationship between HTTP and TCP/IP in more detail in Appendix D.

As shown in Figure 11-3, the SSL protocol operates as a layer between the browser and the TCP/IP services provided by the host. A browser passes the HTTP message to the SSL layer to be encrypted before the message is passed to the host's TCP/IP service. The SSL layer, configured into the web server, decrypts the message from the TCP/IP service and then passes it to the web server. Once SSL is installed and the web server is configured correctly, the HTTP requests and responses are automatically encrypted. PHP scripting is not required to use the SSL services.

Figure 11-3. HTTP clients and servers, SSL, and the network layer that implements TCP/IP

Because SSL sits between HTTP and TCP/IP, secure web sites technically don't serve HTTP, at least not directly over TCP. URLs that locate resources on a secure server begin with https://, which means HTTP over SSL. The default port for an SSL service is 443, not port 80 as with HTTP; for example, when a browser connects to https://secure.example.com, it makes a TCP/IP connection to port 443 on secure.example.com. Most browsers and web servers can support SSL, but keys and certificates need to be included in the configuration of the server (and possibly the browser, if client certification is required). In addition, web browsers need to be preconfigured with certificates from root CAs; fortunately, all browsers come with these. We discuss CAs and certificates later. Cipher suites

To provide a service that addresses the goals of privacy, integrity, and authentication, SSL uses a combination of cryptographic techniques. These include message digests, digital certificates, and, of course, encryption. There are many different standard algorithms that implement these functions, and SSL can use different combinations to meet particular requirements (such as the legality of using a technique in a particular country!).

When an SSL connection is established, clients and servers negotiate the best combination of techniques?based on common capabilities?to ensure the highest level of protection. The combinations of techniques that can be negotiated are known as cipher suites. SSL sessions

When a browser connects to a secure site, the SSL protocol performs the following four steps:

  1. A cipher suite is negotiated. The browser and the server identify the major SSL version supported, and then the configured capabilities. The strongest cipher suit that can be supported by both systems is chosen.

  2. A secret key is shared between the server and the browser. Normally the browser generates a secret key that is one-way (asymmetrically) encrypted using the server's public key. Only the server can learn the secret by decrypting it with the corresponding private key. The shared secret is used as the key to encrypt and decrypt the HTTP messages that are transmitted. This phase is called the key exchange.

  3. The browser authenticates the server by examining the server's X.509 digital certificate. Often browsers are preloaded with a list of certificates from Certification Authorities, and authentication of the server is transparent to a user. If the browser doesn't know about the certificate, the user is warned, usually by a dialog box that pops up and asks whether the user wants to proceed in the face of failed authentication.

  4. The server examines the browser's X.509 certificate to authenticate the client. This step is optional and requires that each client be set up with a signed digital certificate. Apache can be configured to use fields from the browser's X.509 certificate as if they were the username and password encoded into an HTTP Authorization header field. Client certificates aren't commonly used on the Web.

These four steps briefly summarize the network handshaking between the browser and server when SSL is used. Once the browser and server have completed these steps, the HTTP request can be encrypted by SSL and sent to the web server.

The SSL handshaking is slow, and if this was to occur with every HTTP request, the performance of a secure web site would be poor. To improve performance, SSL uses the concept of sessions to allow multiple requests to share the negotiated cipher suite, the shared secret key, and the certificates. An SSL session is managed by the SSL software and isn't the same as a PHP session. Certificates and certification authorities

A signed digital certificate encodes information so that the integrity of the information and its signature can be tested. The information contained in a certificate used by SSL includes details about the organization and the organization's public key. The public key that is contained in a certificate is paired with a secret private key that is configured into the organization's web server; if you've followed our setup instructions for Unix or Mac OS X platforms in Appendix A through Appendix C, you'll recall generating the pair of keys and adding the private key to the web server.

The browser uses the public key when an SSL session is established to encrypt a secret. The secret can be decrypted only using the private key configured into the organization's server. Encryption techniques that use a public and private key are known as one-way or asymmetric, and SSL uses asymmetric encryption to exchange a secret key. The secret key can then be used to encrypt the messages transmitted over the Internet.

You cannot, of course, trust an unknown server to be what it claims to be; you have to depend on a known authority to validate that the server is telling the truth and you have to trust that authority. That is the role of a Certification Authority (CA). Each signed certificate contains details about the CA. The CA digitally signs a certificate by adding its own organization details, an encrypted digest of the certificate (created using a technique such as MD5), and its own public key. With this information encoded, the complete signed certificate can be verified as being correct.

There are dozens, perhaps hundreds, of CAs. A browser (or the user confronted by a browser warning) can't be expected to recognize the digital signatures from all these authorities. The X.509 certificate standard solves this problem by allowing issuing CAs to have their signatures digitally signed by a more authoritative CA, who can in turn have its signature signed by yet another, more trusted CA. Eventually the chain of signatures ends with that of a root Certification Authority. As discussed previously, the certificates from the root CAs are usually pre-installed with browser software. In addition, most browsers allow users to add their own trusted certificates.

If you don't want to pay for a certificate or you need one for testing, free certificates can be created and used to configure a web server with SSL. We show how to create free self-signed certificates for Unix and Mac OS X platforms in Appendix A through Appendix C, or you can obtain a free trial certificate for any platform from VeriSign at http://www.verisign.com/. However, self-signed or trial certificates are normally useful only in restricted environments such as corporate networks. They won't be trusted by users of secure applications on the Internet, and you'll probably need to pay to have yours signed before the application is actually deployed.