Introduction

Sockets are endpoints for communication. Some types of sockets provide reliable communications. Others offer few guarantees, but incur little system overhead. Sockets can connect processes running on a single machine or on different machines across the Internet.

In this chapter we consider the two most commonly used types of sockets: streams and datagrams. Streams provide a bidirectional, sequenced, and reliable channel of communication, similar to pipes. Datagram sockets do not guarantee sequenced, reliable delivery, but they do guarantee that message boundaries will be preserved when read. Your system may support other types of sockets as well; consult your socket(2) manpage or equivalent documentation for details.
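As a rough illustration of preserved message boundaries, the sketch below sends two datagrams and reads them back as two separate messages. It assumes a system where the built-in socketpair function supports Unix domain datagram sockets; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Socket;

# Assumption: this system supports socketpair with PF_UNIX and SOCK_DGRAM.
socketpair(my $left, my $right, PF_UNIX, SOCK_DGRAM, 0)
    or die "socketpair: $!";

# Send two separate messages on one end...
defined(send($left, "first",  0)) or die "send: $!";
defined(send($left, "second", 0)) or die "send: $!";

# ...and read them on the other. Each recv returns exactly one message,
# even though the buffer is large enough to hold both.
my @messages;
for (1 .. 2) {
    my $buffer = '';
    defined(recv($right, $buffer, 1024, 0)) or die "recv: $!";
    push @messages, $buffer;
}

print "message: $_\n" for @messages;
```

On a stream socket, the same two writes could arrive in a single read, which is why stream protocols usually need their own framing.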

We also consider both the Internet and Unix domains. The Internet domain gives sockets two-part names: a host (an IP address in a particular format) and a port number. In the Unix domain, sockets are named using files (e.g., /tmp/mysock).

In addition to domains and types, sockets also have a protocol associated with them. Protocols are not very important to the casual programmer, as there is rarely more than one protocol for a given domain and type of socket.

Domains and types are normally identified by numeric constants (available through functions exported by the Socket and IO::Socket modules). Stream sockets have the type SOCK_STREAM, and datagram sockets have the type SOCK_DGRAM. The Internet domain is PF_INET, and the Unix domain PF_UNIX. (POSIX uses PF_LOCAL instead of PF_UNIX, but PF_UNIX will almost always be an acceptable constant simply because of the preponderance of existing software that uses it.) You should use these symbolic names instead of numbers because the numbers may change (and historically, have).
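For a quick look at these constants, the following sketch prints the numeric values behind the symbolic names on the current system. The numbers printed are platform-dependent, which is the reason to stick with the names.

```perl
use strict;
use warnings;
use Socket qw(PF_INET PF_UNIX SOCK_STREAM SOCK_DGRAM);

# Collect each symbolic name with its numeric value on this platform.
my %constant = (
    PF_INET     => PF_INET,
    PF_UNIX     => PF_UNIX,
    SOCK_STREAM => SOCK_STREAM,
    SOCK_DGRAM  => SOCK_DGRAM,
);

# The values differ between operating systems; only the names are portable.
printf "%-12s = %d\n", $_, $constant{$_} for sort keys %constant;
```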

Protocols have names such as tcp and udp, which correspond to numbers that the operating system uses. The getprotobyname function (built into Perl) returns the number when given a protocol name. Pass protocol number 0 to socket functions to have the system select an appropriate default.
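A minimal sketch of both approaches follows. It assumes the system's protocol database knows about tcp; if the lookup fails, the sketch falls back to the IPPROTO_TCP constant from the Socket module.

```perl
use strict;
use warnings;
use Socket qw(PF_INET SOCK_STREAM IPPROTO_TCP);

# In scalar context, getprotobyname returns the protocol number.
# Fall back to the Socket constant if the protocol database is unavailable.
my $tcp = getprotobyname('tcp');
$tcp = IPPROTO_TCP unless defined $tcp;
print "tcp is protocol number $tcp\n";

# Name the protocol explicitly...
socket(my $explicit, PF_INET, SOCK_STREAM, $tcp) or die "socket: $!";

# ...or pass 0 and let the system pick the default for this domain and type.
socket(my $default, PF_INET, SOCK_STREAM, 0) or die "socket: $!";

close $explicit;
close $default;
```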

Perl has built-in functions to create and manipulate sockets; these functions largely mimic their C counterparts. While this is good for providing low-level, direct access to every part of the system, most of us prefer something more convenient. That's what the IO::Socket::INET and IO::Socket::UNIX classes are for: they provide a high-level interface to otherwise intricate system calls.
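To give a feel for the high-level interface, here is a small sketch that opens a listening socket and a client in the same process over the loopback address. The address 127.0.0.1 and the use of port 0 (letting the system choose a free port) are assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Listening socket on the loopback address; port 0 lets the system choose.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Proto     => 'tcp',
    Listen    => 5,
) or die "cannot listen: $@";

my $port = $server->sockport;    # the port the system actually assigned

# Client socket connected to that port.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
) or die "cannot connect: $@";
$client->autoflush(1);

# Take the pending connection off the server's queue.
my $connection = $server->accept or die "accept: $!";

print $client "hello\n";
my $line = <$connection>;
print "server read: $line";
```

Each constructor call stands in for several of the built-in calls described next.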

Let's look at the built-in functions first. They all return undef and set $! if an error occurs. The socket function makes a socket, bind gives a socket a local name, connect connects a local socket to a (possibly remote) one, listen readies a socket for connections from other sockets, and accept receives the connections one by one. You can communicate over a stream socket with print and <> as well as with syswrite and sysread, or over a datagram socket with send and recv. (Perl does not currently support sendmsg(2).)
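The error convention can be sketched as follows. The path used here is hypothetical and is assumed not to exist, so that bind fails and sets $!.

```perl
use strict;
use warnings;
use Socket;

socket(my $sock, PF_UNIX, SOCK_STREAM, 0) or die "socket: $!";

# Hypothetical path inside a directory assumed not to exist,
# chosen so that bind fails.
my $bad_path    = "/nonexistent-dir-for-demo/mysock";
my $socket_name = sockaddr_un($bad_path);

my $ok    = bind($sock, $socket_name);
my $error = $ok ? '' : "$!";    # copy $! right away, before it can change

print $ok ? "bind unexpectedly succeeded\n" : "bind failed: $error\n";
close $sock;
```

In real programs the usual idiom is simply to append `or die "bind: $!"` to each call.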

A typical server calls socket, bind, and listen, then loops in a blocking accept call that waits for incoming connections (see Recipe 17.2 and Recipe 17.5). A typical client calls socket and connect (see Recipe 17.1 and Recipe 17.4). Datagram clients are special. They don't have to connect to send data, because they can specify the destination as an argument to send.
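The call sequences above can be sketched in a single process over the loopback address. This is an illustration under stated assumptions rather than a real server: port 0 asks the system for any free port, accept is called once instead of in a loop, and all variable names are made up for the example.

```perl
use strict;
use warnings;
use Socket;

my $loopback = inet_aton("127.0.0.1");
my $any_port = sockaddr_in(0, $loopback);    # port 0: system picks a free port

# Stream server: socket, bind, listen.
socket(my $server, PF_INET, SOCK_STREAM, 0) or die "socket: $!";
bind($server, $any_port)                    or die "bind: $!";
listen($server, SOMAXCONN)                  or die "listen: $!";
my ($port) = sockaddr_in(getsockname($server));

# Stream client: socket, connect.
socket(my $client, PF_INET, SOCK_STREAM, 0) or die "socket: $!";
my $server_name = sockaddr_in($port, $loopback);
connect($client, $server_name)              or die "connect: $!";

# A real server would loop here; one accept is enough for the sketch.
accept(my $connection, $server)             or die "accept: $!";

defined(syswrite($client, "ping\n"))        or die "syswrite: $!";
my $request = <$connection>;
print "stream server read: $request";

# Datagram receiver: socket and bind only, no listen or accept.
socket(my $receiver, PF_INET, SOCK_DGRAM, 0) or die "socket: $!";
bind($receiver, $any_port)                   or die "bind: $!";
my ($dgram_port) = sockaddr_in(getsockname($receiver));

# Datagram sender: no connect; the destination is an argument to send.
socket(my $sender, PF_INET, SOCK_DGRAM, 0)   or die "socket: $!";
my $destination = sockaddr_in($dgram_port, $loopback);
defined(send($sender, "hello", 0, $destination)) or die "send: $!";

my $datagram = '';
defined(recv($receiver, $datagram, 1024, 0)) or die "recv: $!";
print "datagram receiver read: $datagram\n";
```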

When you bind, connect, or send to a specific destination, you must supply a socket name. An Internet domain socket name is a host (an IP address packed with inet_aton) and a port (a number), packed into a C-style structure with sockaddr_in:

use Socket;

$port        = 80;                          # example value; any valid port number
$packed_ip   = inet_aton("208.201.239.37");
$socket_name = sockaddr_in($port, $packed_ip);

A Unix domain socket name is a filename packed into a C structure with sockaddr_un:

use Socket;

$socket_name = sockaddr_un("/tmp/mysock");

To take a packed socket name and turn it back into a filename or host and port, call sockaddr_un or sockaddr_in in list context:

($port, $packed_ip) = sockaddr_in($socket_name);    # for PF_INET sockets
($filename)         = sockaddr_un($socket_name);    # for PF_UNIX sockets
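A round trip through both functions might look like the sketch below. The port number 8080 and the loopback address are arbitrary example values. Note that sockaddr_in and sockaddr_un decide whether to pack or unpack based on calling context, so the packing calls are assigned to scalars.

```perl
use strict;
use warnings;
use Socket;

# Pack: scalar context, two arguments for PF_INET, one for PF_UNIX.
my $inet_name = sockaddr_in(8080, inet_aton("127.0.0.1"));
my $unix_name = sockaddr_un("/tmp/mysock");

# Unpack: list context, one packed argument.
my ($port, $packed_ip) = sockaddr_in($inet_name);
my ($filename)         = sockaddr_un($unix_name);

my $ip_address = inet_ntoa($packed_ip);
print "Internet name: $ip_address port $port\n";
print "Unix name:     $filename\n";
```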

Use inet_ntoa to turn a packed IP address back into an ASCII string. It stands for "numbers to ASCII," and inet_aton stands for "ASCII to numbers."

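$ip_address = inet_ntoa($packed_ip);            # packed address to dotted-quad string
$packed_ip  = inet_aton("208.201.239.37");      # dotted-quad string to packed address
$packed_ip  = inet_aton("www.oreilly.com");     # hostnames are looked up, too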
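To make the packed form less mysterious, this sketch inspects what inet_aton returns for an IPv4 address: a 4-byte string, one byte per component. The loopback address and the hostname localhost are example values, and the hostname lookup is allowed to fail.

```perl
use strict;
use warnings;
use Socket;

my $packed_ip = inet_aton("127.0.0.1");

# The packed address is a 4-byte string, not printable text.
my $byte_count = length $packed_ip;
my $by_hand    = join '.', unpack 'C4', $packed_ip;    # decode the bytes manually
my $ip_address = inet_ntoa($packed_ip);                # same result, the usual way

print "packed length: $byte_count bytes\n";
print "by hand:       $by_hand\n";
print "inet_ntoa:     $ip_address\n";

# With a hostname, inet_aton performs a lookup and returns undef on failure.
my $resolved = inet_aton("localhost");
print defined $resolved
    ? "localhost is " . inet_ntoa($resolved) . "\n"
    : "localhost could not be resolved\n";
```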

Most recipes use Internet domain sockets in their examples, but nearly everything that applies to the Internet domain also applies to the Unix domain. Recipe 17.6 explains the differences and pitfalls.

Sockets are the basis of network services. We provide three ways to write servers: one where a child process is created for each incoming connection (Recipe 17.11), one where the server forks in advance (Recipe 17.12), and one where the server process doesn't fork at all (Recipe 17.13).

Some servers need to listen to many IP addresses at once, which we demonstrate in Recipe 17.16. Well-behaved servers clean up and restart when they get a HUP signal; Recipe 17.18 shows how to implement that behavior in Perl. We also show how to put a name to both ends of a connection; see Recipe 17.7 and Recipe 17.8.

UNIX Network Programming (Prentice Hall) and the three-volume TCP/IP Illustrated (Addison-Wesley) by W. Richard Stevens are indispensable for the serious socket programmer. If you want to learn the basics about sockets, it's hard to beat the original and classic reference, An Advanced 4.4BSD Interprocess Communication Tutorial. It's written for C, but almost everything is directly applicable to Perl. It's available in /usr/share/doc on most BSD-derived Unix systems. We also recommend you look at The Unix Programming Frequently Asked Questions List (Gierth and Horgan), and Programming UNIX Sockets in C - Frequently Asked Questions (Metcalf and Gierth), both of which are posted periodically to the comp.unix.answers newsgroup.