Creating a Database

Databases are described by a very specific organization scheme. To illustrate database concepts, I will create and view a simple phone list. The basic structure of the phone list is shown in Table 7.1.

Table 7.1: PHONE LIST SUMMARY
id	firstName	lastName	e-mail	phone
0	Andy	Harris	<aharris@cs.iupui.edu>	123-4567
1	Joe	Slow	<jslow@myPlace.net>	987-6543

The phone list shows a very typical simple data table. Database people like to give special names to the parts of the database, so I'll use this simple phone list to illustrate. Each row of the table is called a record. Records describe discrete entities. The list of records is called a table. Each record in a table has the same elements, which are called fields, (or sometimes simply columns). Every record in the table has the same field definitions, but records can have different values in the fields. The fields in a table are defined in specific ways. Because of the way database tables are stored in files, the computer must always know how much room to allocate for each field, so the size and type of each field is important. This particular database is defined with five fields. The id field is an integer. All the other fields contain string data.

Creating a Table

RDBMS programs use a special language called Structured Query Language(SQL) to create and manipulate databases. SQL is usually pretty easy to understand, compared to full-blown programming languages. You can usually guess what's going on even without a lot of knowledge. As an example, look at the following SQL code:

USE chapter7;

CREATE TABLE phoneList (
  id INT PRIMARY KEY,
  firstName VARCHAR(15),
  lastName VARCHAR (15),
  email VARCHAR(20),
  phone VARCHAR(15)
);

DESCRIBE phonelist;

This code is an SQL script. It is like a PHP program in the sense it is a set of instructions for the computer to follow. However, the PHP interpreter doesn't directly interact with the SQL language. Instead, these commands are sent to another program. As a PHP programmer, you'll also be able to write code that sends commands to a database language. Just as your PHP code often writes code in HTML format for the browser to interpret, you'll be writing SQL code for the MySQL interpreter to use.

Databases have been an important part of programming since the beginning, but the process of working with data has gone through several evolutions. The advent of a common language that can be used in many applications was a very important step. SQL is a fourth-generationlanguage. In general, these languages are designed to solve a particular type of problem. Some fourth generation languages (like SQL) aren't full-blown programming languages, because they don't support data structures like branches and loops. Still, these languages can serve a very useful purpose. SQL is handy because it's widely supported. The SQL commands you will learn in this chapter will apply to most modern database programs with little to no modification. You can take the script you'll learn shortly in MySQL and send the same code to an Oracle or MS SQL Server database (two other very common choices), and all three data programs will build the same database. If you choose to upgrade to a more powerful data package, you can use your existing scripts to manipulate the data. Perhaps the most powerful reason to have a scripting language to control databases relates to programming in traditional languages. You can write a program in any language (like PHP, for example) that generates SQL code. You can then use that code to manipulate the database. This allows you to have complete flexibility, and lets your program act as the interface to your database.

When this code is sent to an SQL-compliant database program (such as MySQL) it will create the database structure shown in Table 7.1.

Using a Database

It is possible that you will have several database projects working in the same relational database system. In my case, each chapter of this book that uses SQL has its own database. Sometimes your system administrator will assign you a database. In any case, you will probably need to invoke that database with the USE command.

TRICK?

The syntax of SQL is not exactly like that of PHP. SQL has a different culture, and it makes sense to respect the way SQL code has historically been written. SQL is generally not case-sensitive, but most SQL coders put all SQL commands in all uppercase. Also, when a bunch of SQL commands are placed in a file as this code will be, you usually end each line with a semicolon.

TRICK?

If you don't already have a database to USE, you can create one with the CREATE command. For example, to create a database called "myStuff," use these commands:

CREATE DATABASE mystuff;
USE mystuff;

Creating a Table

To create a table, you must indicate the name of the table as well as each field in the table. For each field, you must list what type of data is held in the field, and (at least for text data) how many characters long the field will be. As an example, the following code creates the phoneList table:

CREATE TABLE phoneList (
  id INT PRIMARY KEY,
  firstName VARCHAR(15),
  lastName VARCHAR (15),
  email VARCHAR(20),
  phone VARCHAR(15)
);

You can think of fields as being much like variables, but while PHP is easy-going about what type of data is in a variable, SQL is very picky about the type of data in fields. In order to create an efficient database, MySQL needs to know exactly how many bytes of memory to set aside for every single field in the database. The primary way it does this is to require the database designer to specify the type and size of every field in each table. Table 7.2 lists a few of the primary data types supported by MySQL.

Table 7.2: COMMON DATA TYPES IN MYSQL
Data type	Description
INT	Standard integer 2 billion (roughly)
BIGINT	Big integer 9 x 10 ^18^th
FLOAT	Floating point decimal number 38 digits
DOUBLE	Double precision floating point 308 digits
CHAR(n)	text with n digits. If actual value is less than n, field will be padded with trailing spaces
VARCHAR(n)	Text with n digits. Trailing spaces will automatically be culled
DATE	Date in YYYY-MM-DD format
TIME	Time in HH:MM:SS format
YEAR	Year in YYYY format

TRICK?

While the data types listed in Table 7.2 are by far the most commonly used, MySQL supports many other data types as well. Look in the online help that ships with MySQL if you need a more specific data type. Other databases will have a very similar list of data types.

You might notice that it is not necessary to specify the length of numeric types (although you can determine a maximum size for numeric types as well as the number of digits you want stored in float and double fields). The storage requirements for numeric variables are based on the type itself.

Working with String Data in MySQL

Text values are usually stored in VARCHAR fields. These fields must include the number of characters allocated for the field. Both CHAR and VARCHAR fields have fixed lengths. The primary difference between them is what happens when the field contains a value shorter than the specified length. If you declare a CHAR field to have a length of 10 with

firstName VARCHAR(10);

and then later store the value 'Andy' into the field, the field will actually contain 'Andy ' (that is, Andy followed by six spaces). CHAR fields pad any remaining characters with spaces. The VARCHAR field type removes any padded spaces. Usually you will use the VARCHAR field type to store all your string data.

Data design is both a science and an art. Determining the appropriate length for your text fields is one of the oldest problems in data.

If you don't allocate enough room for your text data, you can cause a lot of problems for your users. I once taught a course called CLT SD WEB PRG because the database that held the course names didn't have enough room for the actual name of the course (Client-Side Web Programming). My students renamed it the "Buy a Vowel" course.

However, you can't simply make every text field a thousand characters long, either, because this would be wasteful of system resources. If you have a field that will usually contain only five characters and you allocate one hundred characters, you will still require room on the drive for the extra 95 characters. If your database has thousands of entries, this can be a substantial cost in drive space. In a distributed environment, you'll also have to wait for those unnecessary spaces to come across limited bandwidth. It takes experimentation and practice to determine the appropriate width for your string fields. You'll need to test your application with real users before you can really be sure if you've made the right decision.

Finishing up the CREATE TABLE Statement

Once you understand field data types, the CREATE TABLE syntax makes a lot of sense. There are only a few more details to understand. Once you specify CREATE TABLE, use a pair of parentheses to indicate the field list. Each field has a name followed by its type (and length, if it's a CHAR or VARCHAR). The fields are separated by commas. You do not have to put each field on its own line or indent the field definitions, but I prefer to do so, because these practices make the code much easier to read and debug.

Creating a Primary Key

You might be curious about the very first field in the phone list database. Just to refresh your memory, the line that defines that field looks like this:

id INT PRIMARY KEY,

Most database tables have some sort of field like this that holds a numeric value. This special field is called the primary key.

IN THE REAL WORLD

A very simple database like the phone list could theoretically go without a primary key, but such fields are so important to more sophisticated databases that you might as well start putting them into even your first table. It's traditional to put a primary key in every table. In Chapter 9 "Data Normalization," you'll learn more about the relational data model. In that discussion you'll learn how keys are used to build powerful databases, and you'll learn more about creating proper primary keys. In fact, the adventure program you've already seen heavily relies on a key field even though there's only one table in the database.

The code presented so far can be entered directly into the MySQL program. You can see the code and its results in Figure 7.6.

Figure 7.6: The MySQL command line tool after I created the phonelist table.

Using the DESCRIBE Command to Check the Structure of a Table

It can be useful to check the structure of a table, especially if somebody else created it or you don't remember exactly what types or sizes of fields are in the table. The DESCRIBE command lets you view the structure of a table.

Inserting Values

Once you've created a table, you can begin to add data to it. The primary tool for adding records to a table is the INSERT command.

INSERT INTO phoneList
VALUES (
  0, 'Andy', 'Harris', 'aharris@cs.iupui.edu', '123-4567'
);

The INSERT statement allows you to add a record into a database. The values must be listed in exactly the same order the fields were defined. Each value is separated by a comma, and all VARCHAR and CHAR values must be enclosed in single quotes.

If you have a large amount of data to load into a database, you can also use the LOAD DATA command. This command accepts a tab-delimited text file with one row per record and fields separated by tabs. It then loads that entire file into the database. This is often the fastest way to load a database with test data. The following line loads data from a file called "addresses.txt" into the phoneList table:

LOAD DATA LOCAL INFILE "addresses.txt" INTO TABLE phonelist;

Figure 7.7 shows the MySQL tool after I have added one record to the table.

Figure 7.7: MySQL tells you the operation succeeded, but you don't get a lot more information.

IN THE REAL WORLD

As you are building a database, you will need to populate the database with test values. You don't want to use actual data at this point, because your database will not work correctly until you've messed with it for some time. However, your test values should be reflective of the kinds of data your database will actually house. This will help you spot certain problems like field lengths that are too small or fields that are missing.

Selecting Results

Of course, you'll want to see the results of all your table-building activities. If you want to see the data in a table, you can use the SELECT command. This is perhaps the most powerful command in SQL, but its basic use is quite simple. To see all of the data in the phonelist database, use this command: