You have all the background you need now; it's time to put MySQL to work!
This section will help you familiarize yourself with MySQL by providing a tutorial for you to try. As you work through the tutorial, you will create a sample database and some tables and then interact with the database by adding, retrieving, deleting, and modifying information in the tables. During the process of working with the sample database, you will learn the following things:
The basics of the SQL language that MySQL understands. (If you already know SQL from having used some other RDBMS, it would be a good idea to skim through this tutorial to see whether MySQL's version of SQL differs from the version with which you are familiar.)
How to communicate with a MySQL server using a few of the standard MySQL client programs. As noted in the previous section, MySQL operates using a client/server architecture in which the server runs on the machine containing the databases and clients connect to the server over a network. This tutorial is based largely on the mysql client program, which reads SQL queries from you, sends them to the server to be executed, and displays the results so you can see what happened. mysql runs on all platforms supported by MySQL and provides the most direct means of interacting with the server, so it's the logical client to begin with. Some of the examples also use mysqlimport and mysqlshow.
This book uses sampdb as the sample database name, but you may need to use a different name as you work through the material. For example, someone else on your system already may be using the name sampdb for their own database, or your MySQL administrator may assign you a different database name. In either case, substitute the actual name of your database for sampdb whenever you see the latter in examples.
Table names can be used exactly as shown in the examples, even if multiple users on your system have their own sample databases. In MySQL, it doesn't matter if other people use the same table names, as long as each of you uses your own database. MySQL will keep the tables straight and prevent you from interfering with each other.
This tutorial refers at certain points to files from the "sample database distribution" (also known as the sampdb distribution, after the name of the sampdb database). These files contain queries and data that will help you set up the sample database. See Appendix A, "Obtaining and Installing Software," for instructions on getting the distribution. When you unpack it, it will create a directory named sampdb containing the files you'll need. I recommend that you change location into that directory whenever you're working through examples pertaining to the sample database.
To try the examples in this tutorial, a few preliminary requirements must be satisfied:
You need to have the MySQL software installed.
You need a MySQL account so that you can connect to the server.
You need a database to work with.
The required software includes the MySQL clients and a MySQL server. The client programs must be located on the machine where you'll be working. The server can be located on your machine, although that is not required. As long as you have permission to connect to it, the server can be located anywhere. If you need to get MySQL, see Appendix A for instructions. If your network access comes through an Internet service provider (ISP), find out whether the provider offers MySQL as a service. If not and your ISP won't install it, check Appendix I, "Internet Service Providers," for some guidelines on choosing a more suitable provider.
In addition to the MySQL software, you'll need a MySQL account so that the server will allow you to connect and create your sample database and its tables. (If you already have a MySQL account, you can use that, but you may want to set up a separate account for use with the material in this book.)
At this point, we run into something of a chicken-and-egg problem. To set up a MySQL account to use for connecting to the server, it's necessary to connect to the server. Typically, this is done by connecting as the MySQL root user on the host where the server is running and issuing a GRANT statement to create a new MySQL account. If you've installed MySQL on your own machine and the server is running, you can connect to it and set up a new sample database administrator account with a username of sampadm and a password of secret as follows (change the name and password to those you want to use, both here and throughout the book):
% mysql -p -u root Enter password: ****** mysql> GRANT ALL ON sampdb.* TO 'sampadm'@'localhost' IDENTIFIED BY 'secret';
The mysql command includes a -p option to cause mysql to prompt for the root user's MySQL password. Enter the password where you see ****** in the example. (I assume that you have already set up a password for the MySQL root user and that you know what it is. If you haven't yet assigned a password, just press Enter at the Enter password: prompt. However, having no root password is insecure and you should assign one as soon as possible.)
The GRANT statement just shown is appropriate if you'll be connecting to MySQL from the same machine where the server is running. It allows you to connect to the server using the name sampadm and the password secret and gives you complete access to the sampdb database. However, GRANT doesn't create the database; we'll get to that a bit later.
If you don't plan to connect from the same host as the one where the server is running, change localhost to the name of the machine where you'll be working. For example, if you will connect to the server from the host asp.snake.net, the GRANT statement should look like this:
mysql> GRANT ALL ON sampdb.* TO 'sampadm'@'asp.snake.net' IDENTIFIED BY 'secret';
If you don't have control over the server, ask your MySQL administrator to set up an account for you. Then substitute the MySQL username, password, and database name that the administrator assigns you for sampadm, secret, and sampdb throughout the examples in this book.
More information on the GRANT statement, setting up MySQL user accounts, and changing passwords can be found in Chapter 11, "General MySQL Administration."
To connect to your server, invoke the mysql program from your shell (that is, from your UNIX prompt or from a DOS console under Windows). The command is as follows:
% mysql options
I use % throughout this book to indicate the shell prompt. That's one of the standard UNIX prompts; another is $. Under Windows, the prompt that you'll see will be something like C:\>.
The options part of the mysql command line might be empty, but more probably you'll have to issue a command that looks something like the following:
% mysql -h host_name -p -u user_name
You may not need to supply all those options when you invoke mysql, but it's likely that you'll have to specify at least a name and password. Here's what the options mean:
-h host_name (alternate form: --host=host_name)
The server host you want to connect to. If the MySQL server is running on the same machine where you are running mysql, this option normally can be omitted.
-u user_name (alternate form: --user=user_name)
Your MySQL username. If you're using UNIX and your MySQL username is the same as your login name, you can omit this option; mysql will use your login name as your MySQL name.
Under Windows, the default user name is ODBC, which is unlikely to be a useful default for you. Either specify a -u option on the command line or add a default to your environment by setting the USER variable. For example, you can use the following set command to specify a user name of sampadm:
C:\> set USER=sampadm
If you place this command in your AUTOEXEC.BAT file, it will take effect whenever you start up Windows and you won't have to issue it at the prompt.
-p (alternate form: --password)
This option tells mysql to prompt you for your MySQL password. For example:
% mysql -h host_name -p -u user_name Enter password:
When you see the Enter password: prompt, type in your password. (It won't be echoed to the screen, in case someone's looking over your shoulder.) Note that your MySQL password is not necessarily the same as your UNIX or Windows password. If you omit the -p option, mysql assumes you don't need one and doesn't prompt for it.
An alternate form of this option is to specify the password value directly on the command line by typing the option as -pyour_pass (alternate form: --password=your_pass). However, for security reasons, it's best not to do that. For one thing, the password becomes visible to others that way.
If you do decide to specify the password on the command line, note particularly that there is no space between the -p option and the following password value. This behavior of -p is a common point of confusion because it differs from the -h and -u options, which are associated with the word that follows them whether or not there is a space between the option and the word.
Suppose that my MySQL username and password are sampadm and secret. If the MySQL server is running on the same host, I can leave out the -h option and the mysql command to connect to the server looks like this:
% mysql -p -u sampadm Enter password: ******
After I enter the command, mysql prints Enter password: to prompt for my password, and I type it in (the ****** indicates where I type secret).
If all goes well, mysql prints a greeting and a mysql> prompt indicating that it is waiting for me to issue queries. The full startup sequence is as follows:
% mysql -p -u sampadm Enter password: ****** Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 7575 to server version: 4.0.4-log Type 'help;' or '\h' for help. Type '\c' to clear the buffer. mysql>
To connect to a server running on some other machine, it's necessary to specify the hostname using an -h option. If that host is cobra.snake.net, the command looks like this:
% mysql -h cobra.snake.net -p -u sampadm
In most of the examples that follow that show a mysql command line, I'm going to leave out the -h, -u, and -p options for brevity and assume that you'll supply whatever options are necessary.
After you establish a connection to the server, you can terminate your session any time by typing QUIT:
mysql> QUIT
Bye
You can also quit by typing \q or (on UNIX) by pressing Ctrl-D.
When you're just starting to learn MySQL, you'll probably consider its security system to be an annoyance because it makes it harder to do what you want. (You must obtain permission to create and access a database, and you must specify your name and password whenever you connect to the server.) However, after you move beyond the sample database used in this book to entering and using your own records, your perspective will change radically. Then you'll appreciate the way that MySQL keeps other people from snooping through (or worse, destroying) your information.
There are ways to set up your account so you don't have to type in connection parameters each time you run mysql. These are discussed in the "Tips for Interacting with mysql" section later in this chapter. The most common method for simplifying the connection process is to store your connection parameters in an option file. You may want to skip ahead to that section right now to see how to set up such a file.
After you're connected to the server, you're ready to issue queries. This section describes some general things you should know about interacting with mysql.
To enter a query in mysql, just type it in. At the end of the query, type a semicolon character (';') and press Enter. The semicolon tells mysql that the query is complete. After you've entered a query, mysql sends it to the server to be executed. The server processes the query and sends the results back to mysql, which displays the result for you.
The following example shows a simple query that asks for the current date and time:
mysql> SELECT NOW();
+---------------------+
| NOW() |
+---------------------+
| 2002-09-01 13:54:24 |
+---------------------+
1 row in set (0.00 sec)
mysql displays the query result and a line that shows the number of rows the result consists of and the time elapsed during query processing. In subsequent examples, I usually will not show the row-count line.
Because mysql waits for the semicolon as a statement terminator, you need not enter a query all on a single line. You can spread it over several lines if you want:
mysql> SELECT NOW(), -> USER(), -> VERSION() -> ; +---------------------+-------------------+----------------+ | NOW() | USER() | VERSION() | +---------------------+-------------------+----------------+ | 2002-09-01 13:54:37 | sampadm@localhost | 4.0.4-beta-log | +---------------------+-------------------+----------------+
Note how the prompt changes from mysql> to -> after you enter the first line of the query. That tells you that mysql thinks you're still entering the query, which is important feedback?if you forget the semicolon at the end of a query, the changed prompt helps you realize that mysql is still waiting for something. Otherwise, you'll be waiting, wondering why it's taking MySQL so long to execute your query, and mysql will be waiting patiently for you to finish entering your query! (mysql has a couple of other prompts as well; they're all discussed in Appendix E, "MySQL Program Reference.")
For the most part, it doesn't matter whether you enter queries using uppercase, lowercase, or mixed case. The following queries are all equivalent:
SELECT USER(); select user(); SeLeCt UsEr();
The examples in this book use uppercase for SQL keywords and function names, and lowercase for database, table, and column names.
When you invoke a function in a query, there must be no space between the function name and the following parenthesis:
mysql> SELECT NOW (); ERROR 1064: You have an error in your SQL syntax near '()' at line 1 mysql> SELECT NOW(); +---------------------+ | NOW() | +---------------------+ | 2002-09-01 13:56:36 | +---------------------+
These two queries look similar, but the first one fails because the parenthesis doesn't immediately follow the function name.
Another way to terminate a query is to use \g rather than a semicolon:
mysql> SELECT NOW()\g
+---------------------+
| NOW() |
+---------------------+
| 2002-09-01 13:56:47 |
+---------------------+
Or you can use \G, which displays the results in vertical format:
mysql> SELECT NOW(), USER(), VERSION()\G
*************************** 1. row ***************************
NOW(): 2002-09-01 13:56:58
USER(): sampadm@localhost
VERSION(): 4.0.4-beta-log
For a query that generates short output lines, \G is not so useful, but if the lines are so long that they wrap around on your screen, \G can make the output easier to read.
If you've begun typing in a multiple-line query and decide you don't want to execute it, type \c to clear (cancel) it:
mysql> SELECT NOW(), -> VERSION(), -> \c mysql>
Notice how the prompt changes back to mysql> to indicate that mysql is ready for a new query.
You can store queries in a file and tell mysql to read queries from the file rather than from the keyboard. Use your shell's input redirection facilities for this. For example, if I have queries stored in a file named myfile.sql, I can execute its contents as follows:
% mysql < myfile.sql
You can call the file whatever you want. I use the .sql suffix as a convention to indicate that a file contains SQL statements.
Executing mysql this way is something that will come up in the "Adding New Records" section later in this chapter when we enter data into the sampdb database. It's a lot more convenient to load a table by having mysql read INSERT statements from a file than to type in each statement manually.
The remainder of this tutorial shows many queries that you can try out for yourself. These are indicated by the mysql> prompt before the query, and such examples are usually accompanied by the output of the query. You should be able to type in these queries as shown, and the resulting output should be the same. Queries that are shown without a prompt are intended simply to illustrate a point, and you need not execute them. (You can try them out if you like; if you use mysql to do so, remember to include a terminator such as a semicolon at the end.)
When Do You Need a Semicolon?Most queries shown in this book end with a semicolon, which is a convenient way of indicating where each query ends (particularly for multiple-statement examples). It also parallels the way you'd enter the queries should you try them from the mysql program. But semicolons are not part of the SQL syntax for the statements, so when you issue a query in another context, such as from within a Perl or PHP script, you should omit the semicolon. If you do not, an error will most likely occur. |
We'll begin by creating the sampdb sample database and the tables within it, populating its tables, and performing some simple queries on the data contained in those tables. Using a database involves several steps:
Creating (initializing) the database
Creating the tables within the database
Interacting with the tables by inserting, retrieving, modifying, or deleting data
Retrieving existing data is easily the most common operation performed on a database. The next most common operations are inserting new data and updating or deleting existing data. Less frequent are table creation operations, and least frequent of all is database creation. However, we're beginning from scratch, so we must begin with database creation, the least common thing, and work our way through table creation and insertion of our initial data before we get to where we can do the really common thing?retrieving data.
To create a new database, connect to the server using mysql and then issue a CREATE DATABASE statement that specifies the database name:
mysql> CREATE DATABASE sampdb;
You'll need to create the sampdb database before you can create any of the tables that will go in it or do anything with the contents of those tables.
Does creating the database select it as the default (or current) database? No, it doesn't, as you can see by executing the following query:
mysql> SELECT DATABASE();
+------------+
| DATABASE() |
+------------+
| |
+------------+
To make sampdb the default database, issue a USE statement:
mysql> USE sampdb; mysql> SELECT DATABASE(); +------------+ | DATABASE() | +------------+ | sampdb | +------------+
The other way to select a database is to name it on the command line when you invoke mysql:
% mysql sampdb
That is, in fact, the usual way to name the database you want to use. If you need any connection parameters, specify them before the database name. For example, the following two commands allow the sampadm user to connect to the sampdb database on the local host and on cobra.snake.net:
% mysql -p -u sampadm sampdb % mysql -h cobra.snake.net -p -u sampadm sampdb
Unless specified otherwise, all the examples that follow assume that when you invoke mysql, you name the sampdb database on the command line to make it the current database. If you invoke mysql but forget to name the database on the command line, just issue a USE sampdb statement at the mysql> prompt.
In this section, we'll build the tables needed for the sampdb sample database. First, we'll consider the tables needed for the Historical League and then those for the grade-keeping project. This is the part where some database books start talking about Analysis and Design, Entity-Relationship Diagrams, Normalization Procedures, and other such stuff. There's a place for all that, but I prefer just to say we need to think a bit about what our database will look like?what tables it should contain, what the contents of each table should be, and some of the issues involved in deciding how to represent our data.
The choices made here about data representation are not absolute. In other situations, you might well elect to represent similar data in a different way, depending on the requirements of your applications and the uses to which you intend to put your data.
Table layout for the Historical League is pretty simple:
A president table. This contains a descriptive record for each U.S. president. We'll need this for the online quiz on the League Web site (the interactive analog to the printed quiz that appears in the children's section of the League's newsletter).
A member table. This is used to maintain current information about each member of the League. It'll be used for creating printed and online versions of the member directory, sending automated membership renewal reminders, and so on.
The president table is simpler, so let's discuss it first. This table will contain some basic biographical information about each United States president:
Name. Names can be represented in a table several ways. For example, we could have a single column containing the entire name or separate columns for the first and last name. It's certainly simpler to use a single column, but that limits you in some ways:
If you enter the names with the first name first, you can't sort on last name.
If you enter the names with the last name first, you can't display them with the first name first.
It's harder to search for names. For example, to search for a particular last name, you must use a pattern and look for names that match the pattern. This is less efficient and slower than looking for an exact last name.
To avoid these limitations, our president table will use separate columns for the first and last names.
The first name column will also hold the middle name or initial. This shouldn't break any sorting we might do because it's not likely we'll want to sort on middle name (or even first name). Name display should work properly, too, because the middle name immediately follows the first name regardless of whether a name is printed in "Bush, George W." or in "George W. Bush" format.
There is another slight complication. One president (Jimmy Carter) has a "Jr." at the end of his name. Where does that go? Depending on the format in which names are printed, this president's name is displayed as "James E. Carter, Jr.," or "Carter, James E., Jr." The "Jr." doesn't associate with either first or last name, so we'll create another column to hold a name suffix. This illustrates how even a single value can cause problems when you're trying to determine how to represent your data. It also shows why it's a good idea to know as much as possible about the type of data values you'll be working with before you put them in a database. If you have incomplete knowledge of what your data look like, you may have to change your table structure after you've already begun to use it. That's not necessarily a disaster, but in general it's something you want to avoid.
Birthplace (city and state). Like the name, this too can be represented using a single column or multiple columns. It's simpler to use a single column, but as with the name, separate columns allow you to do some things you can't do easily otherwise. For example, it's easier to find records for presidents born in a particular state if city and state are listed separately.
Birth date and death date. The only special problem here is that we can't require the death date to be filled in because some presidents are still living. MySQL provides a special value NULL that means "no value," so we can use that in the death date column to signify "still alive."
The member table for the Historical League membership list is similar to the president table in the sense that each record contains basic descriptive information for a single person. But each member record contains more columns:
Name. We'll use the same three-column representation as for the president table: last name, first name, and suffix.
ID number. This is a unique value assigned to each member when a membership first begins. The League hasn't ever used ID numbers before, but now that the records are being made more systematic, it's a good time to start. (I am anticipating that you'll find MySQL beneficial and that you'll think of other ways to apply it to the League's records. When that happens, it'll be easier to associate records in the member table with other member-related tables you may create if you use numbers rather than names.)
Expiration date. Members must renew their memberships periodically to avoid having them lapse. For some applications, you might use the date of the most recent renewal, but this is not suitable for the League's purposes. Memberships can be renewed for a variable number of years (typically one, two, three, or five years), and a date for the most recent renewal wouldn't tell you when the next renewal must take place. In addition, the League allows lifetime memberships. We could represent these with a date far in the future, but NULL seems more appropriate because "no value" logically corresponds to "never expires."
Email address. Publishing these addresses will make it easier for those members that have them to communicate with each other more easily. For your purposes as League secretary, these addresses will allow you to send out membership renewal notices electronically rather than by postal mail. This should be easier than going to the post office and less expensive as well. You'll also be able to use email to send members the current contents of their directory entries and ask them to update the information as necessary.
Postal address. This is needed for contacting members that don't have email (or who don't respond to it). We'll use columns for street address, city, state, and Zip code.
I'm assuming that all League members live in the United States. For organizations with a membership that is international in scope, that assumption is an oversimplification, of course. If you want to deal with addresses from multiple countries, you'll run into some sticky issues having to do with the different address formats used for different countries. For example, Zip code is not an international standard, and some countries have provinces rather than states.
Phone number. Like the address fields, this is useful for contacting members.
Special interest keywords. Every member is assumed to have a general interest in U.S. history, but members probably also have some special areas of interest. This column records those interests. Members can use it to find other members with similar interests.
Now we're ready to create the Historical League tables. For this we use the CREATE TABLE statement, which has the following general form:
CREATE TABLE tbl_name ( column_specs );
tbl_name indicates the name you want to give the table. column_specs provides the specifications for the columns in the table, as well as any indexes (if there are any). Indexes make lookups faster; we'll discuss them further in Chapter 4, "Query Optimization." For the president table, the CREATE TABLE statement looks like this:
CREATE TABLE president ( last_name VARCHAR(15) NOT NULL, first_name VARCHAR(15) NOT NULL, suffix VARCHAR(5) NULL, city VARCHAR(20) NOT NULL, state VARCHAR(2) NOT NULL, birth DATE NOT NULL, death DATE NULL );
If you want to type in that statement yourself, invoke mysql, making sampdb the current database:
% mysql sampdb
Then enter the CREATE TABLE statement as just shown, including the trailing semicolon so that mysql can tell where the end of the statement is.
To create the president table using a prewritten description, use the create_president.sql file from the sampdb distribution. This file is located in the sampdb directory that is created when you unpack the distribution. Change location into that directory and then run the following command:
% mysql sampdb < create_president.sql
Whichever way you invoke mysql, specify any connection parameters you may need (hostname, username, or password) on the command line preceding the database name.
Each column specification in the CREATE TABLE statement consists of the column name, the data type (the kind of values the column will hold), and possibly some column attributes.
The two column types used in the president table are VARCHAR and DATE.VARCHAR(n) means the column contains variable-length character (string) values, with a maximum length of n characters each. You choose the value of n according to how long you expect your values to be. state is declared as VARCHAR(2); that's all we need if states are entered using their two-character abbreviations. The other string-valued columns need to be wider to accommodate longer values.
The other column type we've used is DATE. This type indicates, not surprisingly, that the column holds date values. However, what may be surprising to you is the format in which dates are represented. MySQL expects dates to be specified in 'CCYY-MM-DD' format, where CC, YY, MM, and DD represent the century, year within the century, month, and date. This is the ANSI SQL standard for date representation (also known as ISO 8601 format). For example, a date of July 18, 2002 is specified in MySQL as '2002-07-18', not as '07-18-2002' or '18-07-2002'.
The only attributes we're using for the columns in the president table are NULL (values can be missing) and NOT NULL (values must be filled in). Most columns are NOT NULL because we'll always have a value for them. The two columns that can have NULL values are suffix (most names don't have one) and death (some presidents are still alive, so there is no date of death).
For the member table, the CREATE TABLE statement looks like this:
CREATE TABLE member ( member_id INT UNSIGNED NOT NULL AUTO_INCREMENT, PRIMARY KEY (member_id), last_name VARCHAR(20) NOT NULL, first_name VARCHAR(20) NOT NULL, suffix VARCHAR(5) NULL, expiration DATE NULL DEFAULT '0000-00-00', email VARCHAR(100) NULL, street VARCHAR(50) NULL, city VARCHAR(50) NULL, state VARCHAR(2) NULL, zip VARCHAR(10) NULL, phone VARCHAR(20) NULL, interests VARCHAR(255) NULL );
Type that statement into mysql or execute the following command to use the prewritten file from the sampdb distribution:
% mysql sampdb < create_member.sql
In terms of column types, most columns of the member table except two are not very interesting because they are created as variable-length strings. The exceptions are member_id and expiration, which exist to hold sequence numbers and dates, respectively.
The primary consideration for the member_id membership number column is that each of its values should be unique to avoid confusion between members. An AUTO_INCREMENT column is useful here because then we can let MySQL generate unique numbers for us automatically when we add new members. Even though it just contains numbers, the declaration for member_id has several parts:
INT signifies that the column holds integers (numeric values with no fractional part).
UNSIGNED disallows negative numbers.
NOT NULL requires that the column value must be filled in. (This means that no member can be without an ID number.)
AUTO_INCREMENT is a special attribute in MySQL. It indicates that the column holds sequence numbers. The AUTO_INCREMENT mechanism works like this; If the value for the member_id column is missing (or NULL) when you create a new member table record, MySQL automatically generates the next sequence number and assigns it to the column. This makes it easy to assign IDs to new members, because MySQL will do it for us.
The PRIMARY KEY clause indicates that the member_id column is indexed to allow fast lookups and that each value in the column must be unique. The latter property is desirable for member ID values, because it prevents us from using the same ID twice by mistake. (Besides, MySQL requires every AUTO_INCREMENT column to have some kind of unique index, so the table definition is illegal without one.)
If you don't understand that stuff about AUTO_INCREMENT and PRIMARY KEY, just think of them as giving us a magic way of generating an ID number for each member. It doesn't particularly matter what the values are, as long as they're unique. (When you're ready to learn more about how to declare and use AUTO_INCREMENT columns, Chapter 2, "Working with Data in MySQL," covers them in detail.)
The expiration column is a DATE. It has a default value of '0000-00-00', which is a non-NULL value that means no legal date has been entered. The reason for this is that expiration can be NULL to indicate that a member has a lifetime membership. If we don't specify otherwise, a column that can contain NULL also has NULL as its default value. That's not desirable in this case; if you created a new member record but forgot to specify the expiration date, MySQL would fill in the expiration column with NULL automatically?thus making the member a lifetime member! By specifying that the column has a default value of '0000-00-00' instead, we avoid this problem. That also gives us a value we can search for periodically to find records for which the expiration date was never properly entered.
Now that you've told MySQL to create a couple of tables, check to make sure that it did so as you expect. In mysql, issue the following query to see the structure of the president table:
mysql> DESCRIBE president;
+------------+-------------+------+-----+------------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+------------+-------+
| last_name | varchar(15) | | | | |
| first_name | varchar(15) | | | | |
| suffix | varchar(5) | YES | | NULL | |
| city | varchar(20) | | | | |
| state | char(2) | | | | |
| birth | date | | | 0000-00-00 | |
| death | date | YES | | NULL | |
+------------+-------------+------+-----+------------+-------+
In some versions of MySQL, the results from DESCRIBE include additional information showing access privilege information. I've not shown that here because it makes the lines too long to display without wrapping around.
The output looks pretty much as we'd expect, except that the information for the state column says its type is CHAR(2). That's odd; wasn't it declared as VARCHAR(2)? Yes, it was, but MySQL has silently changed the type from VARCHAR to CHAR. The reason for this has to do with efficiency of storage space for short character columns, which I won't go into here. If you want the details, check the discussion of the ALTER TABLE statement in Chapter 3, "MySQL SQL Syntax and Use." For our purposes here, there is no difference between the two types. The important thing is that the column stores two-character values.
If you issue a DESCRIBE member query, mysql will show you similar information for the member table.
DESCRIBE is useful when you forget the name of a column in a table or need to know its type or how wide it is and so on. It's also useful for finding out the order in which MySQL stores columns in table rows. That order is important when you use INSERT or LOAD DATA statements that expect column values to be listed in the default column order.
The information produced by DESCRIBE can be obtained in different ways. It may be abbreviated as DESC or written as an EXPLAIN or SHOW statement. The following statements are all synonymous:
DESCRIBE president; DESC president; EXPLAIN president; SHOW COLUMNS FROM president; SHOW FIELDS FROM president;
These statements also allow you to restrict the output to particular columns. For example, you can add a LIKE clause at the end of a SHOW statement to display information only for column names that match a given pattern:
mysql> SHOW COLUMNS FROM president LIKE '%name';
+------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| last_name | varchar(15) | | | | |
| first_name | varchar(15) | | | | |
+------------+-------------+------+-----+---------+-------+
The '%' character used here is a special wildcard character that is described later in the "Pattern Matching" section. Similar restrictions can be used with DESCRIBE and EXPLAIN as well; for the exact syntax, see Appendix D, "SQL Syntax Reference."
The SHOW statement has other forms that are useful for obtaining different types of information from MySQL. SHOW TABLES lists the tables in the current database, so with the two tables we've created so far in the sampdb database, the output looks like this:
mysql> SHOW TABLES;
+------------------+
| Tables_in_sampdb |
+------------------+
| member |
| president |
+------------------+
SHOW DATABASES lists the databases that are managed by the server to which you're connected:
mysql> SHOW DATABASES;
+-----------+
| Database |
+-----------+
| menagerie |
| mysql |
| sampdb |
| test |
+-----------+
The list of databases varies from server to server, but you should see at least sampdb and mysql. You created sampdb yourself, and the database named mysql holds the grant tables that control MySQL access privileges.
The mysqlshow utility provides a command-line interface to the same kinds of information that the SHOW statement displays. With no arguments, mysqlshow displays a list of databases:
% mysqlshow
+-------------+
| Databases |
+-------------+
| menagerie |
| mysql |
| sampdb |
| test |
+-------------+
With a database name, it shows the tables in the given database:
% mysqlshow sampdb
Database: sampdb
+-----------+
| Tables |
+-----------+
| member |
| president |
+-----------+
With a database and table name, mysqlshow displays information about the columns in the table, much like the SHOW COLUMNS statement.
To see what tables are required for the grade-keeping project, let's consider how you might write down scores when you use a paper-based gradebook. Figure 1.2 shows a page from your gradebook. The main body of this page is a matrix for recording scores. There is also other information necessary for making sense of the scores. Student names and ID numbers are listed down the side of the matrix. (For simplicity, only four students are shown.) Along the top of the matrix, you put down the dates when you give quizzes and tests. The figure shows that you've given quizzes on September 3, 6, 16, and 23, and tests on September 9 and October 1.
To keep track of this kind of information using a database, we need a score table. What should records in this table contain? That's easy. For each row, we need student name, the date of the quiz or test, and the score. Figure 1.3 shows how some of the scores from the gradebook look when represented in a table like this. (Dates are written the way MySQL represents them, in 'CCYY-MM-DD' format.)
However, there is a problem with setting up the table in this way because it leaves out some information. For example, looking at the records in Figure 1.3, we can't tell whether scores are for a quiz or a test. It could be important to know score types when determining final grades if quizzes and tests are weighted differently. We might try to infer the type from the range of scores on a given date (quizzes usually are worth fewer points than a test), but that's ugly because it relies on inference and not something explicit in the data.
It's possible to distinguish scores by recording the type in each record, for example, by adding a column to the score table that contains 'T' or 'Q' for each row to indicate "test" or "quiz," as in Figure 1.4. This has the advantage of making the type of score explicit in the data. The disadvantage is that this information is somewhat redundant. Observe that for all records with a given date, the score type column always has the same value. The scores for September 23 all have a type of 'Q', and the scores for October 1 all have a type of 'T'. This is unappealing. If we record a set of scores for a quiz or test this way, not only will we be putting in the same date for each new record in the set, we'll be putting in the same score type over and over again. Ugh. Who wants to enter all that redundant information?
Let's try an alternative representation. Instead of recording score types in the score table, we'll figure them out from the dates. We can keep a list of dates and use it to keep track of what kind of "grade event" (quiz or test) occurred on each date. Then we can determine whether any given score was from a quiz or a test by combining it with the information in our event list; just match the date in the score table record with the date in the event table to get the event type. Figure 1.5 shows this table layout and demonstrates how the association works for a score table record with a date of September 23. By matching the record with the corresponding record in the event table, we see that the score is from a quiz.
This is much better than trying to infer the score type based on some guess; instead, we're deriving the type directly from data recorded explicitly in our database. It's also preferable to recording score types in the score table because we must record each type only one time, rather than once per score record.
However, now we're combining information from multiple tables. If you're like me, when you first hear about this kind of thing, you think, "Yeah, that's a cute idea, but isn't it a lot of work to do all that looking up all the time; doesn't it just make things more complicated?"
In a way, that's correct; it is more work. Keeping two lists of records is more complicated than keeping one list. But take another look at your gradebook (see Figure 1.2). Aren't you already keeping two sets of records? Consider the following facts:
You keep track of scores using the cells in the score matrix, where each cell is indexed by student name and date (down the side and along the top of the matrix). This represents one set of records; it's analogous to the contents of the score table.
How do you know what kind of event each date represents? You've written a little 'T' or 'Q' above the date, so you're also keeping track of the association between date and score type along the top of the matrix. This represents a second set of records; it's analogous to the event table contents.
In other words, even though you may not think about it as such, you're really not doing anything different with the gradebook than what I'm proposing to do by keeping information in two tables. The only real difference is that the two kinds of information aren't so explicitly separated in the paper-based gradebook.
The page in the gradebook illustrates something about the way we think of information and about the difficulty of figuring out how to put information in a database. We tend to integrate different kinds of information and interpret them as a whole. Databases don't work like that, which is one reason why they sometimes seem artificial and unnatural. Our natural tendency to unify information makes it quite difficult sometimes even to realize when we have multiple types of data instead of just one. Because of this, you may find it a challenge to "think as a database thinks" about how your data should be represented.
One requirement imposed on the event table by the layout shown in Figure 1.5 is that the dates be unique because each date is used to link together records from the score and event tables. In other words, you cannot give two quizzes on the same day, or a quiz and a test. If you do, you'll have two sets of records in the score table and two records in the event table, all with the same date, and you won't be able to tell how to match score records with event records.
That problem will never come up if there is never more than one grade event per day. But is it really valid to assume that will never happen? It might seem so; after all, you don't consider yourself sadistic enough to give a quiz and a test on the same day. But I hope you'll pardon me if I'm skeptical. I've often heard people claim about their data, "That odd case will never occur." Then it turns out the odd case does occur on occasion, and usually you have to redesign your tables to fix problems that the odd case causes.
It's better to think about the possible problems in advance and anticipate how to handle them. So, let's suppose you might need to record two sets of scores for the same day sometimes. How can we handle that? As it turns out, this problem isn't so difficult to solve. With a minor change to the way we lay out our data, multiple events on a given date won't cause trouble:
Add a column to the event table and use it to assign a unique number to each record in the table. In effect, this gives each event its own ID number, so we'll call this the event_id column. (If this seems like an odd thing to do, consider that your gradebook in Figure 1.2 already has this property; the event ID is just like the column number in your gradebook score matrix. The number might not be written down explicitly there and labeled "event ID," but that's what it is.)
When you put scores in the score table, record the event ID rather than the date.
The result of these changes is shown in Figure 1.6. Now you link together the score and event tables using the event ID rather than the date, and you use the event table to determine not just the type of each score but also the date on which it occurred. Also, it's no longer the date that must be unique in the event table, it's the event ID. This means you can have a dozen tests and quizzes on the same day, and you'll be able to keep them straight in your records. (No doubt your students will be thrilled to hear this.)
Unfortunately, from a human standpoint, the table layout in Figure 1.6 seems less satisfactory than the previous ones. The score table is more abstract because it contains fewer columns that have a readily apparent meaning. The table layout shown earlier in Figure 1.4 was easy to look at it and understand because the score table had columns for both dates and score types. The current score table shown in Figure 1.6 has columns for neither. This seems highly removed from anything we can think about easily. Who wants to look at a score table that has "event IDs" in it? That just doesn't mean much to us.
At this point, you reached a crossroads. You're intrigued by the possibility of being able to perform grade-keeping electronically and not having to do all kinds of tedious manual calculations when assigning grades. But after considering how you actually would represent score information in a database, you're put off by how abstract and disconnected the representation seems to make that information.
This leads naturally to a question: "Would it be better not to use a database at all? Maybe MySQL isn't for me." As you might guess, I will answer that question in the negative, because otherwise this book will come to a quick end. But when you're thinking about how to do a job, it's not a bad idea to consider various alternatives and to ask whether you're better off using a database system such as MySQL or something else, such as a spreadsheet program:
The gradebook has rows and columns, and so does a spreadsheet. This makes the gradebook and a spreadsheet conceptually and visually very similar.
A spreadsheet program can perform calculations, so you could total up each student's scores using a calculation field. It might be a little tricky to weight quizzes and tests differently, but you could do it.
On the other hand, if you want to look at just part of your data (quizzes only or tests only, for example), perform comparisons such as boys versus girls, or display summary information in a flexible way, it's a different story. A spreadsheet doesn't work so well, whereas relational database systems perform those operations easily.
Another point to consider is that the abstract and disconnected nature of your data as represented in a relational database is not really that big of a deal, anyway. It's necessary to think about that representation when setting up the database so that you don't lay out your data in a way that doesn't make sense for what you want to do with it. However, after you determine the representation, you're going to rely on the database engine to pull together and present your data in a way that is meaningful to you. You're not going to look at it as a bunch of disconnected pieces.
For example, when you retrieve scores from the score table, you don't want to see event IDs; you want to see dates. That's not a problem. The database will look up dates from the event table based on the event ID and show them to you. You may also want to see whether the scores are for tests or quizzes. That's not a problem, either. The database will look up score types the same way?using the event ID. Remember, that's what a relational database system like MySQL is good at?relating one thing to another to pull out information from multiple sources to present you with what you really want to see. In the case of our grade-keeping data, MySQL does the thinking about pulling information together using event IDs so that you don't have to.
Now, just to provide a little advance preview of how you'd tell MySQL to do this relating of one thing to another, suppose you want to see the scores for September 23, 2002. The query to pull out scores for an event given on a particular date looks like the following:
SELECT score.name, event.date, score.score, event.type FROM score, event WHERE event.date = '2002-09-23' AND score.event_id = event.event_id;
Pretty scary, huh? This query retrieves the student name, the date, score, and the type of score by joining (relating) score table records to event table records. The result looks like this:
+--------+------------+-------+------+ | name | date | score | type | +--------+----------- +-------+------+ | Billy | 2002-09-23 | 15 | Q | | Missy | 2002-09-23 | 14 | Q | | Johnny | 2002-09-23 | 17 | Q | | Jenny | 2002-09-23 | 19 | Q | +--------+------------+-------+------+
Notice anything familiar about the format of that information? You should; it's the same as the table layout shown in Figure 1.4. And you don't need to know the event ID to get this result. You specify the date you're interested in and let MySQL figure out which score records go with that date. So if you've been wondering whether all the abstraction and disconnectedness loses us anything when it comes to getting information out of the database in a form that's meaningful to us, it doesn't.
Of course, after looking at that query, you might be wondering something else, too. Namely, it looks kind of long and complicated; isn't writing something like that a lot of work to go to just to find the scores for a given date? Yes, it is. However, there are ways to avoid typing several lines of SQL each time you want to issue a query. Generally, you figure out once how to perform a query such as that one and then you store it so that you can repeat it easily as necessary. We'll see how to do this in the "Tips for Interacting with mysql" section later in this chapter.
I've actually jumped the gun a little bit in showing you that query. It is, believe it or not, a little simpler than the one we're really going to use to pull out scores. The reason for this is that we need to make one more change to our table layout. Instead of recording student name in the score table, we'll use a unique student ID. (That is, we'll use the value from the "ID" column of your gradebook rather than from the "Name" colu