15.3 Manipulating Data and Databases

In this section, we show you how to alter a database's structure after you've created it. We also expand on the topics of creating, inserting, deleting, and updating data, including how to work with external files and multiple tables, and optimizing queries.

15.3.1 Altering Databases

Altering a table is unusual: most of the time, you'll define the structure of a table before you create it and you won't change it during its lifetime. However, indexes, attributes, modifiers, and other features of a table can be changed after creation, and this is sometimes a useful feature when you want to add a new index that supports a new query, modify an attribute type or length when needed, or tune your database.

Adding indexes is a popular use of the ALTER TABLE statement. For example, to add an index to the customer table, you can run:

ALTER TABLE customer ADD INDEX cities (city);

The label cities is the name of the new index and the attribute that's indexed is city.

To remove the same index from the customer table, use:

ALTER TABLE customer DROP INDEX cities;

This removes the index, not the attribute.

The DROP statement discussed in Chapter 5 can also be used to remove an index. For example:

DROP INDEX cities ON customer;

Behind the scenes, MySQL converts this to an ALTER TABLE statement.

The ALTER TABLE statement can also be used to add, remove, and alter all other aspects of the table, such as attributes and the primary index. For example, to add a new fax attribute to the customer table, you can use:

ALTER TABLE customer ADD fax varchar(15);

To remove the attribute fax, use:

ALTER TABLE customer DROP fax;

To change the cust_id attribute from type int to smallint, you can use:

ALTER TABLE customer MODIFY cust_id smallint;

You can use a similar syntax to rename an attribute cust_id to id:

ALTER TABLE customer CHANGE cust_id id smallint;

You can also change attribute lengths using a similar syntax:

ALTER TABLE customer MODIFY surname char(10);

You can even rename the customer table to clients:

ALTER TABLE customer RENAME clients;

This isn't an exhaustive list of things you can do with ALTER TABLE: for example, as we show in "Table Types," you can use it to alter the table type after creation. The complete syntax of the examples we've shown and many more examples can be found in Section 6.5.4 of the MySQL manual.

Be careful when altering your tables. For example, if you rename attributes then your associative access to those attributes in PHP will need modification. If you reduce the maximum length of an attribute, then values that exceed the new length will be truncated to fit; for numbers, this means that if the old value exceeds the new maximum value then the new maximum value is stored, while for other types it means that they are right truncated.

15.3.2 More on Inserting Data

In this section, we show you how to insert data from one or more tables into another table, create a new table using a query, replace existing data with new data, bulk load data from a text file into a database, and cache insertions in a buffer so that they can be optimized. Using INSERT with SELECT

In the previous chapter, we showed you how to insert data using three different techniques. In this section, we show you how insertion and querying can be closely tied together using a nested querying approach with the INSERT INTO ... SELECT statement. This is useful for copying data and, if needed, modifying the data as it is copied.

Consider an example where you want to create a permanent record of the total sales to each customer. First of all, let's create a simple table to store the customer and sales details:

CREATE TABLE salesuntilnow


  cust_id int(5) NOT NULL,

  surname varchar(50),

  firstname varchar(50),

  totalsales decimal(5,2),

  PRIMARY KEY (cust_id)

) type=MyISAM;

Now, you can issue a nested INSERT INTO ... SELECT statement to populate the new table with the customer details and the total sales:

INSERT INTO salesuntilnow (cust_id, surname, firstname, totalsales)

  SELECT customer.cust_id, surname, firstname, SUM(price) 

    FROM customer INNER JOIN items USING (cust_id) 

    GROUP BY items.cust_id;

The four attributes listed in the SELECT statement are mapped to the four attributes listed in the INSERT INTO statement. For example, the customer.cust_id in the SELECT statement is mapped into cust_id in the salesuntilnow table. Note that unlike other nested queries, the SELECT statement isn't surrounded by brackets (and MySQL will complain if you try to include them). Note also that the VALUES keyword isn't used with the INSERT statement.

Here's a query on the new table:

SELECT * from salesuntilnow;

It output the following results in part:


| cust_id | surname   | firstname | totalsales |


|       1 | Rosenthal | Joshua    |     925.80 |

|       2 | Serrong   | Martin    |    1535.07 |

|       3 | Leramonth | Jacob     |     896.27 |

|       4 | Keisling  | Perry     |     979.17 |

|       5 | Mockridge | Joel      |     240.70 |

|       6 | Ritterman | Richard   |     448.72 |

|       7 | Morfooney | Sandra    |     972.74 |

|       8 | Krennan   | Betty     |      69.98 |

There are two sensible limitations when inserting with a SELECT statement: first, the query can't contain an ORDER BY, and second, the FROM clause can't contain the target table of the INSERT INTO. Using CREATE TABLE with SELECT

You can create a table and insert data from one or more other tables in a single step. For example, you can create the salesuntilnow table we created in the previous section and insert the sales data in one query. Here's how it's done:

CREATE TABLE salesuntilnow

  SELECT customer.cust_id, surname, firstname, SUM(price) 

  FROM customer INNER JOIN items USING (cust_id) GROUP BY items.cust_id;

The result is exactly the same as in the previous section, except that you don't have explicit control over the definition of the attribute names and types, and the indexes. Instead, the attribute names are copied from the SELECT statement, and the types are chosen by MySQL (though they are usually the same as the source attributes). No indexes are created; you need to add indexes afterwards using ALTER TABLE. In this example, the table has the following structure (as shown by running SHOW COLUMNS FROM salesuntilnow):


| Field      | Type         | Collation         | Null | Key | Default | Extra |


| cust_id    | int(5)       | binary            |      |     | 0       |       |

| surname    | char(50)     | latin1_swedish_ci | YES  |     | NULL    |       |

| firstname  | char(50)     | latin1_swedish_ci | YES  |     | NULL    |       |

| SUM(price) | double(19,2) | binary            | YES  |     | NULL    |       |


4 rows in set (0.00 sec)

This isn't ideal: an attribute with the name SUM(price) is difficult to reference (because it's confused by MySQL with the aggregate function SUM).

In MySQL 4.1, you can explicitly choose attribute names, types, and lengths using a variation of the previous approach, and you can create indexes. Using this method, you provide a comma-separated list of attribute names, types, lengths, and modifiers. You then add any index definitions. Here's the previous example rewritten using this approach:

CREATE TABLE salesuntilnow (cust_id int(5) NOT NULL, 

                            surname varchar(50), 

                            firstname varchar(50), 

                            totalsales decimal(5,2),

                            primary key (cust_id))

  SELECT customer.cust_id, surname, firstname, SUM(price) AS totalsales

  FROM customer INNER JOIN items USING (cust_id) GROUP BY items.cust_id;

In this example, the table that's created has the following structure (again as shown by running SHOW COLUMNS FROM salesuntilnow):


| Field      | Type         | Collation         | Null | Key | Default | Extra |


| cust_id    | int(5)       | binary            |      | PRI | 0       |       |

| surname    | varchar(50)  | latin1_swedish_ci | YES  |     | NULL    |       |

| firstname  | varchar(50)  | latin1_swedish_ci | YES  |     | NULL    |       |

| totalsales | decimal(5,2) | binary            | YES  |     | NULL    |       |


4 rows in set (0.00 sec)

Note that in the SELECT query, you must alias attributes so that they match the attribute names in the new table (if the attribute names aren't the same). In our example, we alias SUM(price) AS totalsales, so that it is stored in the totalsales attribute in the new table. If you don't include an alias, an extra attribute is added to the new table; you can use this as a feature if you want to add an attribute without defining it.

MySQL 4.1 also allows you to create a new table with exactly the same structure as an existing table. For example, to create the salesuntilyesterday table, with the exact structure of salesuntilnow (including any indexes), use:

CREATE TABLE salesuntilyesterday LIKE salesuntilnow;

This doesn't copy any data. Replacing data

The INSERT INTO ... SELECT statement inserts new data. If you want to change existing data, REPLACE or UPDATE should be used instead. If you get a complaint about duplicate primary key insertion, the problem is that you tried to INSERT where there was already a row with that key.

You can tell MySQL to ignore errors when using INSERT by including the IGNORE modifier. Consider an example, where we want to write data into the salesuntilnow table:

INSERT IGNORE INTO salesuntilnow (cust_id, surname, firstname, totalsales)

  SELECT customer.cust_id, surname, firstname, SUM(price) 

    FROM customer INNER JOIN items USING (cust_id) 

    GROUP BY items.cust_id;

This query runs without complaint, but won't insert any new row that has the same primary key as a row that's already in the salesuntilnow table. So, for example, if there's a row that has a cust_id value of 1 in the salesuntilnow table, any data returned from the SELECT statement for that customer will be silently ignored.

But what about if you want to override data using a simple INSERT statement? Let's suppose you've loaded the winestore database, and now you want to repeat the insertion of the first customer with some amended details:

INSERT INTO customer VALUES (1,'Rosenthal','John','B',1, 

  '34 Mellili Cres','Earlwood','VIC','6750',12, '(613)83008461',


If you execute the statement, MySQL complains (as it should) about a duplicate key value being used. In this example, you can solve the problem by writing an UPDATE statement to change values, or you can use the REPLACE statement instead of UPDATE:

REPLACE INTO customer VALUES (1,'Rosenthal','John','B',1,

  '34 Mellili Cres','Earlwood','VIC','6750',12, '(613)83008461',


The REPLACE statement reports:

Query OK, 2 rows affected (0.00 sec)

Two rows are reported as changed because the old row is first deleted, and then the new row is inserted. This shows you the difference between UPDATE and REPLACE: you can use UPDATE only when a row exists, but you can use REPLACE even if the row hasn't yet been created (and MySQL will just silently skip the deletion step). If you do use REPLACE instead of INSERT into an empty table, you'll find that REPLACE works the same as INSERT and reports that only one row was affected.

The REPLACE statement supports the same syntax as INSERT: all different approaches to insertion that are described in Chapter 5 work with REPLACE. Bulk loading a file into a database

A common need is to load data from a formatted ASCII text file into a database. A formatted text file is usually a comma-delimited (also known as a comma-separated) or tab-delimited file, where the values to be inserted are separated by comma or tab characters, respectively. Lines, which map to rows in a table, are usually terminated with a carriage return. For example, consider the following winery information that has been exported from a legacy spreadsheet program:

1, "Hanshaw Estates Winery", 2

2, "De Morton and Sons Wines", 5

3, "Jones's Premium Wines", 3

4, "Borg Daze Premium Wines", 5

5, "Binns Group", 6

6, "Davie Brook Vineyard", 3

7, "Eglington Creek Premium Wines", 4

8, "McKay Station Vineyard", 4

9, "Dennis and Sons Wines", 5

10, "Beard Brothers Vineyard", 4

The data in this example is saved in the file winery.csv. We've organized the attribute values into the same order as the attributes in the winestore winery table. Most spreadsheet software allows data to be reorganized and manipulated as it is exported. We've also used the spreadsheet to create unique primary key values for each row as the first attribute. If you're using a Unix platform, or avoiding spreadsheets, you'll find awk is almost the only tool you'll ever need for line-by-line data manipulation; there's also a Microsoft Windows version available.

The MySQL statement LOAD DATA INFILE is used to load formatted data from a file into a database. This is nonstandard SQL. The winery.csv file can be inserted into the winery table using the statement:

LOAD DATA INFILE 'winery.cdf' INTO TABLE winery


If quotation marks form part of an attribute, they must be escaped using backslashes. For example:

"Smith's \"Lofty Heights\" Winery"

Spreadsheet software usually automatically escapes quotation marks in strings when data is exported.

More detail on the LOAD DATA INFILE statement, including other options to specify data formats and techniques to control its priority, are discussed in Section 6.4.9 of the MySQL manual. Delayed insertion

If your application is under heavy load, you can use the MySQL-specific DELAYED modifier for insertion. It works only with the default MyISAM table type that's discussed in Section 15.6. Here's an example:

INSERT DELAYED INTO customer VALUES (1,'Rosenthal','John','B',1,

  '34 Mellili Cres','Earlwood','VIC','6750',12, '(613)83008461',


This modifier causes an INSERT statement to be stored in a buffer at the database server so that it can be run later together with any other statements that are in the insert buffer for that table. This has two advantages: first, it allows the client to continue without waiting for the query to execute; and, second, it allows MySQL to optimize the insertion process by working with many rows at once. The main drawback is that you can't get sensible information about the result of the insertion process. For example, if you execute the previous example and you already have a row with a cust_id value of 1, you'll still receive a message that indicates the process worked (even though it didn't):

Query OK, 1 row affected (0.01 sec)

In addition, this modifier is faster only if the application is under heavy load; if it isn't, don't use DELAYED because your insertion will run slower.

15.3.3 More on Deleting Data

Our discussion of the DELETE statement in Chapter 5 focused on simple examples with one table. In this section, we show you how to delete using a join query and how to delete from more than one table with a single query. We also show you a few tricks to speed up your deletes.

As with the SELECT statement, you can include a join condition in the WHERE clause of a DELETE, and you can delete rows from more than one table in a single statement. For example, suppose you want to remove all orders and related items rows, if the order was placed prior to 1 March 2000. You can do this with the following query:

DELETE orders, items FROM orders, items WHERE orders.cust_id=items.cust_id 

  AND orders.order_id=items.order_id AND orders.date < "2000/03/01";

The syntax is a little different from a SELECT statement: the table names from which rows should be deleted are listed after the DELETE statement and the tables that are used in the join condition are listed after the FROM statement. Another example shows why this is needed: let's suppose you want to delete all wineries from the Barossa Valley region (but not the region itself). Here's the query:

DELETE winery FROM region, winery WHERE winery.region_id=region.region_id 

  AND region_name = "Barossa Valley";

The query only affects the winery table, but it uses both the winery and region tables to discover which rows should be deleted.

You can also use the advanced join operators in DELETE statements. For example, our first query in this section can be rewritten using the INNER JOIN syntax as:

DELETE orders, items FROM orders INNER JOIN items 

  USING (cust_id, order_id) WHERE orders.date < "2000/03/01";

You can also use nested queries (as long as the inner query doesn't reference data that's being deleted), GROUP BY, and HAVING in DELETE statements. You can also use ORDER BY in a single-table DELETE, but that doesn't make much sense unless you're combining it with the LIMIT modifier so that only some rows are removed; ORDER BY and LIMIT can't be used with multi-table deletes.

If you're deleting all of the data from one table, there's a faster alternative than using DELETE. The TRUNCATE statement drops a table (deleting the data and the table structure), and then recreates the table structure. Here's an example:

TRUNCATE customer;

Its only significant limitation is that it doesn't report how many rows were deleted from the table. Also, it works on only one table.

You can add a QUICK modifier to a DELETE statement, but this works only with tables of the default MyISAM table type. For example:

DELETE QUICK FROM customer WHERE cust_id < 100;

The QUICK option causes lazy deletion of index entries, and this can speed up large or frequent delete operations.

If you use the MyISAM table type, an occasional clean up of the table after deletion will reduce file size and speed up subsequent queries. You can do this with the OPTIMIZE TABLE statement:


15.3.4 More on Updating Data

Our UPDATE examples in Chapter 5 are simple. In this section, we show you how to include a join condition in an update and how to avoid errors that can occur.

You can use joins in UPDATE statements. For example, here's a query that adds a note to the end of the order delivery instructions for all customers who live in the state of Western Australia (WA):

UPDATE customer, orders 

  SET instructions = CONCAT(instructions, " Ship using rail.") 

  WHERE customer.cust_id = orders.cust_id AND customer.state = "WA";

The CONCAT( ) function joins two or more strings together, and is used in this example to add the additional instruction to the end of the current instruction; it's discussed later in Section 15.4.

You can also use the INNER JOIN and LEFT JOIN clauses with an UPDATE. For example, our previous query could be rewritten as:

UPDATE customer INNER JOIN orders USING (cust_id)

  SET instructions = CONCAT(instructions, " Ship using rail.") 

  WHERE customer.state = "WA";

You can also use nested queries for updates, with the limitation that the inner query can't read data that's being updated by the outer query. There's also no problem in using GROUP BY and HAVING. What's more, you can also use ORDER BY to update rows in a specific order, but that's useful only if you're combining it with the LIMIT modifier so that only some rows are affected.

Last of all, you can add the keyword IGNORE to an update so that MySQL won't abort even if an error is encountered: this is useful if you're preparing a set of SQL statements, and want them all to run even if something goes wrong. Here's an example:

UPDATE IGNORE customer SET cust_id = 1 WHERE cust_id = 2;

In this case, because there's already another row with this unique cust_id, MySQL shows it hasn't done anything but doesn't complain either:

Query OK, 0 rows affected (0.00 sec)

Rows matched: 1  Changed: 0  Warnings: 0