5.5 Querying with SQL SELECT

The SELECT statement is used to query and retrieve one or more rows from a database. We introduce it in this section, and then show you the WHERE clause for selecting data that matches a condition. The section concludes with an introduction to the more advanced features of SELECT statements and a short case study.

5.5.1 Basic Querying

Consider an example SELECT statement:

SELECT surname, firstname FROM customer;

This outputs the values of the attributes surname and firstname from all rows in the customer table. Assuming we previously inserted four rows when we created the winestore database, the output from the MySQL command interpreter is:

+-----------+-----------+

| surname   | firstname |

+-----------+-----------+

| Marzalla  | Dimitria  |

| LaTrobe   | Anthony   |

| Fong      | Nicholas  |

| Stribling | James     |

+-----------+-----------+

4 rows in set (0.04 sec)

Any attributes of a table may be listed in a SELECT statement by separating them with a comma. If all attributes are required, the shortcut of an asterisk character (*) can be used. Consider the statement:

SELECT * FROM region;

This outputs all the data from the table region:

+-----------+---------------------+

| region_id | region_name         |

+-----------+---------------------+

|         1 | All                 |

|         2 | Goulburn Valley     |

|         3 | Rutherglen          |

|         4 | Coonawarra          |

|         5 | Upper Hunter Valley |

|         6 | Lower Hunter Valley |

|         7 | Barossa Valley      |

|         8 | Riverland           |

|         9 | Margaret River      |

|        10 | Swan Valley         |

+-----------+---------------------+

10 rows in set (0.01 sec)

SELECT statements can also output data that isn't from a database. Consider the following example:

SELECT curtime( );

This example runs a function that displays the current time:

+-----------+

| curtime( ) |

+-----------+

| 08:41:50  |

+-----------+

1 row in set (0.02 sec)

The SELECT statement can even be used as a simple calculator, using the MySQL mathematical functions described in Chapter 15:

SELECT pi( )*(4*4);

This outputs:

+------------+

| pi( )*(4*4) |

+------------+

|  50.265482 |

+------------+

1 row in set (0.01 sec)

5.5.2 WHERE Clauses

A WHERE clause is used as part of most SELECT queries to limit the rows that are retrieved to those that match a condition.

Consider this grape-growing region table containing the details of ten regions:

mysql> SELECT * from region;

+-----------+---------------------+

| region_id | region_name         |

+-----------+---------------------+

|         1 | All                 |

|         2 | Goulburn Valley     |

|         3 | Rutherglen          |

|         4 | Coonawarra          |

|         5 | Upper Hunter Valley |

|         6 | Lower Hunter Valley |

|         7 | Barossa Valley      |

|         8 | Riverland           |

|         9 | Margaret River      |

|        10 | Swan Valley         |

+-----------+---------------------+

10 rows in set (0.09 sec)

To show only the first three regions, you can type:

SELECT * FROM region WHERE region_id <= 3;

This outputs all attributes for the first three rows:

+-----------+-----------------+

| region_id | region_name     |

+-----------+-----------------+

|         1 | All             |

|         2 | Goulburn Valley |

|         3 | Rutherglen      |

+-----------+-----------------+

3 rows in set (0.03 sec)

You can combine the attribute and row restrictions and select only the region_name attribute for the first three regions:

mysql> SELECT region_name FROM region WHERE region_id <= 3;

+-----------------+

| region_name     |

+-----------------+

| All             |

| Goulburn Valley |

| Rutherglen      |

+-----------------+

3 rows in set (0.01 sec)

The SQL Boolean operators AND and OR have the same function as the PHP && and || operators introduced in Chapter 2. These can be used to develop more complex WHERE clauses (and these can be combined with the MySQL functions described in Chapter 15). Consider an example query:

SELECT * FROM customer WHERE surname='Marzalla' AND firstname='Dimitria';

This retrieves rows that match both criteria, that is, those customers with a surname Marzalla and a firstname Dimitria. In this example, you need to be careful to type the strings 'Marzalla' and 'Dimitria' using the correct case because string values are case sensitive.

Consider a more complex example:

SELECT cust_id FROM customer 

  WHERE (surname='Marzalla' AND firstname LIKE 'M%') OR 

    birth_date='1980-07-14';

This finds rows with either the surname Marzalla and a firstname beginning with M, or customers who were born on 14 July 1980; the LIKE operator is discussed in more detail in Chapter 15. The OR operator isn't exclusive, so a row can contain a birth date of 14 July 1980, a surname of Marzalla, and a firstname beginning with M. This query, when run on the winestore database, returns:

+---------+

| cust_id |

+---------+

|     440 |

|     493 |

+---------+

2 rows in set (0.01 sec)

SELECT queries are often sophisticated and a long WHERE clause may include many AND and OR operators. More complex examples of queries are shown later in this chapter. As discussed previously, the WHERE clause is also a common component of UPDATE and DELETE statements.

5.5.3 Sorting and Grouping Output

Listing attributes in the SELECT statement and using WHERE allows you to decide what rows and columns in a table are returned from a query. However, you might also want to sort the data after it's returned, or you might want to group it together beforehand so that you can count the number of rows with different values, find a minimum or maximum value, or sum a numeric field. This section shows you how to pre- and post-process your data.

5.5.3.1 ORDER BY

The ORDER BY clause sorts the data after the query has been evaluated. Consider an example:

SELECT surname, firstname FROM customer

  WHERE city = 'Portsea' and firstname = 'James' ORDER by surname;

This query finds all customers who live in Portsea and who have the first name James. It then presents the results sorted alphabetically by ascending surname:

+-----------+-----------+

| surname   | firstname |

+-----------+-----------+

| Leramonth | James     |

| Mockridge | James     |

| Ritterman | James     |

+-----------+-----------+

3 rows in set (0.00 sec)

Sorting can be on multiple attributes. For example:

SELECT surname, firstname, initial FROM customer 

  WHERE city = 'Coonawarra' OR city = 'Longwood' 

  ORDER BY surname, firstname, initial;

This presents a list of customers who live in Coonawarra or Longwood, sorted first by ascending surname, then (for those customers with the same surname) by firstname, and (for those customers with the same surname and first name), by initial. The output for the winestore customer table is:

+------------+-----------+---------+

| surname    | firstname | initial |

+------------+-----------+---------+

| Archibald  | Belinda   | Q       |

| Chester    | Marie     | S       |

| Dalion     | Marie     | C       |

| Eggelston  | Martin    | E       |

| Florenini  | Melinda   | O       |

| Holdenson  | Jasmine   | F       |

| Mellaseca  | Craig     | Y       |

| Mockridge  | Dimitria  | I       |

| Morfooney  | Chris     | K       |

| Nancarral  | Samantha  | W       |

| Oaton      | Joel      | V       |

| Oaton      | Rochelle  | F       |

| Patton     | Joel      | Z       |

| Patton     | Penelope  | E       |

| Patton     | Samantha  |         |

| Rosenthal  | Chris     | A       |

| Tonkin     | Michelle  | Z       |

| Tonnibrook | Belinda   | T       |

+------------+-----------+---------+

18 rows in set (0.00 sec)

By default, the ORDER BY clause sorts in ascending order, or ASC. To sort in reverse or descending order, DESC can be used. Consider an example:

SELECT * FROM customer WHERE city='Melbourne' ORDER BY surname DESC;

5.5.3.2 GROUP BY

The GROUP BY clause is different from ORDER BY because it doesn't sort the data for output. Instead, it sorts the data early in the query process, for the purpose of grouping or aggregation . Grouping data using a sort is the easiest way to discover properties such as maximums, minimums, averages, and counts of values.

Consider an example:

SELECT city, COUNT(*) FROM customer GROUP BY city;

This query first sorts the rows in the customer table by city and groups the rows with matching values together. The output of the query consists of two columns. The first is a sorted list of unique cities. The second shows, for each city, the COUNT of the number of customers who live in that city. The number of rows that are output is equal to the number of different city values in the customer table, and the effect of COUNT(*) is to count the number of rows per group.

Here are the first few lines output by the query:

+--------------+----------+

| city         | COUNT(*) |

+--------------+----------+

| Alexandra    |       14 |

| Armidale     |        7 |

| Athlone      |        9 |

| Bauple       |        6 |

| Belmont      |       11 |

| Bentley      |       10 |

| Berala       |        9 |

| Broadmeadows |       11 |

So, for example, there are 14 customers who live in Alexandra, that is, 14 rows in the customer table are grouped together because they have a city value of Alexandra.

The GROUP BY clause can find different properties of the aggregated rows. Here's an example:

SELECT city, MIN(birth_date) FROM customer GROUP BY city;

This query first groups the rows by city and then shows the oldest customer in each city. The first few rows of the output are as follows:

+---------------+-----------------+

| city          | MIN(birth_date) |

+---------------+-----------------+

| Alexandra     | 1938-04-01      |

| Armidale      | 1943-04-04      |

| Athlone       | 1943-04-04      |

| Bauple        | 1922-11-26      |

The GROUP BY clause should be used only when the query is designed to find a characteristic of a group of rows, not the details of individual rows.

There are several functions that can be used in aggregation with the GROUP BY clause. Five particularly useful functions are:

AVG( ): Finds the average value of a numeric attribute in a set
MIN( ): Finds a minimum value of a string or numeric attribute in a set
MAX( ): Finds a maximum value of a string or numeric attribute in a set
SUM( ): Finds the sum total of a numeric attribute
COUNT( ): Counts the number of rows in a set

The SQL standard places a constraint on the GROUP BY clause that MySQL doesn't enforce. In the standard, all attributes that are selected (those that are listed immediately after the SELECT statement) must appear in the GROUP BY clause. Most examples in this chapter don't meet this unnecessary constraint.

5.5.3.3 HAVING

The HAVING clause permits conditional aggregation of data into groups. For example, consider the following query:

SELECT city, count(*), min(birth_date) FROM customer 

  GROUP BY city HAVING count(*) > 10;

The query groups rows by city, but only for cities that have more than 10 resident customers. For those groups, the city, count of customers, and earliest birth date of a customer in that city is output. Cities with less than 10 customers are omitted from the result set. The first few rows of the output are as follows:

+--------------+----------+-----------------+

| city         | count(*) | min(birth_date) |

+--------------+----------+-----------------+

| Alexandra    |       14 | 1938-04-01      |

| Belmont      |       11 | 1938-04-01      |

| Broadmeadows |       11 | 1955-10-13      |

| Doveton      |       13 | 1943-04-04      |

| Eleker       |       11 | 1938-04-01      |

| Gray         |       12 | 1943-04-04      |

The HAVING clause must contain an attribute or expression (such as a function or an alias) from the SELECT clause; in this example, count(*) is listed after the SELECT and is used in the HAVING condition.

The HAVING clause should be used exclusively with the GROUP BY clause. It is slow and should never be used instead of a WHERE clause. For example, don't do this:

SELECT cust_id, surname FROM customer HAVING surname = "Leramonth";

Do this instead:

SELECT cust_id FROM customer WHERE surname = "Leramonth";

5.5.3.4 Combining clauses

You can combine ORDER BY, GROUP BY, HAVING, and WHERE. When all four are used, they must appear in the order WHERE, then GROUP BY, then HAVING, and then ORDER BY. This is intuitive because the WHERE clause picks the rows from the table, then GROUP BY organizes the rows into sets, then HAVING picks the sets that match a condition, and then the data is sorted by the ORDER BY condition just before it's output.

Consider an example. Suppose we want to find the number of customers with the same name who live in each city in the state of Victoria, where the same name is defined as the same first name and surname. For example, this might determine that there are five John Smiths who live in Inverloch and three Tuong Nguyens in Carlton. Here's the query:

SELECT city, surname, firstname, count(*) FROM customer

  WHERE state = 'VIC'

  GROUP BY surname, firstname HAVING count(*) >= 2

  ORDER BY city;

The query first uses the WHERE clause to pick the rows of customers that live in the state of Victoria. The rows are then grouped together into sets, where the grouping condition is that the customer surname and firstname are the same. Then, only those sets that have more than one customer with the same name are kept by the HAVING clause; this gets rid of unique names. Last, the ORDER BY clause sorts the customers by their city, and the city, first name, surname, and count of the number of customers is output. Here is the output from the winestore customer table:

+--------------+-----------+-----------+----------+

| city         | surname   | firstname | count(*) |

+--------------+-----------+-----------+----------+

| Broadmeadows | Mellaseca | Anthony   |        2 |

| Eleker       | Leramonth | Harry     |        2 |

| Kalimna      | Galti     | Nicholas  |        2 |

| Lucknow      | Mellili   | Derryn    |        2 |

| McLaren      | Chester   | Betty     |        2 |

+--------------+-----------+-----------+----------+

5 rows in set (0.00 sec)

The output shows, for example, that there are two Betty Chesters who live in McLaren city in the state of Victoria.

The GROUP BY clause sorts before it groups the rows into sets. Therefore, you don't need to use ORDER BY if you want the data to be output in the sort order used by the GROUP BY. For example, you don't need to do this:

SELECT * FROM customer GROUP BY surname ORDER BY surname;

If you leave out the ORDER BY clause, you'll get the same output:

SELECT * FROM customer GROUP BY surname;

However, in practice, it doesn't really matter: the MySQL query optimizer will ignore the ORDER BY clause if it's unnecessary. We discuss the query optimizer in Chapter 15.

5.5.3.5 DISTINCT

Suppose we want to find out which different cities our customers live in. The following query shows the cities for all of the customers:

SELECT city FROM customer;

The problem is that a city name appears more than once if more than one customer lives in that city. What we really want is a list of unique cities that the customers live in.

The DISTINCT clause presents only one example of each identical row from a query. We can use it to find out the unique cities the customers live in:

SELECT DISTINCT city FROM customer;

This shows one example of each different city in the customer table.

This example has exactly the same result as:

SELECT ciry FROM customer GROUP BY city;

The DISTINCT clause is often slow to run, much like the GROUP BY and HAVING clauses. We discuss how indexes and query optimization can speed queries in Chapter 15.

5.5.4 Limiting Output in MySQL

The LIMIT operator is MySQL-specific and is used to control the size of the output. For example, the following query returns only the first five rows from the customer table:

SELECT * FROM customer LIMIT 5;

This saves query evaluation time and reduces the size of the result set that's buffered in memory by MySQL. It's particularly useful in a web database application where one page of results is presented from a large table.

You can also specify which row to begin at, and then how many rows you want: