SQL: The First Look

SQL: The First Look

Throughout this book we are going to use the ACME order management database of a fictitious hardware store. For full a description of this database as well as detailed instructions on how to install it on the RDBMS of your choice (as long as you choose Oracle 9i, Microsoft SQL Server 2000, or IBM DB2 UDB 8.1) please refer to Appendixes B and F, respectively. The whirlwind tour of SQL for this chapter will be using ACME tables exclusively. We've tried to create as generic as possible syntax that would be acceptable for every RDBMS discussed in the book.

Database example

As far as SQL is concerned the database starts with the CREATE statement. It is used to create all the objects that comprise a database: tables, indices, constraints, and so on. We look into creating, altering, and destroying database objects in Chapters 4 and 5.

You start with a CREATE TABLE statement. The syntax is virtually identical across all three databases: name of the column and its data type, which defines what kind of information it will hold in the future.

		  status_id_n INT, status_code_s CHAR(2), status_desc_s VARCHAR(30) )

This statement executed against an RDBMS will create a structure — an empty table, with columns of the specified data types: status id, status code, and description.


Data types are discussed in Chapter 3.

The procedure may be repeated as many times for as many tables you wish to add to your database. The relationships between these tables will be defined through constraints. To keep this introduction simple none of the constraints (or default values) are specified here.


Constraints are discussed in Chapter 4.

To dispose of a table (or many other objects in the RDBMS) one would issue a DROP statement:


In real life, referential integrity constraints may prevent you from dropping the table outright; in the Chapter 1 we talked about what makes a database relational, in context of database integrity. You may need to disable constraints or cascade them. More about referential integrity and constraints is in Chapters 4 and 5.

A database is more than just a collection of the tables; there are many more objects as well as associated processes and structures that work together to keep and serve data; for now we can afford a more simplistic view.

Getting the data in and out

Once the tables are created, you probably would want to populate them with some data — after all, that's what databases are for. The SQL defines four basic statements, which are fairly self-explanatory, to query and manipulate data inside the database tables (Table 2-2). The exact uses of these statements will be discussed in depth in Chapters 6 through 9.

Table 2-2: Four Basic SQL Statements

SQL Statement



Adds new data to the table


Updates data — i.e., changes existing values — in the database table


Retrieves data from database table


Removes data from the table


Later in the book you will learn of data definition language (DDL) (Chapters 4 and 5), data manipulation language (DML) (Chapter 6), data query language (DQL) (Chapters 8 and 9), and data control language (DCL) (Chapter 12). These are parts of SQL proper.

To add new status for the table created in the previous example one would use the following statement:


This statement could be entered directly through RDBMS access utility (Appendix E); in that case the database usually would acknowledge insertion with a message, or would generate an error message if the insertion failed for some reason.


The values for STATUS_CODE_S and STATUS_DESC_S are enclosed in single quotes because these columns are of character data type. STATUS_ID_N is of numeric data type and does not need quotes.

If you need to change some existing data (e.g., an acquisition status code might be changed while other related data elsewhere remains the same) you would use an UPDATE statement. Again, the syntax is completely portable across the three RDBMS products used in the book — Oracle 9i Database Server, IBM DB2 UDB 8.1, and Microsoft SQL Server 2000.

UPDATE status SET status_desc_s
		  = 'APPROVED' WHERE status_id_n = 8

You update on a column basis, listing all the columns you want to update with the values you want to change; if every column in the record needs to be updated, they all must be listed with corresponding values.

UPDATE status SET status_desc_s
		  = 'APPROVED', status_code_s = '90' WHERE status_id_n = 8

The UPDATE statement has a WHERE clause to limit the number of updated rows to exactly one customer whose ID is 8; if omitted from the query, the result would be that the existing value will be replaced with a new one for each and every customer.

The same applies to deleting data. If you need to completely remove a particular customer record from your database, then you might issue the following command:

DELETE status WHERE status_id_n
		  = 8

Omitting the WHERE clause could be disastrous as all records in the table will be blown away; usually databases have some built-in mechanisms for recovering deleted data, which does not mean that you should not pay attention to what you're doing with the data.

The basic SELECT statement retrieves the data from a table or a view. You need to specify the columns you wish to be included in the resultset.


View is a virtual table that is being populated at the very moment it is queried. Views are discussed in detail in Chapter 4.

The following script selects CUSTOMER_NAME, ORDER_NUMBER and TOTAL_PRICE columns from the V_CUSTOMER_TOTALS view:

		  customer_name, order_number, total_price FROM v_customer_totals customer_name
		  order_number total_price ----------------------- ---------------- -----------
		  WILE BESS COMPANY 523720 7511.00 WILE BESS COMPANY 523721 8390.00 WILE BESS
		  COMPANY 523722 6608.00 WILE BESS
		  COMPANY 523723 11144.00 WILE ELECTROMUSICAL INC. 523726 6608.00 WILE

Slice and dice: Same data, different angle

There is always more than one way to look at data. SQL provides you with the means to manipulate data while retrieving it. You can arrange the data in ascending or descending order by virtually any column in the table; you can perform calculations while retrieving the data; you can impose some restrictions on what data should be displayed. Here are just a few examples of these capabilities.

Basic SELECT query returns a resultset based on the selection criteria you've specified. What if one would like to see, for instance, net sales figures, with state and federal taxes subtracted? You could perform fairly complex calculations within the query itself. This example is based on the view V_CUSTOMER_TOTALS, which contains columns CUSTOMER_NAME, ORDER_NUMBER, and TOTAL_PRICE.

Assuming tax at 8.5%, the query might look like follows:

SELECT customer_name,
		  order_number, (total_price–(total_price * 0.085)) net_sale FROM
		  v_customer_totals customer_name order_number net_sale ------------------------
		  ------------------- -------- WILE BESS COMPANY 523720 6872.56 WILE BESS COMPANY
		  523721 7676.85 WILE BESS COMPANY 523722 6046.32 WILE BESS COMPANY 523723
		  523727 6046.32 WILE ELECTROMUSICAL INC. 523728 6046.32

By default, every column in the returned results has it is own name, in cases of calculated columns like the one in the example above, RDBMS will use the whole string (SALE_AMOUNT-(SALE_AMOUNT * 0.085) as the name. For readability, you may substitute this unwieldy string for something more descriptive using alias — in our case NET_SALE.

The results returned by the query include calculated values; in other words, the original data from the table has been transformed. SQL also provides several useful functions that could be used in the query to manipulate data as it is being extracted.


Chapter 10 gives an in-depth description of all popular functions and their uses across three RDBMS vendors (IBM, Microsoft, and Oracle); Appendix G lists virtually all SQL functions.

With SQL, you have full control over how data is displayed: you could order it by any column — alphabetically or numerically. Let's say that you want the list of your companies and sales arranged according to the amount of sales for each customer's order.

		  customer_name, order_number, (total_price–(total_price * 0.085)) net_sale FROM
		  v_customer_totals ORDER BY net_sale customer_name order_number net_sale
		  ------------------------ -------------- -------- WILE ELECTROMUSICAL INC.
		  523728 6046.32 WILE BESS COMPANY 523722 6046.32 WILE BESS COMPANY 523720
		  6872.56 WILE BESS COMPANY 523721 7676.85 WILE BESS COMPANY 523723

Now you want to see the customers with the most sales at the top of the list; use DESC modifier (stands for descending); default order is ascending.

		  customer_name, order_number, (total_price–(total_price * 0.085)) net_sale FROM
		  v_customer_totals ORDER BY net_sale DESC customer_name order_number net_sale
		  ------------------------ -------------- -------- WILE BESS COMPANY 523723
		  10196.76 WILE BESS COMPANY 523721 7676.85 WILE BESS COMPANY 523720 6872.56 WILE
		  BESS COMPANY 523722 6046.32 WILE ELECTROMUSICAL INC. 523726 6046.32 WILE


Using SQL, you could transform your data while retrieving it. For instance, you need to know the total sum of the sales. Let's assume your database contains the V_CUSTOMER_TOTALS view with the information for all the sales you've had up to date; now you need to sum it up.

To find out your total for all orders across all products, you would use the SQL built-in SUM function; it will simply add up all the amounts it finds in the TOTAL_PRICE column of the view.

SELECT SUM(total_price)
		  net_sale_total FROM v_custome_totals net_sale_total --------------

To find out the average size of the orders, you would run this query, using AVG aggregate function on the TOTAL_PRICE column:

SELECT AVG(total_price)
		  net_sale_average FROM v_custome_totals net_sale_average ----------------

In the real life you would want even further limit the query by requesting the average sales for a particular customer or date range.

Using other predicates like GROUP_BY and HAVING, one could sum NET_SALE by customer, or date, or product, and so on; grouping allows for aggregating within a group.

We used these samples just to give you a sense of what could be accomplished using SQL and RDBMS.


See Chapter 10 for more on SQL functions, including aggregate functions.

Data security

SQL provides a number of built-in mechanisms for data security. It is fine-grained, though granularity greatly depends on the particular implementation. Essentially, it comes down to granting access on the object level: ability to connect and view a particular table or set of tables, execute particular command (e.g., ability to view data — execute SELECT statement, while lacking privileges to INSERT new data).


Assuming that there is a user JOHN_DOE defined in the database, to grant a permission to this user, the following SQL statement could be used:

			 v_custome_totals TO john_doe

To grant SELECT and UPDATE simultaneously one could use this syntax:

			 v_custome_totals TO john_doe

To revoke this privilege:

			 v_custome_totals FROM john_doe 

Here is the syntax to quickly revoke all privilege from JOHN_DOE:

			 v_custome_totals FROM john_doe


One of the common mechanisms for implementing security is using views. A view is a way to limit the data accessible to a user. You may think of a view as a virtual table: It could join columns from several tables in the database, limit the number of columns available for viewing, and so on. It does not contain any data but fetches it on demand whenever a SELECT statement is executed against it. For most practical purposes, selecting from a view is identical to selecting from a table.

For example, the view V_CUSTOMER_TOTALS collects information from the CUSTOMER, ORDER_HEADER, ORDER_LINE, and PRODUCT tables, while summing and grouping some data along the way.


To see the full SQL syntax for creating view V_CUSTOMER_TOTALS, please refer to Appendix B.


Some views limit access to underlying data (e.g., no UPDATE or INSERT statements could be executed). Views are discussed in Chapter 4.

There is much more to the security than discussed here; for example, all three RDBMS discussed in this book implement role-based security, where individual users are assigned to a particular role (e.g., accountants) and all the privileges are granted to the role.


For a comprehensive discussion of the SQL security features, see Chapter 12.

Accessing data from a client application

A wide range of client applications is being used to access RDBMS data. They all use SQL to do that in two radically different ways.

Embedded SQL allows users to create programs that can access RDBMS through SQL statements embedded in an ANSI/ISO standard host programming language such as C or COBOL. That means that you program an application in the standard programming language and switch to SQL only where there is a need to use a database. Usually vendors provide special development tools to create such applications.

Dynamic SQL is all the embedded SQL is and then some more. The major difference is that dynamic SQL is not blended into some programming language, but is rather built on the fly dynamically and passed to RDBMS as a simple text. This allows for flexibility that embedded SQL cannot possibly have: there is no need in hard-coded tables names or column names, or even database name — it all could be changed.


Embedded and dynamic SQL are discussed in Chapter 15.

New developments

The SQL99 standard also reflected a concept that had been evolving for quite some time — online analytical processing (OLAP). It was neither a new nor an obscure concept. The OLAP council was established in 1995, and most of the analysis was done manually since then. It soared in popularity with the advent of data warehousing — another database-related concept.

OLAP is about making sense out of data: analyzing complex trends, viewing the data under a variety of different angles; it transforms raw data into a multidimensional view, crafted to the user's perspective. OLAP queries could answer more intelligent questions than plain SQL, asking, for example, "what would be the effects of burger sales if prices of beef rise by 10 cents a pound?" These kinds of problems were usually solved with custom-made proprietary applications; SQL99 introduced built-in support for it. OLAP products complement the RDBMS, and all three major vendors (IBM, Oracle, and Microsoft) support it with some proprietary extensions. There are over 30 vendors on the market supplying RDBMS-based OLAP solutions.

Another relatively new feature supported by SQL is eXtensible Markup Language (XML). XML is all about data exchange. It is often called a self-describing format as it represents data in hierarchical structure, which, coupled with eXtensible Stylesheet Language (XSL), provides for visual representation via a browser or serves as a data format exchange between several parties.

Because it is an open standard that is not locked in by one particular vendor, it will be eventually supported by all vendors, enabling truly universal data interoperability. One of the major strengths of XML is that it could be transferred using HTTP protocol — the very protocol of the Internet — thus making any proprietary networks obsolete; it could be encrypted for better security or sent over the Secure Socket Layer (SSL). This versatility comes at a price — XML is inherently slower than compiled code, being a text that needs to be parsed and interpreted each time.