Datatypes define the type of data that can be stored in tables. When each column is defined, it is assigned a datatype, which enforces basic integrity by limiting the type and size of the data that can be entered. Choosing the correct datatype for each column is crucial to efficient data storage, performance, and application compliance.
SQL Server provides several base datatypes, which are grouped by the type of data that they store. Table 12.2 shows the system-supplied datatypes and their storage capacities.
Data Type Category | SQL Server Data Type | Number of Bytes |
---|---|---|
Character |
char[(n)] varchar[(n)] text |
0?8000 0?2GB |
Unicode Character |
nchar[(n)] nvarchar[(n)] ntext |
0?8000 (4000 characters) 0?2GB |
Binary |
binary[(n)] varbinary[(n)] |
0?8000 |
Date and time |
datetime smalldatetime |
8 4 |
Integer |
int bigint smallint tinyint |
4 8 2 1 |
Exact Numeric |
decimal[(p[,s])] numeric[(p[,s])] |
2?17 |
Approximate Numeric |
float[(n)] real |
8 4 |
Monetary |
money smallmoney |
8 4 |
Image | image | 0?2GB |
Global Identifier | uniqueidentifier | 16 |
Special |
Bit cursor timestamp sysname table sql_variant |
1 0?8 8 256 0?8016 |
Character datatypes are used to store strings. They can be fixed or variable length single byte or Unicode, depending on which you decide to use. The six-character datatypes are char, varchar, nchar, nvarchar, text, and ntext.
The most commonly used character datatypes, char and varchar, store data in a fixed or variable length format respectively. Use char if a column has a consistent length, such as a two-character State field. If char is defined as char(20), it uses 20 bytes of storage even if only five characters are entered into the field; data is padded with blanks up to the full field size. For a variable length column, such as a Description or Name field, varchar would be a better choice because it doesn't pad the data with spaces, and the amount of bytes required is the actual length of the character data. Char and varchar can store up to 8,000 characters.
The Unicode variants are nvarchar and nchar. These have the same properties, but data is stored in Unicode format. Unicode supports a wider range of characters, but each character requires two bytes of storage. This limits the available storage to 4,000 characters.
Binary and varbinary also store up to 8,000 bytes, but are designed to store binary information. SQL Server interprets character data based on the sort order, whereas binary is simply a stream of bits. Binary datatypes are used to store hexadecimal values.
To store character strings and binary data that might exceed 8,000 bytes, use the text, ntext, and image datatypes. For example, to store large text files (.txt), you would define a column as text or ntext depending on the requirement for Unicode support. For storing images (.gif), a better choice would be the image datatype.
Although SQL Server has a page size of 8KB, text and image data up to 2GB can be stored. By default, these datatypes actually store a 16-byte pointer in the row, which maps to separate data pages where fragments of the large object are stored. SQL Server can retrieve these fragments and piece them back together when required. New for SQL Server 2000 is the table option text in row. This option allows you to specify that text, ntext, and image data below a certain size are stored inline. This means that rather than a pointer, the data is stored in the row. The default value for text in row is 256 bytes, and can range from 24 to 7,000 bytes. Storing the data inline will provide faster retrieval of the object, at the expense of increased row size.
The following statement specifies text in row to store objects less than 500 bytes inline for the orders table:
EXEC sp_tableoption 'orders', 'text in row' , '500'
The uniqueidentifier datatype is used to generate a globally unique identifier. The datatype does not generate this value; rather, it is used in conjunction with the ROWGUIDCOL property and the NEWID() function. The value generated is guaranteed to be unique across all network computers worldwide. This is useful in situations such as merge replication, in which multiple copies of the same row must be identified as unique across multiple servers.
timestamp generates an 8-byte binary number unique within the database. The primary use of timestamp is to determine row version. Although unique, timestamp is incremented with each insert, or update to a row; it is not a good candidate for a primary key. SQL Server 2000 provides a synonym for timestamp called rowversion. It is recommended that you now use rowversion because timestamp is an ANSI-92 keyword that is synonymous to datetime. Despite what its name implies, the value in a timestamp column has no correlation to an actual date and time when the row was last modified.
Date and time information is stored in the datetime and smalldatetime datatypes. The difference is in the amount of precision and the storage required. Table 12.3 compares the two datatypes.
datetime | smalldatetime | |
---|---|---|
Storage size | 8 bytes | 4 bytes |
Precision | 3/100 second | 1 minute |
Minimum value | Jan. 1, 1753 | Jan. 1, 1990 |
Maximum value | Dec. 31, 9999 | June 6, 2079 |
There are no datatypes in SQL Server that store only a date or only a time. If you insert a date value without a time into a datetime or smalldatetime field, the time will default to midnight on that date:
create table datetest (datecol datetime not null) go insert datetest values ('4/4/66') go select * from datetest
If you specify a time value without a date, the date defaults to January 1, 1900:
insert datetest values ('12:51:35') go select * from datetest
When inputting date values into a datetime or smalldatetime field, the value specified will be a character string containing a datetime value. SQL Server supports a number of different date formats:
monthname dd[,] yy[yy]
dd monthname yy[yy]
yyyy monthname dd
mm/dd/yy[yy]
mm-dd-yy[yy]
mm.dd.yy[yy]
[yy]yymmdd
When U.S. English is the default language, the default date order is month day year (mdy). You can use the SET DATEFORMAT command to change the default date format to something else like mdy or ymd.
Time values can be specified using a 12-hour or 24-hour clock as hh[:mm[:ss[: | .]xxx]] [AM | PM].
If you simply want to store on/off or true/false values, you can use the bit datatype. Valid values are 1, 0, or NULL. The bit datatype uses one byte of storage to hold the bit entry. That is not entirely wasteful because if you have multiple bit datatypes, up to 8-bit datatypes can share the same byte.