The physical and logical format used by writable CDs is defined in the rainbow books described in Chapter 10. The following sections provide an overview of how data is physically and logically stored on writable CDs. For further detail, refer to the rainbow books.
Because they must be readable in a standard CD-ROM drive or CD player, writable CDs use a physical format nearly identical to pressed CDs. The dimensions of a CD are 120.00mm in diameter (60.00mm radius) with a 15.00mm diameter central hole that accommodates the rotating center spindle of the drive. Beginning at the edge of the center hole (radius 7.50mm) and proceeding outward, a CD-R disc is divided into the following areas:
The Clamping Area is that portion of the disc that the drive spindle grasps to rotate the disc. On a pressed CD, this area extends from radius 7.50mm to 23.00mm. On a writable CD, this area occupies radius 7.50mm to 22.35mm.
The System Use Area (SUA) is present only on writable discs, occupies radius 22.35mm to 23.00mm, and can be thought of as equivalent to the boot sector of a hard disk. The SUA contains data that tells a CD drive or player what kind of information is stored on the disc, where it is located, and what format it uses. The SUA lies closer to the center of the disc than the innermost radius that standard CD-ROM drives and CD players can read, so only CD recorders can read and write to this area. The SUA is divided into two subareas:
The Optimal Power Calibration Area (OPCA), often called the Power Calibration Area (PCA) for short, is used by the CD writer as a testing area to decide the best write schema to use when writing to that disc. Each time you insert a disc into a CD-R drive, the drive fires its writing laser at the PCA to calibrate that disc against the drive. Each such calibration uses one ATIP frame. Only 99 PCA ATIP frames are available, which limits a CD-R disc to 99 or fewer recording sessions.
Many variables determine how the drive should best write to that disc: the type of dye and reflective backing material the disc uses, the proposed write speed, the firmware level of the drive, and so on. From this calibration testing, the drive decides the power level to use when writing, and whether to use a short write schema (typical for cyanine-based discs) or a long write schema (typical for phthalocyanine- and azo-based discs). The PCA begins at radius 22.35mm (ATIP -00:00:36 relative to the 23.00mm beginning of the Lead-in Area described later in this section).
The Program Memory Area (PMA) begins where the PCA ends, and extends to the beginning of the Lead-in Area at radius 23.00mm. The PMA is used to store a temporary TOC until the disc is finalized or closed. Closing a disc writes the temporary TOC stored in the PMA to the Lead-in Area. That makes the TOC (and therefore the disc) readable by a CD-ROM drive or CD player, but also means that the disc can no longer be written to by a CD recorder. The PMA can store location information for up to 99 track numbers, including the start and stop times for each track (for audio) or the sector addresses for data.
The Information Area (IA) occupies a width of 35.00mm to 35.50mm, beginning at radius 23.00mm and ending between radius 58.00mm and 58.50mm. This area provides the general storage space to which user data is written. The IA is the only area of the CD that is visible to standard CD-ROM drives and CD players, and includes the following subareas:
The Lead-in Area occupies radius 23.00mm to 25.00mm on both pressed and writable CDs. This area contains digital silence in the main channel, as well as control information in various subcode channels that can be used to provide additional information to the drive or reader about the content of the disc. The most important of the subcode channel data is the TOC for the disc, which is stored in the Q-channel. The length of the Lead-in Area is determined by the space required to store TOC entries for the up to 99 tracks that may be written to the Program Area.
The Program Area (PA) occupies a width of 33.00mm to 33.50mm, beginning at radius 25.00mm and ending between radius 58.00mm and 58.50mm. The PA is where actual user data (audio or computer data) is stored. The PA varies in capacity according to the CD-R disc you use. Discs are available that store 63 minutes of audio (which corresponds to about 550 MB of data), 74 minutes (~650 MB), and 80 minutes (~700 MB). Different brands of discs also have minor variations from nominal capacity. Some nominally 74-minute discs, for example, can store as much as 76.5 minutes.
The Lead-out Area occupies a width of 0.50mm to 1.00mm, beginning between radius 58.00mm and 58.50mm and ending between radius 59.00mm and 59.50mm. The Lead-out Area is created when the disc is closed, and defines the end of the Information Area.
The remaining 0.50mm to 1.00mm at the outer edge of the disc is unused. This area has no formal name that we know of, and exists simply to protect the outer portion of the track from damage.
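The relationship between playing time and data capacity in the Program Area can be checked with a little arithmetic. The following Python sketch assumes two standard CD figures that are not stated above: 75 sectors per second of playing time, and 2,048 bytes of user data per sector on a Mode 1 data disc. The function names are illustrative only.

```python
# Assumed constants (standard CD values, not taken from the text above).
SECTORS_PER_SECOND = 75      # each sector holds 1/75 second of playing time
MODE1_USER_BYTES = 2048      # user data bytes per sector on a Mode 1 data disc


def sectors_for_minutes(minutes: float) -> int:
    """Number of sectors in a Program Area of the given nominal playing time."""
    return round(minutes * 60 * SECTORS_PER_SECOND)


def data_capacity_mib(minutes: float) -> float:
    """Approximate Mode 1 data capacity in binary megabytes (MiB)."""
    return sectors_for_minutes(minutes) * MODE1_USER_BYTES / (1024 * 1024)


for minutes in (74, 80):
    print(f"{minutes}-minute disc: {sectors_for_minutes(minutes)} sectors, "
          f"about {data_capacity_mib(minutes):.0f} MB of data")
```

Under these assumptions a 74-minute disc works out to roughly 650 MB and an 80-minute disc to roughly 700 MB, matching the nominal figures quoted for those discs.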
The preceding assumes that the data on the disc exists as one session, which is nearly always true for commercially-pressed CDs, as well as for writable CDs produced using Disc-at-Once recording (described in a later section). But Orange Book defines a concept called multisession for CD-R discs.
With multisession recording, the overall disc layout remains the same. As with a single-session disc, a multisession disc contains a Lead-in Area, a Program Area, and a Lead-out Area. The difference is that the Program Area on a multisession disc stores more than one session, each of which contains its own session-based Lead-in Area, Program Area, and Lead-out Area.
Like the disc itself, a session can be opened, written to, and closed. When a session is closed, that session can no longer be written to, but additional sessions can be added to the disc. In fact, closing a session on a multisession disc automatically opens a new session to which additional data can be written. Closing the session writes the session TOC to the PMA. This session TOC includes pointers to the start of the session Program Area for the new session and to the start time of the last-used (outermost) Lead-out Area.
Closing the session does not close the disc, however, which means that until the disc itself is closed, sessions on a multisession disc can be read only by a CD recorder (which can read the temporary TOC in the PMA) and by some recent CD-ROM drives. When the disc itself is closed, all sessions are closed and the temporary TOC is written to the Lead-in Area, allowing the disc to be read in any CD-ROM drive and most CD players.
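The open and closed states described above can be summarized as a small state model. The Python sketch below is only a toy illustration of the rules in the text (closing a session records it in the temporary TOC and opens a new session; closing the disc copies the temporary TOC to the Lead-in Area and prevents further writing). The class and method names are hypothetical and do not correspond to any real recorder API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyDisc:
    """Hypothetical model of session and disc state; not a drive API."""
    open_session: List[str] = field(default_factory=list)    # tracks in the session being written
    pma_toc: List[List[str]] = field(default_factory=list)   # temporary TOC: one entry per closed session
    lead_in_toc: Optional[List[List[str]]] = None             # final TOC, written only when the disc is closed

    @property
    def disc_closed(self) -> bool:
        return self.lead_in_toc is not None

    def write_track(self, name: str) -> None:
        if self.disc_closed:
            raise RuntimeError("disc is closed; it can no longer be written to")
        self.open_session.append(name)

    def close_session(self) -> None:
        if self.disc_closed:
            raise RuntimeError("disc is already closed")
        self.pma_toc.append(self.open_session)  # session TOC goes to the PMA
        self.open_session = []                  # closing a session opens a new one

    def close_disc(self) -> None:
        if self.disc_closed:
            return
        if self.open_session:                   # close any session still holding data
            self.close_session()
        self.lead_in_toc = [list(s) for s in self.pma_toc]  # temporary TOC copied to the Lead-in Area

    def readable_in_any_drive(self) -> bool:
        return self.disc_closed


disc = ToyDisc()
disc.write_track("backup-monday")
disc.close_session()
print(disc.readable_in_any_drive())   # session closed, disc still open
disc.write_track("backup-tuesday")
disc.close_disc()
print(disc.readable_in_any_drive(), disc.lead_in_toc)
```

The model deliberately ignores the 99-track limit and the fact that some recent CD-ROM drives can read open multisession discs.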
The logical format of a CD specifies how data is arranged on the CD, and largely determines how data may be structured on the disc and what operating systems will be able to access it. CDs commonly use one of the logical formats described in the following sections.
Most data CDs use the ISO-9660 format or one of its variants. ISO-9660 is based on the de facto standard High Sierra format that was developed by the CD-ROM industry as a cooperative effort because of the lack of formal standards that then existed for writing data to CDs. In the days before High Sierra came into use, it was quite common to find that you could not read the data on a particular CD-ROM because that CD was incompatible with your software.
The primary purpose of ISO-9660, which was adopted in 1988, was to standardize a common logical data format for data CDs and, at the same time, to facilitate data exchange among different computing platforms. As a least-common-denominator format, the original ISO-9660 format is feature-poor because it supports only features that are common across many platforms. For example, the MS-DOS 8.3 filenaming convention limited ISO-9660 to using 8.3 filenames.
At the time ISO-9660 was adopted, these limitations were not much of a problem. Most people ran either MS-DOS or a Mac using floppy disks or small hard disks, and the limitations of ISO-9660 were not onerous in those environments. But the world soon changed, and the strict limits enforced by ISO-9660 became a problem, particularly for those who wanted to use deeply nested directories and long filenames. Accordingly, the ISO-9660 specification was expanded to include three ISO-9660 Interchange Levels for naming files and directories on disc. From most to least restrictive, these include:
ISO-9660 Level 1 is the least-common-denominator level, developed to accommodate DOS filename limitations. Each file must be written to disc as a single, continuous stream of bytes, called an extent. Files may not be fragmented or interleaved. Filenames may contain from one to eight d-characters (see following section). Filename extensions may contain from zero to three d-characters. Directory names may contain from one to eight d-characters, and may not have an extension.
ISO-9660 Level 2 also requires that files be written to disc as a single extent, but filenames may be up to 255 d-characters long, with an extension from zero to three d-characters. ISO-9660 Level 2 discs are unreadable by some operating systems, notably DOS.
ISO-9660 Level 3 allows a file to be written in multiple extents, and so is used for packet writing. Filenames may be up to 255 characters long, with the same limitations as ISO-9660 Level 2.
The various ISO-9660 levels vary significantly in which characters are legal. In ISO-9660-speak, these characters are designated as follows:
d-characters: For strict compliance with ISO-9660 Level 1 file and directory naming conventions, only this character set may be used (and only in 8.3 format). d-characters include uppercase A through Z, digits 0 through 9, and the underscore character.
a-characters: The character set usable for ISO Volume Descriptors (discussed next). a-characters include all d-characters as well as the following symbols: space; comma; semicolon; colon; period; question mark; exclamation point; right and left parentheses; single and double quotes; greater-than and less-than symbols; percent; ampersand; equals; asterisk; plus and minus (hyphen) symbols; and forward slash.
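The two character sets, together with the Level 1 naming rules given earlier, translate directly into a simple validity check. The following Python sketch is a minimal illustration; the function names are hypothetical, and it checks only the rules stated in this section (it does not handle version suffixes or any other detail of the standard).

```python
import string

# d-characters: uppercase A-Z, digits 0-9, and underscore.
D_CHARACTERS = set(string.ascii_uppercase + string.digits + "_")

# a-characters: all d-characters plus the symbols listed above.
A_CHARACTERS = D_CHARACTERS | set(" ,;:.?!()'\"<>%&=*+-/")


def is_level1_filename(name: str) -> bool:
    """True if name is 1-8 d-characters plus an optional 0-3 d-character extension."""
    base, _, ext = name.partition(".")
    if "." in ext:                       # more than one period is not 8.3
        return False
    return (1 <= len(base) <= 8
            and len(ext) <= 3
            and set(base + ext) <= D_CHARACTERS)


def is_level1_dirname(name: str) -> bool:
    """True if name is 1-8 d-characters with no extension."""
    return 1 <= len(name) <= 8 and set(name) <= D_CHARACTERS


for candidate in ("README.TXT", "readme.txt", "LONGFILENAME.TXT", "MY_FILE.C"):
    print(candidate, is_level1_filename(candidate))
```

Lowercase names and names longer than 8.3 fail the check, which is why mastering software typically uppercases and truncates filenames when asked to produce a strict Level 1 disc.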
ISO-9660 Volume Descriptors are optional information fields recorded at the beginning of the data area on the disc. Volume Descriptors were originally intended for use by CD publishers, but may be used by anyone who creates an ISO-9660 disc, assuming the mastering software supports assigning ISO Volume Descriptors (some don't, or support only some of the available volume descriptors). ISO-9660 Volume Descriptors include the following, with allowable sizes in parentheses:
System Identifier: The operating system for which the disc is intended (0 to 32 a-characters).
Volume Identifier: The disc name, displayed by the OS when the disc is mounted (0 to 32 a-characters).
Volume Set Identifier: Used in multidisc sets to assign a common group name to each disc in the set (0 to 32 d-characters).
Publisher Identifier: The publisher of the disc (0 to 128 a-characters).
Data Preparer Identifier: The author of the disc content (0 to 128 a-characters).
Application Identifier: The name of the program, if any, needed to access data on the disc (0 to 128 a-characters).
Copyright File Identifier: Points to a file (which, if present, must reside in the root directory of the disc) that contains copyright information (maximum 8.3 d-characters).
Abstract File Identifier: Points to a file (which, if present, must reside in the root directory of the disc) that contains text describing the contents of the disc (maximum 8.3 d-characters).
Bibliographic File Identifier: Points to a file (which, if present, may reside in any directory on the disc) that contains bibliographic information, such as ISBN number (maximum 8.3 d-characters).
Four Volume Descriptor fields exist for dates: Creation Date; Modification Date; Expiration Date; and Effective Date. Each of these fields, if present, stores a date and time in the following format, with size given in bytes in parentheses: Year (4); Month (2); Day (2); Hour (2); Minute (2); Second (2); Hundredths of a second (2); Timezone (1 byte, signed integer; specifies the offset from UTC in 15-minute increments, from -48 for the westernmost zone to +52 for the easternmost).
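The date and time layout adds up to 17 bytes: 16 bytes of digits followed by the one-byte timezone offset. The following Python sketch packs and unpacks such a field. It assumes that the numeric parts are stored as ASCII digits, which is how ISO-9660 records them but is not spelled out above; the function names are hypothetical, and the special "field not set" encoding is not handled.

```python
import struct
from datetime import datetime, timedelta, timezone

QUARTER_HOUR = timedelta(minutes=15)


def encode_volume_datetime(dt: datetime) -> bytes:
    """Pack an aware datetime into the 17-byte Volume Descriptor layout."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    quarters, remainder = divmod(offset, QUARTER_HOUR)
    if remainder or not -48 <= quarters <= 52:
        raise ValueError("offset must be a multiple of 15 minutes from -48 to +52")
    digits = (f"{dt.year:04d}{dt.month:02d}{dt.day:02d}"
              f"{dt.hour:02d}{dt.minute:02d}{dt.second:02d}"
              f"{dt.microsecond // 10000:02d}")          # hundredths of a second
    return digits.encode("ascii") + struct.pack("b", quarters)


def decode_volume_datetime(raw: bytes) -> datetime:
    """Unpack a 17-byte field back into an aware datetime."""
    if len(raw) != 17:
        raise ValueError("expected exactly 17 bytes")
    text = raw[:16].decode("ascii")
    (quarters,) = struct.unpack("b", raw[16:])           # signed 8-bit integer
    return datetime(int(text[0:4]), int(text[4:6]), int(text[6:8]),
                    int(text[8:10]), int(text[10:12]), int(text[12:14]),
                    int(text[14:16]) * 10000,
                    tzinfo=timezone(quarters * QUARTER_HOUR))


stamp = datetime(2003, 6, 15, 12, 30, 45, 250000,
                 tzinfo=timezone(timedelta(hours=-5)))
field = encode_volume_datetime(stamp)
print(len(field), field)
print(decode_volume_datetime(field) == stamp)
```

A zone five hours west of UTC is stored as -20, because the offset is counted in 15-minute steps rather than in hours.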
The very real limitations of ISO-9660 formatted discs gave rise to several alternative formats, all of which were based on ISO-9660:
The Rock Ridge format is an extension of the ISO-9660 format, intended for use on Unix systems, which have much more liberal restrictions on the length of and characters used in filenames and directory names, as well as the depth of directories. Using Rock Ridge allows a CD to support long mixed-case filenames, symbolic links, and other conventions common to Unix systems. Although full Rock Ridge support is available only on Unix systems, a system running MS-DOS, Windows, or the Mac OS can still access the data on a Rock Ridge disc, but not the long filenames and other extended information. The Rock Ridge standard is available at ftp://ftp.ymi.com/pub/rockridge if you want to learn more about it.
The Romeo format is an obsolete extension to ISO-9660, developed by Adaptec as a stopgap measure for early versions of its EasyCD premastering software. The raison d'être for the cutely named Romeo format was that Windows NT 3.51 did not support the proprietary Microsoft Joliet format, described next. Romeo supports filenames of up to 128 characters, including spaces. However, unlike Joliet, Romeo supports neither the Unicode character set nor associated short (MS-DOS 8.3) filenames. Romeo-formatted discs can be read under Windows NT 3.51 and 4.0, Windows 98/SE/Me, and Windows 2000/XP. Because there is no associated short filename, Romeo-formatted discs cannot be read under MS-DOS. Romeo-formatted discs can be read on a Macintosh to the extent that they do not use filenames that exceed 31 characters. The Romeo format was essentially overtaken by events, was seldom used even when current, and is almost never encountered today.
Joliet is an extension of ISO-9660, developed by Microsoft to allow CDs to support long filenames, the Unicode character set, and associated short (MS-DOS 8.3) filenames. Joliet allows filenames up to 64 characters, including spaces. When read on a system running Windows 9X, Windows NT 4, Windows 2000/XP, or recent releases of Linux, a Joliet-formatted disc displays long file and directory names. When read on a system running an operating system that does not support Microsoft long filename standards, the Joliet-formatted disc is recognized as a standard ISO-9660 disc. Full information about the Joliet standard is available at http://www-plateau.cs.berkeley.edu/people/chaffee/jolspec.html.
ISO-9660 and its variants were designed for duplicating or premastering discs, but were never intended to allow incrementally adding small amounts of data to a disc. Although ISO-9660 allows adding data to a disc (until that disc has been closed), the only way to do so is by opening a new session on that disc. That means that writing even one new file incurs the overhead required for a new session, which ranges from 13 MB to 22 MB.
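The cost of that overhead is easy to quantify. The short Python sketch below estimates how many single-file sessions fit on a disc, using the 13 MB and 22 MB overhead figures quoted above and the 99-session limit mentioned earlier. The 650 MB capacity and 1 MB file size are example values chosen for illustration, and the function name is hypothetical.

```python
MAX_SESSIONS = 99   # upper limit on recording sessions noted earlier


def sessions_that_fit(disc_mb: float, payload_mb: float, overhead_mb: float) -> int:
    """How many sessions of payload_mb each fit when every session costs overhead_mb extra."""
    by_space = int(disc_mb // (payload_mb + overhead_mb))
    return min(by_space, MAX_SESSIONS)


for overhead in (13, 22):
    count = sessions_that_fit(disc_mb=650, payload_mb=1, overhead_mb=overhead)
    print(f"{overhead} MB overhead: {count} sessions, {count} MB of user data on a 650 MB disc")
```

With these example figures, only a few dozen megabytes of user data fit on the disc before it is full, which illustrates why session-based recording is a poor fit for frequent small writes.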
In part to address these ISO-9660 limitations, OSTA defined a new logical format for optical discs. The official designation of this format is ISO 13346, but the common name is Universal Disk Format (UDF). UDF is an operating system-independent logical formatting standard that defines how data is written to various types of optical discs, including CD-R, CD-(M)RW, DVD-ROM, DVD-Video, and DVD-Audio. UDF uses a redesigned directory structure that allows small amounts of data (called packets) to be written incrementally and individually to disc without incurring the large overhead associated with writing a new session under ISO-9660.
In effect, with UDF each packet is written as a subsession within a standard session, incurring the standard session overhead only when that standard session is closed. Packet-writing software typically closes the session automatically when the disc is ejected using the eject feature of the software. As with ISO-9660, an open session on a UDF-formatted disc can be read only by a CD recorder. Closing the session allows the disc to be read by a standard CD-ROM drive or CD player. After a session has been closed, you can still open a new session and add more packet data to the disc.
In addition to session overhead, UDF addresses another issue that makes ISO-9660 completely inappropriate for packet writing. ISO-9660 must know, in advance, exactly which files are to be written during a session. It uses this information to create and write the Path Tables and Primary Volume Descriptors that point to the physical locations of the files on disc. Because packet writing allows any arbitrarily selected file to be written to disc at any time, the information that ISO-9660 requires is not available before the write occurs.
UDF solves this problem by accumulating data about the physical locations of files as they are written. At the end of a packet-writing session, UDF consolidates these location pointers and writes them to disc as the Virtual Allocation Table (VAT). The VAT address of a file remains the same, even if it is overwritten. At the end of each packet-writing session, UDF creates a new VAT that includes not just the pointers for newly created or modified files, but also the pointers stored in the old VAT. That means the current VAT always includes pointers to every file that has been written to the disc since it was originally formatted.
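The idea behind the VAT can be illustrated with a toy model of write-once media. The Python sketch below is a simplified illustration only, not the on-disc UDF structure: blocks can only be appended, each file keeps a fixed virtual address, and each new VAT carries forward every pointer from the previous one. All class, method, and attribute names are hypothetical.

```python
class ToyVatDisc:
    """Toy append-only disc with a virtual allocation table (not real UDF)."""

    def __init__(self):
        self.blocks = []        # write-once physical blocks; index = physical address
        self.virtual_of = {}    # filename -> virtual address (never changes)
        self.vat = {}           # virtual address -> current physical address
        self.written_vats = []  # copy of each VAT written at the end of a session

    def write_file(self, name: str, data: bytes) -> int:
        """Append data and point the file's virtual address at the new block."""
        self.blocks.append(data)
        physical = len(self.blocks) - 1
        virtual = self.virtual_of.setdefault(name, len(self.virtual_of))
        self.vat[virtual] = physical    # overwriting only redirects the pointer
        return virtual

    def end_session(self) -> None:
        """Write a new VAT holding old pointers plus this session's changes."""
        snapshot = dict(self.vat)
        self.blocks.append(snapshot)    # the VAT itself occupies space on disc
        self.written_vats.append(snapshot)

    def read_file(self, name: str) -> bytes:
        return self.blocks[self.vat[self.virtual_of[name]]]


disc = ToyVatDisc()
first = disc.write_file("notes.txt", b"draft")
disc.end_session()
second = disc.write_file("notes.txt", b"final")   # "overwrite" on write-once media
disc.write_file("todo.txt", b"buy discs")
disc.end_session()
print(first == second, disc.read_file("notes.txt"), disc.written_vats[-1])
```

The old copy of the file still occupies space on the disc; only the pointer in the newest VAT has moved, which is how a write-once disc can appear to support overwriting.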
Two versions of UDF are in common use:
UDF 1.02 was adopted in August 1996, and is the finalized version of the October 1995 UDF 1.0 specification. UDF 1.02 specifies standards for DVD and DVD-ROM, but does not support writable optical media. Windows NT 4, Windows 98/SE/Me, and Windows 2000/XP include native UDF 1.02 support that allows them to access DVD video and DVD-ROM discs natively.
UDF 1.5 was adopted in February 1997, and addresses the requirements of sequential recorded media, including CD-R, CD-RW, and DVD-RAM. UDF 1.5 adds the VAT that is analogous to the DOS File Allocation Table, and, optionally, the Sparing Table that allows bad sectors to be marked as unusable and replaced by spare sectors. Windows 2000/XP includes native UDF 1.5 support, but Windows NT and Windows 9X do not. You can download UDF 1.5 reader software for these versions of Windows from http://www.roxio.com.
The UDF 2.0 and 2.01 specifications are available, but not yet commonly used in commercial products. For more information about UDF, see http://www.osta.org.