Section 28.1. Sharing Partitions

As we've explained in the section "Mounting Filesystems" in Chapter 10, partitions on local hard disks are accessed by mounting them onto a directory in the Linux filesystem. To be able to read and write to a specific filesystem, the Linux kernel needs to have support for it.

28.1.1. Filesystems and Mounting

Linux has filesystem drivers that can read and write files on the traditional FAT filesystem and the newer VFAT filesystem, which was introduced with Windows 95 and supports long filenames. It also can read and (with some caveats) write to the NTFS filesystem of Windows NT/2000/XP.

In "Building a New Kernel" in Chapter 18, you learned how to build your own kernel. To access DOS (used by MS-DOS and Windows 3.x) and VFAT (used by Windows 95/98/ME) partitions, you need to enable DOS FAT fs support in the File systems section during kernel configuration. After you say yes to that option, you can choose MSDOS fs support and VFAT (Windows-95) fs support. The first lets you mount FAT partitions with MS-DOS-style 8.3 filenames, and the second lets you mount FAT partitions, including FAT32, with long filenames.

If you want to access files on a Windows NT partition that carries an NTFS filesystem, you need another driver. Activate the option NTFS filesystem support during the kernel configuration. This lets you mount NTFS partitions by specifying the filesystem type ntfs. Note, however, that the current NTFS driver supports read-only access only. A version of this driver that supports writing as well is available, but at the time of this writing it was still under development and not guaranteed to work reliably when writing to the NTFS partition. Read the documentation carefully before installing and using it!
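Before mounting, it can help to check whether the running kernel already has the drivers just described. The following is a minimal sketch, assuming the /proc filesystem is mounted: /proc/filesystems lists the filesystem types currently registered with the kernel. The function name fs_supported is our own invention, and a driver built as a module that has not been loaded yet will not appear in the list.

```shell
#!/bin/sh
# fs_supported TYPE [LISTFILE]: succeed if TYPE appears in the kernel's
# list of registered filesystems. fs_supported is a hypothetical helper;
# LISTFILE defaults to /proc/filesystems.
fs_supported() {
    # The filesystem name is the last field on each line.
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' \
        "${2:-/proc/filesystems}" 2>/dev/null
}

for fstype in msdos vfat ntfs; do
    if fs_supported "$fstype"; then
        echo "$fstype: driver available"
    else
        echo "$fstype: not registered (not built in, or module not loaded)"
    fi
done
```

A type reported as not registered may still be usable if the driver was built as a module, because mount normally triggers loading of the module on demand.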

While Linux is running, you can mount a Windows partition like any other type of partition. For example, if the third partition on your first IDE hard disk contains your Windows 98 installation, you can make the files in it accessible with the following command, which must be executed as root:

# mount -t vfat /dev/hda3 /mnt/windows98

The /dev/hda3 argument specifies the partition that holds the Windows 98 installation, and the /mnt/windows98 argument can be changed to any directory you've created for the purpose of accessing the files. But how do you know that you need (in this case) /dev/hda3? If you're familiar with the Linux naming conventions for disk devices, you'll know that hda3 is the third partition on the hard disk that is the master on the primary IDE port. You'll find life easier if you write down the partitions while you are creating them with fdisk, but if you neglected to do that, you can run fdisk again to view the partition table.

The filesystem drivers support a number of options that can be specified with the -o option of the mount command. The mount(8) manual page documents the options that can be used, with sections that explain options specific to the fat and ntfs filesystem types. The section for fat applies to both the msdos and vfat filesystems, and there are two options listed there that are of special interest.

The check option determines whether the kernel should accept filenames that are not permissible on MS-DOS and what it should do with them. This applies only to creating and renaming files. You can specify three values for check. relaxed lets you do just about everything with the filename; if the name doesn't fit into the 8.3 convention of MS-DOS files, it is truncated accordingly. normal, the default, also truncates filenames as needed, but rejects names that contain special characters such as * and ? that are not allowed in MS-DOS filenames. Finally, strict forbids both long filenames and the special characters. To make Linux more restrictive with respect to filenames on an MS-DOS partition such as /dev/sda5, the mount command could be used as follows:

# mount -o check=strict -t msdos /dev/sda5 /mnt/dos

This option is used with msdos filesystems only; the restrictions on filename length do not apply to vfat filesystems.

The conv option can be useful, but not as often as you might at first think. Windows and Unix systems have different conventions for how a line ending is marked in text files. Windows uses both a carriage return and a linefeed character, whereas Unix uses only a linefeed. Although this does not make the files completely illegible on the other system, it can still be a bother. To tell the kernel to perform the conversion between Windows and Unix text-file styles automatically, pass the mount command the option conv, which has three possible values. binary, the default, does not perform any conversion; text converts every file; and auto tries to guess whether the file in question is a text file or a binary file. auto does this by looking at the filename extension. If the extension is included in the list of "known binary extensions," the file is not converted; otherwise, it is converted.

It is not generally advisable to use text, because this will invariably damage any binary files, including graphics files and files written by word processors, spreadsheets, and other programs. Likewise, auto can be dangerous, because the extension-based detection mechanism is not very sophisticated. So we suggest you don't use the conv option unless you are sure the partition contains only text files. Stick with binary (the default) and convert your files manually on an as-needed basis. See "File Translation Utilities," later in this chapter, for directions on how to do this.

As with other filesystem types, you can mount MS-DOS and NTFS filesystems automatically at system bootup by placing an entry in your /etc/fstab file. For example, the following line in /etc/fstab mounts a Windows 98 partition onto /win:

/dev/hda1    /win   vfat   defaults,umask=002,uid=500,gid=500    0  0

When accessing any of the msdos, vfat, or ntfs filesystems from Linux, the system must somehow assign Unix permissions and ownerships to the files. By default, ownerships and permissions are determined by the user ID, group ID, and umask of the calling process. This works acceptably well when using the mount command from the shell, but when run from the boot scripts, it will assign file ownerships to root, which may not be desired. In the previous example, we use the umask option to specify the file and directory creation mask the system will use when creating files and directories in the filesystem. The uid option specifies the owner (as a numeric user ID, rather than a text name), and the gid option specifies the group (as a numeric group ID). All files in the filesystem will appear on the Linux system as having this owner and group. Since dual-boot systems are generally used as workstations by a single user, you will probably want to set the uid and gid options to the user ID and group ID of that user's account.
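To see what a given umask value means in practice, the arithmetic can be worked out in the shell. This is a sketch under one assumption: that the driver starts from the full mode 0777 and clears the bits set in the mask, which is our understanding of how the FAT drivers treat ordinary files and directories. The function name perms_from_umask is ours.

```shell
#!/bin/sh
# perms_from_umask MASK: print the octal permission bits left after
# clearing the bits of MASK from 0777. Hypothetical helper; the 0777
# starting mode is an assumption about the FAT drivers.
perms_from_umask() {
    # The leading 0 makes the shell read the argument as an octal number.
    printf '%o\n' $(( 0777 & ~0$1 ))
}

perms_from_umask 002   # prints 775, that is, rwxrwxr-x
perms_from_umask 022   # prints 755, that is, rwxr-xr-x
```

With umask=002 as in the fstab entry, the owner and the group can write to the files and everyone else can only read them. The numeric values for the uid and gid options can be looked up by running id -u and id -g while logged in as the user in question.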

28.1.2. File Translation Utilities

One of the most prominent problems when it comes to sharing files between Linux and Windows is that the two systems have different conventions for the line endings in text files. Luckily, there are a few ways to solve this problem:

If you access files on a mounted partition on the same machine, let the kernel convert the files automatically, as described in "Filesystems and Mounting" earlier in this chapter. Use this with care!
When creating or modifying files on Linux, common editors such as Emacs and vi can handle the conversion automatically for you.
There are a number of tools that convert files from one line-ending convention to the other. Some of these tools can also handle other conversion tasks as well.
Use your favorite programming language to write your own conversion utility.

If all you are interested in is converting newline characters, writing programs to perform the conversions is surprisingly simple. To convert from DOS format to Unix format, replace every occurrence of <CR><LF> (\r\n) in the file with a newline (\n). To go the other way, convert every newline to a <CR><LF>. As an example, here are two Perl programs that do the job. The first, which we call d2u, converts from DOS format to Unix format:

#!/usr/bin/perl
# Strip the carriage return that precedes each newline.
while (<STDIN>) { s/\r$//; print }

And the following program (which we call u2d) converts from Unix format to DOS format:

#!/usr/bin/perl
# Insert a carriage return before each newline.
while (<STDIN>) { s/$/\r/; print }

Both commands read the input file from the standard input, and write the output file to standard output. You can easily modify our examples to accept the input and output filenames on the command line. If you are too lazy to write the utilities yourself, you can see if your Linux installation contains the programs dos2unix and unix2dos, which work similarly to our simple d2u and u2d utilities, and also accept filenames on the command line. Another similar pair of utilities is fromdos and todos. If you cannot find any of these, then try the flip command, which is able to translate in both directions.
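If Perl is not at hand, the same two conversions can be sketched in the shell with awk, along with a small wrapper that takes the input and output filenames as arguments. The function names d2u and u2d mirror our Perl programs; convert_file is a hypothetical name chosen for this illustration, and nothing here is meant to replace the packaged utilities.

```shell
#!/bin/sh
# d2u: strip one trailing carriage return from each line (DOS to Unix).
# Reads the named files, or standard input if none are given.
d2u() {
    awk '{ sub(/\r$/, ""); print }' "$@"
}

# u2d: end each line with carriage return plus linefeed (Unix to DOS).
u2d() {
    awk '{ printf "%s\r\n", $0 }' "$@"
}

# convert_file DIRECTION INPUT OUTPUT: hypothetical wrapper that reads
# INPUT and writes the converted text to OUTPUT.
convert_file() {
    case "$1" in
        d2u) d2u "$2" > "$3" ;;
        u2d) u2d "$2" > "$3" ;;
        *)   echo "usage: convert_file d2u|u2d INPUT OUTPUT" >&2
             return 1 ;;
    esac
}
```

For example, convert_file d2u report.txt report.unix writes a Unix-style copy of report.txt. INPUT and OUTPUT must be different files, because the shell truncates the output file before awk starts reading.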

If you find these simple utilities underpowered, you may want to try recode, a program that can convert just about any text-file standard to any other.

The simplest way to use recode is to specify both the old and the new character sets (encodings or text-file conventions) and the file to convert. recode will overwrite the old file with the converted one; it will have the same filename. For example, to convert a text file from Windows to Unix, you would enter:

recode ibmpc:latin1 textfile

textfile is then replaced by the converted version. You can probably guess that to convert the same file back to Windows conventions, you would use:

recode latin1:ibmpc textfile

In addition to ibmpc (as used on Windows) and latin1 (as used on Unix), there are other possibilities available, such as latex for the LaTeX style of encoding diacritics and texte for encoding French email messages. You can get the full list by issuing:

recode -l

If you do not like recode's habit of overwriting your old file with the new one, you can make use of the fact that recode can also read from standard input and write to standard output. To convert dostextfile to unixtextfile without deleting dostextfile, you could use:

recode ibmpc:latin1 < dostextfile > unixtextfile

With the tools just described, you can handle text files quite comfortably, but this is only the beginning. For example, pixel graphics on Windows are usually saved as bmp files. Fortunately, there are a number of tools available that can convert bmp files to graphics file formats, such as png or xpm, that are more common on Unix. Among these is the GIMP, which is probably included with your distribution.

Things are less easy when it comes to other file formats, such as those saved by office productivity programs. Although the various incarnations of the .doc file format used by Microsoft Word have become a de facto lingua franca for word processor files on Windows, it was until recently almost impossible to read those files on Linux. Fortunately, a number of software packages have appeared that can read (and sometimes even write) .doc files. Among them are the office productivity suite KOffice, the freely available OpenOffice.org, and the commercial StarOffice 6.0, a close relative of OpenOffice.org. Be aware, though, that these conversions will never be perfect; it is very likely that you will have to manually edit the files afterward. Even on Windows, conversions can never be 100% correct; if you try importing a Microsoft Word file into WordPerfect (or vice versa), you will see what we mean.

In general, the more common a file format is on Windows, the more likely it is that Linux developers will provide a means to read or even write it. Another approach might be to switch to open file formats, such as Rich Text Format (RTF) or Extensible Markup Language (XML), when creating documents on Windows. In the age of the Internet, where information is supposed to flow freely, file formats that are closed and undocumented are an anachronism.