6.2 New Line

In the old days, developers built applications for terminals and simple daisy-wheel printers. They had agreed on the ASCII standard for 7-bit text encoding, with the eighth bit reserved for system-specific uses (such as character-based graphics). These developers neglected, however, to specify the precise encoding for generating a new line. Some systems used a carriage return (CR) to return the printer head to the start of the line, followed by a line feed (LF) to tell the printer to roll the paper up one line.

However, many developers decided that using two characters for a line break was wasteful and redundant. This led to the use of either a CR or an LF code (but not both) to indicate the end of a line. For these developers, the single character was sufficient to tell the printer or terminal character generator that a new line should be generated. Of course, fragmentation occurred: applications didn't always use the same line break character, and they didn't always correctly interpret documents that used a different character from the one they were programmed to expect.

Since then, we've moved to a world of WYSIWYG and GUI, where users typically associate the return key with a new paragraph break, not a new line. Today, the Windows environment is standardized on the CR/LF sequence (the original double-character line break), the Classic Mac OS is standardized on CR, and the Unix world on LF. As you can see, this is the worst possible scenario: three major platforms with three different line break standards. A Java developer therefore can't assume which of these sequences will produce the proper logical result on a given system. Since Java is intended to be a multiplatform language, this situation can be quite a problem.

Fortunately, Java developers have a standard mechanism that queries the system's properties for the current system's correct value:

System.getProperty("line.separator");
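
As a minimal sketch of how this property might be used, the following program reads the separator and prints it in a visible form. The class name LineSeparatorDemo and the helper describe are hypothetical names chosen for illustration; only the System.getProperty call comes from the text above.

```java
class LineSeparatorDemo {
    // Reads the current platform's line separator from the system properties.
    static String platformSeparator() {
        return System.getProperty("line.separator");
    }

    // Hypothetical helper: spells out control characters so they can be printed,
    // turning a CR followed by an LF into the text "CR LF".
    static String describe(String separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < separator.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            char c = separator.charAt(i);
            if (c == '\r') {
                sb.append("CR");
            } else if (c == '\n') {
                sb.append("LF");
            } else {
                sb.append("0x").append(Integer.toHexString(c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("Platform line separator: " + describe(platformSeparator()));
    }
}
```

The output depends on the system the program runs on, which is exactly the point: the same code reports a different sequence on each platform.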

However, this mechanism doesn't help users who copy text files from one system to another. Many of today's popular text editors take a "best guess" by scanning through the document until they find a CR, LF, or CR/LF sequence, and then assuming that what they find is the proper line break sequence for the whole file. This can lead to problems, however, if the user opens a file that uses one line break convention and then pastes in data from an application that uses a different one.
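
The "best guess" scan described above could look something like the following sketch. It is not taken from any particular editor; the class LineBreakDetector, its constants, and the fallback parameter are assumptions made for illustration.

```java
class LineBreakDetector {
    static final String CRLF = "\r\n";
    static final String CR = "\r";
    static final String LF = "\n";

    // Returns the first line break sequence found in the text.
    // If the text contains no line break at all, returns the supplied fallback
    // (for example, the platform's own separator).
    static String detect(String text, String fallback) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                return LF;
            }
            if (c == '\r') {
                // A CR immediately followed by an LF counts as one CR/LF break.
                boolean followedByLf = i + 1 < text.length() && text.charAt(i + 1) == '\n';
                return followedByLf ? CRLF : CR;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        String platform = System.getProperty("line.separator");
        String detected = detect("first line\r\nsecond line", platform);
        System.out.println(detected.equals(CRLF)); // prints true
    }
}
```

Because the scan stops at the first match, it shares the weakness noted above: a file with mixed line breaks is reported as using whichever sequence happens to appear first.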

For general text processing, the best solution is to keep track of the original line break convention of the text document, normalize the line breaks in memory to the platform standard, and then convert the output back to the original convention when the document is saved. You may wish to expose line break preferences to the user as well. This means that you have to work harder at opening and saving documents. Opening now involves an initial scan to determine the line break convention, a possible conversion, and then any normal opening steps; saving involves the same process in reverse. However, your users will never notice your work (which may seem frustrating) and never have problems with your applications (which is definitely good).
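
The open-and-save round trip described above can be sketched as follows. This is one possible approach under stated assumptions, not a definitive implementation: the class LineBreakRoundTrip and its methods are hypothetical, and the original separator is assumed to have been recorded by an earlier scan when the file was opened.

```java
import java.util.regex.Matcher;

class LineBreakRoundTrip {
    // Rewrites every CR/LF, lone CR, or lone LF in the text as the target sequence.
    // CR/LF is listed first in the pattern so that it is treated as a single break.
    static String convert(String text, String target) {
        return text.replaceAll("\r\n|\r|\n", Matcher.quoteReplacement(target));
    }

    // On open: normalize the file's line breaks to the platform standard.
    static String toMemory(String fileText) {
        return convert(fileText, System.getProperty("line.separator"));
    }

    // On save: convert back to the convention the file originally used.
    static String toFile(String memoryText, String originalSeparator) {
        return convert(memoryText, originalSeparator);
    }

    public static void main(String[] args) {
        String original = "\r"; // assumed: detected when the file was opened
        String inMemory = toMemory("one\rtwo\rthree");

        // Simulate the user pasting text that carries a different line break.
        String edited = inMemory + "\r\n" + "four";

        String saved = toFile(edited, original);
        System.out.println(saved.equals("one\rtwo\rthree\rfour")); // prints true
    }
}
```

Converting on save, rather than trusting the in-memory text, is what keeps pasted content with a foreign line break from leaking into the saved file.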

You will also encounter this issue in the source files of the code you write. A variety of tools is available for dealing with this, including several programming text editors for Mac OS X and other platforms that can deal with these issues seamlessly. If you're aware of the problem, though, it's much easier to avoid.