2.1 Basic Language Elements

Like any other programming language, the Java programming language is defined by grammar rules that specify how syntactically legal constructs can be formed using the language elements, and by a semantic definition that specifies the meaning of syntactically legal constructs.

Lexical Tokens

The low-level language elements are called lexical tokens (or just tokens for short) and are the building blocks for more complex constructs. Identifiers, numbers, operators, and special characters are all examples of tokens that can be used to build high-level constructs like expressions, statements, methods, and classes.

Identifiers

A name in a program is called an identifier. Identifiers can be used to denote classes, methods, variables, and labels.

In Java an identifier is composed of a sequence of characters, where each character can be either a letter, a digit, a connecting punctuation (such as underscore _), or any currency symbol (such as $, ¢, ¥, or £). However, the first character in an identifier cannot be a digit. Since Java programs are written in the Unicode character set (see p. 23), the definitions of letter and digit are interpreted according to this character set.

Identifiers in Java are case sensitive; for example, price and Price are two different identifiers.

Examples of Legal Identifiers:

number, Number, sum_$, bingo, $$_100, mål, grüß

Examples of Illegal Identifiers:

48chevy, all@hands, grand-sum

The name 48chevy is not a legal identifier, as it starts with a digit. The character @ is not a legal character in an identifier. It is also not a legal operator, so all@hands cannot be interpreted as a legal expression with two operands. The character - is also not a legal character in an identifier. However, it is a legal operator, so grand-sum could be interpreted as a legal expression with two operands.
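The JDK methods Character.isJavaIdentifierStart and Character.isJavaIdentifierPart classify characters according to these rules. The following is a minimal sketch (the class and method names are illustrative, not from any library) that checks a name character by character. It does not reject keywords or reserved literals, it works on char values so supplementary characters are not handled, and it assumes a JDK of version 6 or later (for-each loop, String.isEmpty).

```java
// Illustrative sketch: checks the character rules for identifiers.
class IdentifierCheck {

    // Returns true if name consists only of legal identifier characters.
    // Keywords such as "class" are NOT rejected by this check.
    static boolean hasLegalIdentifierChars(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        // The first character may be a letter, connecting punctuation,
        // or currency symbol, but not a digit.
        if (!Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        // Later characters may additionally be digits.
        for (int i = 1; i < name.length(); i++) {
            if (!Character.isJavaIdentifierPart(name.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] names = {"number", "sum_$", "$$_100", "48chevy", "all@hands", "grand-sum"};
        for (String name : names) {
            System.out.println(name + " -> " + hasLegalIdentifierChars(name));
        }

        // grand-sum is not an identifier, but it is a legal expression:
        // the identifier grand, the operator -, and the identifier sum.
        int grand = 10, sum = 3;
        int difference = grand-sum;
        System.out.println("grand-sum = " + difference);
    }
}
```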

Keywords

Keywords are reserved identifiers that are predefined in the language and cannot be used to denote other entities. All the keywords are in lowercase, and incorrect usage results in compilation errors.

Keywords currently defined in the language are listed in Table 2.1. In addition, three identifiers are reserved as predefined literals in the language: the null reference and the Boolean literals true and false (see Table 2.2). Keywords currently reserved, but not in use, are listed in Table 2.3. All these reserved words cannot be used as identifiers. The index contains references to relevant sections where currently defined keywords are explained.

Table 2.1. Keywords in Java
`abstract`	`default`	`implements`	`protected`	`throw`
`assert`	`do`	`import`	`public`	`throws`
`boolean`	`double`	`instanceof`	`return`	`transient`
`break`	`else`	`int`	`short`	`try`
`byte`	`extends`	`interface`	`static`	`void`
`case`	`final`	`long`	`strictfp`	`volatile`
`catch`	`finally`	`native`	`super`	`while`
`char`	`float`	`new`	`switch`
`class`	`for`	`package`	`synchronized`
`continue`	`if`	`private`	`this`

Table 2.2. Reserved Literals in Java
`null`	`true`	`false`

Table 2.3. Reserved Keywords not Currently in Use
`const`	`goto`
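Because all reserved words are lowercase and identifiers are case sensitive, a name such as Class or Goto is not reserved. The following sketch (class and method names are illustrative) collects the words from Tables 2.1, 2.2, and 2.3 in a set and looks names up case-sensitively. The set mirrors these tables only; later Java releases reserve additional words (for example, enum since Java 5.0), so it is not a complete list for current compilers. It assumes Java 5 or later for generics.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: the reserved words from Tables 2.1 to 2.3.
class ReservedWords {

    private static final Set<String> RESERVED = new HashSet<String>(Arrays.asList(
        // Table 2.1: keywords
        "abstract", "assert", "boolean", "break", "byte", "case", "catch",
        "char", "class", "continue", "default", "do", "double", "else",
        "extends", "final", "finally", "float", "for", "if", "implements",
        "import", "instanceof", "int", "interface", "long", "native", "new",
        "package", "private", "protected", "public", "return", "short",
        "static", "strictfp", "super", "switch", "synchronized", "this",
        "throw", "throws", "transient", "try", "void", "volatile", "while",
        // Table 2.2: reserved literals
        "null", "true", "false",
        // Table 2.3: reserved but not currently in use
        "const", "goto"));

    // Case-sensitive lookup: "class" is reserved, "Class" is not.
    static boolean isReserved(String word) {
        return RESERVED.contains(word);
    }

    // Number of distinct reserved words in the set.
    static int count() {
        return RESERVED.size();
    }

    public static void main(String[] args) {
        System.out.println("class: " + isReserved("class"));
        System.out.println("Class: " + isReserved("Class"));
        System.out.println("goto:  " + isReserved("goto"));
        System.out.println("total: " + count());
    }
}
```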

Literals

A literal denotes a constant value, that is, the value a literal represents remains unchanged in the program. Literals represent numerical (integer or floating-point), character, boolean or string values. In addition, there is the literal null that represents the null reference.

Table 2.4. Examples of Literals
Integer	2000 0 -7
Floating-point	3.14 -3.14 .5 0.5
Character	'a' 'A' '0' ':' '-' ')'
Boolean	true false
String	"abba" "3.14" "for" "a piece of the action"

Integer Literals

The integer data types comprise the following primitive data types: int, long, byte, and short (see Section 2.2).

The default data type of an integer literal is always int, but it can be specified as long by appending the suffix L (or l) to the integer value. For example, 2000L and 0l are long literals; without the suffix, 2000 and 0 would be interpreted as int literals. There is no direct way to specify a short or a byte literal.
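The following sketch (class and method names are illustrative) shows where the L suffix is required, and how byte and short variables are nevertheless given literal values: the compiler narrows an int constant that fits in the target type when it is assigned.

```java
// Illustrative sketch: int, long, byte, and short values from integer literals.
class IntegerLiteralTypes {

    // 2000 is an int literal; 2000L is a long literal with the same value.
    static long twoThousandAsLong() {
        return 2000L;
    }

    // A value outside the int range needs the L suffix:
    // writing 2147483648 without L here would be a compile-time error.
    static long justAboveIntMax() {
        return 2147483648L;
    }

    // There are no byte literals, but an int constant that fits
    // in a byte can be assigned to a byte variable.
    static byte smallByte() {
        byte b = 100;      // int literal 100, narrowed at compile time
        return b;
    }

    // Likewise for short.
    static short smallShort() {
        short s = 2000;    // int literal 2000, narrowed at compile time
        return s;
    }

    public static void main(String[] args) {
        System.out.println(twoThousandAsLong());
        System.out.println(justAboveIntMax());
        System.out.println(smallByte() + " " + smallShort());
    }
}
```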

In addition to the decimal number system, integer literals can also be specified in the octal (base 8) and hexadecimal (base 16) number systems. Octal and hexadecimal numbers are specified with the prefix 0 and 0x (or 0X), respectively. Examples of decimal, octal, and hexadecimal literals are shown in Table 2.5. Note that the leading 0 (zero) digit is not the uppercase letter O. The hexadecimal digits from a to f can also be specified with the corresponding uppercase forms (A to F). Negative integers (e.g., -90) can be specified by prefixing the minus sign (-) to the magnitude of the integer, regardless of the number system (e.g., -0132 or -0X5A). Number systems and number representation are discussed in Appendix G. Java does not support literals in binary notation.

Table 2.5. Examples of Decimal, Octal, and Hexadecimal Literals
Decimal	Octal	Hexadecimal
`8`	`010`	`0x8`
`10L`	`012L`	`0XaL`
`16`	`020`	`0x10`
`27`	`033`	`0x1B`
`90L`	`0132L`	`0x5aL`
`-90`	`-0132`	`-0X5A`
`2147483647` (i.e., 2³¹-1)	`017777777777`	`0x7fffffff`
`-2147483648` (i.e., -2³¹)	`-020000000000`	`-0x80000000`
`1125899906842624L` (i.e., 2⁵⁰)	`040000000000000000L`	`0x4000000000000L`
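The rows of Table 2.5 can be checked by the compiler itself. The following sketch (class and method names are illustrative) compares literals from the table and uses the JDK methods Integer.toOctalString and Integer.toHexString to render a value in the other number systems. It assumes Java 5 or later only for consistency with the other examples.

```java
// Illustrative sketch: the same values written in three number systems.
class NumberSystems {

    // Each group of comparisons is one row of Table 2.5.
    static boolean sameValues() {
        return 27 == 033 && 27 == 0x1B
            && 90L == 0132L && 90L == 0x5aL
            && -90 == -0132 && -90 == -0X5A
            && 2147483647 == 017777777777 && 2147483647 == 0x7fffffff
            && 1125899906842624L == 040000000000000000L
            && 1125899906842624L == 0x4000000000000L;
    }

    public static void main(String[] args) {
        int value = 90;
        // The JDK can render an int in each of the three number systems;
        // the prefixes 0 and 0x are added here by hand.
        System.out.println("decimal:     " + value);
        System.out.println("octal:       0" + Integer.toOctalString(value));
        System.out.println("hexadecimal: 0x" + Integer.toHexString(value));
        System.out.println("Table 2.5 rows agree: " + sameValues());
    }
}
```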

Floating-point Literals

Floating-point data types come in two flavors: float and double.

The default data type of a floating-point literal is double, but it can be explicitly designated by appending the suffix D (or d) to the value. A floating-point literal can also be specified to be a float by appending the suffix F (or f).

Floating-point literals can also be specified in scientific notation, where E (or e) stands for Exponent. For example, the double literal 194.9E-2 in scientific notation is interpreted as 194.9 × 10⁻² (i.e., 1.949).

Examples of `double` Literals

0.0       0.0d       0D
0.49      .49        .49D
49.0      49.        49D
4.9E+1    4.9E+1D    4.9e1d   4900e-2  .49E2

Examples of `float` Literals

0.0F      0f
0.49F     .49F
49.0F     49.F       49F
4.9E+1F   4900e-2f   .49E2F

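Note that the decimal point and the exponent are each optional, but at least one digit must be specified, and the literal must contain a decimal point, an exponent, or a suffix (F, f, D, or d); otherwise, it is an integer literal (e.g., 49 is an int literal, whereas 49D is a double literal).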
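The following sketch (class and method names are illustrative) compares several of the spellings listed above and shows where the F suffix is required. Comparing doubles with == is safe here only because each literal denotes exactly the same decimal value, so each is converted to the same double.

```java
// Illustrative sketch: float and double literals.
class FloatingPointLiterals {

    // 194.9E-2 means 194.9 x 10^-2.
    static double scientific() {
        return 194.9E-2;
    }

    // Several spellings of 49.0 from the examples of double literals.
    static boolean allFortyNine() {
        return 49.0 == 49. && 49. == 49D
            && 49D == 4.9E+1 && 4.9E+1 == 4900e-2 && 4900e-2 == .49E2;
    }

    public static void main(String[] args) {
        double d = 0.49;          // double by default
        float f = 0.49F;          // F suffix required: float f = 0.49; does not compile
        int i = 49;               // no point, exponent, or suffix: an int literal
        double fromSuffix = 49D;  // the D suffix alone makes it a floating-point literal
        System.out.println(scientific());
        System.out.println(d + " " + f + " " + i + " " + fromSuffix);
        System.out.println(allFortyNine());
    }
}
```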

Boolean Literals

The primitive data type boolean represents the truth values true and false, which are denoted by the reserved literals true and false, respectively.

Character Literals

A character literal is enclosed in single quotes ('). All character literals have the primitive data type char.

Characters in Java are represented by the 16-bit Unicode character set, which subsumes the 8-bit ISO-Latin-1 and the 7-bit ASCII characters. In Table 2.6, note that digits (0 to 9), upper-case letters (A to Z), and lower-case letters (a to z) have contiguous Unicode values. Any Unicode character can be specified as a four-digit hexadecimal number (i.e., 16 bits) with the prefix \u.

Table 2.6. Examples of Unicode Values
Character Literal	Character Literal using Unicode value	Character
`' '`	`'\u0020'`	`Space`
`'0'`	`'\u0030'`	`0`
`'1'`	`'\u0031'`	`1`
`'9'`	`'\u0039'`	`9`
`'A'`	`'\u0041'`	`A`
`'B'`	`'\u0042'`	`B`
`'Z'`	`'\u005a'`	`Z`
`'a'`	`'\u0061'`	`a`
`'b'`	`'\u0062'`	`b`
`'z'`	`'\u007a'`	`z`
`'Ñ'`	`'\u00d1'`	`Ñ`
`'å'`	`'\u00e5'`	`å`
`'ß'`	`'\u00df'`	`ß`
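The following sketch (class and method names are illustrative) shows that a Unicode escape and the plain character denote the same char, that the contiguous values of letters and digits allow simple arithmetic, and how a char can be formatted back into its four-digit hexadecimal form with String.format (Java 5 or later).

```java
// Illustrative sketch: chars and their Unicode values.
class UnicodeValues {

    // Formats a char as backslash, 'u', and four lowercase hex digits.
    static String toUnicodeEscape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        // A Unicode escape and the plain character denote the same char.
        System.out.println('\u0041' == 'A');
        // Letters and digits have contiguous values, so arithmetic works.
        System.out.println('z' - 'a');          // 25 positions after 'a'
        System.out.println((char) ('0' + 7));   // the digit 7
        System.out.println(toUnicodeEscape('Z'));
        System.out.println(toUnicodeEscape('\u00d1')); // N with tilde
    }
}
```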

Escape Sequences

Certain escape sequences define special character values, as shown in Table 2.7. These escape sequences can be single-quoted to define character literals. For example, the character literals '\t' and '\u0009' are equivalent. However, the character literals '\u000a' and '\u000d' should not be used to represent newline and carriage return in the source code. The compiler translates Unicode values before it breaks the source code into tokens, so these values are interpreted as line-terminator characters and cause compile-time errors. One should use the escape sequences '\n' and '\r', respectively, for correct interpretation of these characters in the source code.

Table 2.7. Escape Sequences
Escape Sequence	Unicode Value	Character
`\b`	`\u0008`	Backspace (BS)
`\t`	`\u0009`	Horizontal tab (HT or TAB)
`\n`	`\u000a`	Linefeed (LF), a.k.a. Newline (NL)
`\f`	`\u000c`	Form feed (FF)
`\r`	`\u000d`	Carriage return (CR)
`\'`	`\u0027`	Apostrophe-quote
`\"`	`\u0022`	Quotation mark
`\\`	`\u005c`	Backslash
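The following sketch (class and method names are illustrative) collects the escape sequences of Table 2.7 as char values. It deliberately avoids writing the Unicode values for linefeed, carriage return, apostrophe, quotation mark, and backslash as escapes inside the code: because those are translated before tokenizing, they would break the literal or the line.

```java
// Illustrative sketch: the escape sequences from Table 2.7.
class EscapeSequences {

    // Each escape sequence denotes a single char; widened to int here.
    static int[] codes() {
        return new int[] { '\b', '\t', '\n', '\f', '\r', '\'', '\"', '\\' };
    }

    public static void main(String[] args) {
        // '\t' and the Unicode escape for horizontal tab are the same char.
        System.out.println('\t' == '\u0009');
        // Print each value in the four-digit hexadecimal form of Table 2.7.
        for (int code : codes()) {
            System.out.println(String.format("\\u%04x", code));
        }
        // Use '\n' for a newline; the Unicode escape for linefeed would be
        // turned into a real line terminator before the literal is parsed.
        char newline = '\n';
        System.out.println((int) newline);
    }
}
```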

We can also use the escape sequence \ddd to specify a character literal by octal value, where each digit d can be any octal digit (0 to 7), as shown in Table 2.8. The number of digits must be three or fewer, and the octal value cannot exceed \377; that is, only the first 256 characters can be specified with this notation.

Table 2.8. Examples of Escape Sequence `\ddd`
Escape Sequence `\ddd`	Character Literal
`'\141'`	`'a'`
`'\46'`	`'&'`
`'\60'`	`'0'`
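The following sketch (class and method names are illustrative) checks the examples of Table 2.8 and the upper limit \377. Because an octal escape consumes at most three digits (and only two when the first digit is 4 to 7), extra digits in a string literal become ordinary characters.

```java
// Illustrative sketch: octal escape sequences.
class OctalEscapes {

    // The three examples from Table 2.8.
    static char[] tableExamples() {
        return new char[] { '\141', '\46', '\60' };
    }

    // The largest octal escape, \377.
    static int maxOctalEscape() {
        return '\377';
    }

    public static void main(String[] args) {
        char[] examples = tableExamples();
        System.out.println(examples[0] + " " + examples[1] + " " + examples[2]);
        System.out.println(maxOctalEscape());
        // At most three digits are consumed: "\1234" is '\123' followed by '4'.
        System.out.println("\1234".length());
        // A first digit of 4 to 7 allows only two digits: "\477" is '\47' then '7'.
        System.out.println("\477".length());
    }
}
```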

String Literals

A string literal is a sequence of characters that must be enclosed in double quotes and must occur on a single line. All string literals are objects of the class String (see Section 10.5, p. 407).

Escape sequences as well as Unicode values can appear in string literals:

"Here comes a tab.\t And here comes another one\u0009!"                   (1)
"What's on the menu?"                                                     (2)
"\"String literals are double-quoted.\""                                  (3)
"Left!\nRight!"                                                           (4)

In (1), the tab character is specified using the escape sequence and the Unicode value, respectively. In (2), the single apostrophe need not be escaped in strings, but it must be escaped when specified as a character literal ('\''). In (3), the double quotes inside the string must be escaped. In (4), we use the escape sequence \n to insert a newline. Printing these strings would give the following result:

Here comes a tab.    And here comes another one    !
What's on the menu?
"String literals are double-quoted."
Left!
Right!

Likewise, inside string literals one should use the escape sequences \n and \r, rather than the Unicode values \u000a and \u000d, to represent newline and carriage return in the source code.
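The following sketch (class and method names are illustrative) holds the four string literals (1) to (4) from the example, so their contents can be inspected. It also shows that text too long for one line can be built by concatenating literals with +; the compiler joins constant literals into a single string.

```java
// Illustrative sketch: the string literals (1) to (4).
class StringLiterals {

    static String[] examples() {
        return new String[] {
            "Here comes a tab.\t And here comes another one\u0009!",  // (1)
            "What's on the menu?",                                     // (2)
            "\"String literals are double-quoted.\"",                  // (3)
            "Left!\nRight!"                                            // (4)
        };
    }

    public static void main(String[] args) {
        for (String s : examples()) {
            System.out.println(s);
        }
        // A literal must fit on one line, so longer text is concatenated.
        String longer = "a piece " +
                        "of the action";
        System.out.println(longer);
    }
}
```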

White Spaces

A white space is a sequence of spaces, tabs, form feeds, and line terminator characters in a Java source file. Line terminators can be newline, carriage return, or carriage return-newline sequence.

A Java program is a free-format sequence of characters that is tokenized by the compiler, that is, broken into a stream of tokens for further analysis. Separators and operators help to distinguish tokens, but sometimes white space has to be inserted explicitly as separators. For example, the identifier classRoom will be interpreted as a single token, unless white space is inserted to distinguish the keyword class from the identifier Room.

White space aids not only in separating tokens, but also in formatting the program so that it is easy for humans to read. The compiler ignores white space once the tokens are identified.
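The following sketch (class and method names are illustrative) shows both sides of this: classRoom is a single identifier token, a space is required to separate two minus operators, and extra white space between tokens makes no difference to the compiler.

```java
// Illustrative sketch: white space as a token separator.
class WhiteSpaceTokens {

    // classRoom is one identifier token; with a space, "class Room"
    // would instead be the keyword class followed by the identifier Room.
    static int classRoom = 30;

    static int subtractNegated(int a, int b) {
        // The space matters: "a - -b" is a minus the negation of b,
        // whereas "a--b" would be tokenized as a, --, b and not compile.
        return a - -b;
    }

    public static void main(String[] args) {
        System.out.println(classRoom);
        System.out.println(subtractNegated(5, 3));
        // Extra white space between tokens is ignored by the compiler.
        int   spaced   =   5   +   3   ;
        System.out.println(spaced);
    }
}
```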

Comments

A program can be documented by inserting comments at relevant places. These comments are for documentation purposes and are ignored by the compiler.

Java provides three types of comments to document a program:

A single-line comment: // ... to the end of the line
A multiple-line comment: /* ... */
A documentation (Javadoc) comment: /** ... */

Single-line Comment

All characters after the comment-start sequence // through to the end of the line constitute a single-line comment.

// This comment ends at the end of this line.
int age;        // From comment-start sequence to the end of the line is a comment.

Multiple-line Comment

A multiple-line comment, as the name suggests, can span several lines. Such a comment starts with /* and ends with */.

/*  A comment
    on several
    lines.
*/

The comment-start sequences (//, /*, /**) are not treated differently from other characters when they occur within comments, and are thus ignored. This means that trying to nest multiple-line comments will result in a compile-time error:

/*  Formula for alchemy.
    gold = wizard.makeGold(stone);
    /* But it only works on Sundays. */
*/

The second occurrence of the comment-start sequence /* is ignored, so the first occurrence of */ ends the comment. The last occurrence of the sequence */ in the code is now unmatched, resulting in a syntax error.
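The following sketch (class and method names are illustrative; makeGold is hypothetical and appears only inside a comment) shows the safe combinations: // inside a multiple-line comment is ordinary text, a multiple-line comment inside a single-line comment is ordinary text, and // inside a string literal does not start a comment. Prefixing each line with // is therefore a way to disable code that already contains /* ... */ comments.

```java
// Illustrative sketch: comment-start sequences in different contexts.
class CommentRules {

    /* A single-line comment sequence such as // inside a multiple-line
       comment is ordinary text; this comment ends at the first star-slash. */
    static String url() {
        // Inside a string literal, // does not start a comment.
        return "http://example.com";
    }

    public static void main(String[] args) {
        // To disable code that already contains a multiple-line comment,
        // prefix each line with // instead of nesting comments:
        // gold = wizard.makeGold(stone); /* But it only works on Sundays. */
        System.out.println(url());
    }
}
```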

Documentation Comment

A documentation comment is a special-purpose comment that, when placed before class or class member declarations, can be extracted and used by the javadoc tool to generate HTML documentation for the program. Documentation comments are usually placed in front of classes, interfaces, methods, and field definitions. Groups of special tags can be used inside a documentation comment to provide more specific information. Such a comment starts with /** and ends with */:

/**
 *  This class implements a gizmo.
 *  @author K.A.M.
 *  @version 2.0
 */

For details on the javadoc tool, see the documentation for the tools in the Java 2 SDK.
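The following sketch extends the gizmo comment to a complete class with a documented method, using the standard tags @param, @return, and @throws. The class Gizmo and its method capacity are hypothetical. Since the class is not public, javadoc would need its -package option to include it in the generated documentation.

```java
/**
 *  This class implements a gizmo (a hypothetical example).
 *  @author K.A.M.
 *  @version 2.0
 */
class Gizmo {

    /**
     *  Returns the number of widgets this gizmo can hold.
     *  @param size the size of the gizmo; must not be negative
     *  @return the widget capacity, here simply twice the size
     *  @throws IllegalArgumentException if size is negative
     */
    static int capacity(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must not be negative");
        }
        return 2 * size;
    }

    public static void main(String[] args) {
        System.out.println(capacity(5));
    }
}
```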
