The Intermediate Language

The Common Language Runtime is an implementation of a virtual execution system, or virtual machine. Like all virtual machines, the CLR has its own abstract microprocessor. As already mentioned, the assembly language of the virtual processor is called Common Intermediate Language (CIL), although before being promoted as a standard it was called Microsoft Intermediate Language (MSIL), a term you'll still see often. Compilers that target the CLR do not generate code in the native instruction set of any specific, real microprocessor. Instead, .NET compilers target the abstract processor of the CLR. The hardware abstraction built into the CLR hints at some cross-platform viability. Remember, Microsoft's CLR is but one implementation of the CLI; any hardware/operating system platform that has a CLI-compliant execution environment built for it could be targeted.

Of course, there is no real microprocessor that can execute CIL directly, so it must be compiled to the instruction set of the native hardware prior to execution. This is the job of the just-in-time (JIT) compiler. Here is where the CLR differs from other virtual machine implementations (such as Java's). The CLR is not an interpreter, nor does it execute bytecode. On the .NET platform, CIL is always compiled to native CPU instructions, and once compiled it is cached in memory, so chances are good that it will not have to be recompiled while the program keeps running.

Note

In some memory-constrained environments (such as a PDA), compiled code can be discarded. In this case, the code would need to be recompiled if it was ever reloaded.

Compiling IL is not a very expensive operation (Microsoft Research has spent years developing technology to make the cost of JIT compilation as negligible as possible), but it does imply a little overhead, and it must be repeated every time you run a program, even the same one. Most applications will see a small increase in startup time (what's particularly slow is loading the .NET Framework itself when the first .NET application runs in a Windows session); however, the increase is limited because not all code in an application is compiled at once. The JIT compiler works in conjunction with the loader, so IL code is not compiled until it is called (on a method-by-method basis).

JIT compilation is the normal case on the .NET platform, but it is possible to compile a managed executable into native instructions and store the native image on disk. By doing so, you avoid the negative impact of JIT compilation on your application's startup time. The .NET Framework runtime contains a utility called the Native Image Generator (Ngen.exe) to accomplish this.

The native image created by Ngen is stored in the native image cache. The next time the CLR tries to load the assembly, it looks in the native image cache first. If a native image of the assembly is found, it is preferred over the IL version. Note that you must also deploy the IL version of the assembly, because the native image does not contain any metadata. In addition, the end user or administrator could remove your native image from the cache. In this case, there would be no native image to find, and the CLR would revert to the usual JIT compilation of the IL version.

The ability to generate a native executable image can be helpful, but you should profile your application under both environments (JIT and native image) to see whether the cost of JIT compilation is really significant. The JIT compiler is, after all, a real compiler for a specific microprocessor, and as such it can do some performance optimizations of its own. It employs good algorithms to reduce its overhead and also introduces optimizations to the compiled code (somewhat like Delphi does in its compilations).

Looking over the IL code generated by your compiler can be highly educational. Microsoft provides a utility called the IL Disassembler (ildasm.exe) that you can use to dissect your assemblies at the lowest level. Located in the bin directory of your .NET Framework SDK installation, ildasm can load any assembly and show its metadata: the manifest, the classes, their methods and properties, and, of course, the IL code generated by the compiler. We will look more at ildasm in this chapter and the next. The Reflector tool mentioned earlier is also useful in this regard.

Managed and Safe Code

Simply put, managed code is any code that is loaded, JIT compiled, and executed under the auspices of the CLR. Like all executable and library modules on the Windows platform, managed code modules are stored in Microsoft's Portable Executable (PE) file format. A managed PE file contains additional header information and, when loaded, jumps into the runtime's execution engine (to a function in MSCorEE.dll).

The runtime initializes, and then it looks for the module's entry point. The IL code in the entry point is JIT compiled to native CPU instructions. Finally, execution begins at the module's entry point. The situation is similar for a library module; the PE file directs the loader to jump to a different function in MSCorEE.dll.

In contrast, unmanaged code consists not of IL, but rather of traditional, native CPU instructions. Unmanaged code executes outside of the runtime and therefore can't take advantage of the services provided by the CLR—at least not without special measures.

Unmanaged code can create .NET Framework classes using COM Interop services. The .NET Framework class is wrapped in a COM proxy and exposed to unmanaged code as if it were a COM object. The COM Interop bridge goes the other way too, allowing COM objects in a COM server to be accessed by managed code. Finally, the Platform Invoke services of the CLR allow managed code to call the Win32 API directly.

Note

The Delphi for .NET Preview compiler produces fully managed code. There is currently no support for mixing managed and unmanaged (native) code within the same module, as you can with Microsoft's Visual C++ .NET (which uses a mechanism called IJW: It Just Works).

A module is completely self-describing, because it contains both IL code and metadata that describes the data elements used by the code. Taking the IL code and the metadata together, the CLR can perform another level of verification beyond the static checking done by the compiler. This process, which is always performed unless a system administrator turns it off, verifies that code is type safe. Verifiably type-safe code is known as safe code. Safe code passes the following type-safety checks:

Only valid operations are invoked on objects. This includes parameter validation, return type verification, and visibility checks.
Objects are always assigned to compatible types.
The code uses no explicit pointers, as they might refer to invalid memory locations.

As you'd expect, unsafe code fails to pass these checks. But just because code is not verifiably type safe does not mean it is unsafe; it simply means the code could not be verified, either due to a limitation in the verification process or perhaps in the compiler itself. When it is released as a finished product, the Delphi for .NET compiler is expected to generate verifiably type-safe code.

Some Delphi language constructs are not CLS compliant, but this is different from not being verifiably type safe. Non-CLS-compliant language constructs will be covered more in Chapter 25.

The .NET Framework SDK contains the PEVerify utility (peverify.exe), which exhaustively analyzes a managed PE file for type safety. The Delphi 7 IDE plug-in mentioned previously lets you automatically run PEVerify on your code after every build.

The Common Type System

The Common Type System (CTS) is the bulldozer that levels the playing field for programming languages in the .NET Framework. The CTS fully specifies the primitive types and object types known to the CLR. These types are used to define an object model that is shared among all languages that target the CLR.

The Component Object Model (COM) has been the usual way to achieve binary compatibility and language interoperability on the Windows platform. The CTS goes beyond that, allowing languages as different as Eiffel, C#, and Delphi to integrate with each other. Components written in these disparate languages can pass objects among themselves and directly extend their capabilities through inheritance. This level of integration of programming languages is unprecedented.

All types defined by the CTS fall into two categories: value types and reference types. Value types, as their name implies, have pass-by-value semantics. For example, say you have a variable that is a value type. If you pass this variable as a parameter to a function and modify the parameter within the function, the original variable will be unaffected. Examples of value types include scalar types, enumerations, and records. Aggregate types such as Delphi records (or C# structures) are known as value classes within the CTS.

On the other hand, reference types have alias, or pass-by-reference, semantics. If you have a variable that is a reference type (for example, an instance of a class) and you pass that variable as a parameter to a function, any changes you make to the parameter will also affect the variable. Examples of reference types include class types and interfaces. Pointer types are also reference types, as are delegates, which will be discussed shortly.
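The difference between the two categories can be seen in a short console sketch written in plain Delphi. The names TPointRec, TPointObj, ChangeRec, and ChangeObj are hypothetical and chosen only for illustration; a record stands in for a value type and a class for a reference type:

```pascal
program ValueVsReference;

{$APPTYPE CONSOLE}

type
  // A record is a value type: passing it by value copies the data.
  TPointRec = record
    X: Integer;
  end;

  // A class is a reference type: the parameter refers to the same object.
  TPointObj = class
  public
    X: Integer;
  end;

procedure ChangeRec(P: TPointRec);
begin
  P.X := 100; // modifies only the local copy
end;

procedure ChangeObj(P: TPointObj);
begin
  P.X := 100; // modifies the object the caller also sees
end;

var
  R: TPointRec;
  O: TPointObj;
begin
  R.X := 1;
  ChangeRec(R);
  WriteLn(R.X); // 1: the original record is unaffected

  O := TPointObj.Create;
  try
    O.X := 1;
    ChangeObj(O);
    WriteLn(O.X); // 100: the change is visible through the original variable
  finally
    O.Free;
  end;
end.
```

Both procedures take their parameter without var or const, so the only thing that differs between the two calls is the category of the type being passed.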

Objects and Properties

Like Delphi, the CTS implements a single-inheritance model. A class must inherit from one and only one ancestor and may declare itself to be an implementer of zero or more interfaces. Other familiar object-oriented attributes of the CTS are private, public, and protected visibility of classes and class members (with other visibility specifiers available, as discussed in the next chapter). These CTS visibility specifiers have meanings similar to those in Delphi; however, they are more restricted, in line with C++ semantics. (In Chapter 25 we'll look at how Delphi's visibility specifiers map to the CTS versions, and examine the specific changes made to the language thus far to accommodate additional features of the CTS.)

As you peruse the .NET literature, you will notice similarities between the capabilities of CTS class types and Delphi classes. The traditional object-oriented features of fields and methods are supported, of course. In addition, the CTS implements properties in a way that is conceptually similar to the familiar Delphi notion. Properties in the CTS can have read and write access methods that restrict or compute values on the fly, or they can simply mask private fields. However, there are also many differences, including the fact that property get/set methods must have the same visibility as the property itself, so that languages that don't support property syntax can still access the property. Although Delphi doesn't enforce this in the source code, it modifies compiled code behind the scenes if necessary.
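As a reminder of the Delphi notion being compared here, the following is a minimal sketch of a class with one property whose write method restricts the value and one read-only property computed on the fly. The class TStock, its members, and the reorder rule are hypothetical, invented for illustration. Note that the access methods are private while the properties are public, which is the arrangement the compiler would have to adjust when targeting the CTS:

```pascal
program PropertySketch;

{$APPTYPE CONSOLE}

type
  TStock = class
  private
    FQuantity: Integer;
    procedure SetQuantity(Value: Integer);
    function GetReorderAmount: Integer;
  public
    // Reads map straight to the private field; writes go through a method.
    property Quantity: Integer read FQuantity write SetQuantity;
    // Read-only property computed on the fly; no field backs it.
    property ReorderAmount: Integer read GetReorderAmount;
  end;

procedure TStock.SetQuantity(Value: Integer);
begin
  if Value < 0 then
    Value := 0; // restrict the value: negative quantities become zero
  FQuantity := Value;
end;

function TStock.GetReorderAmount: Integer;
begin
  // Hypothetical rule: keep at least 10 items in stock.
  if FQuantity < 10 then
    Result := 10 - FQuantity
  else
    Result := 0;
end;

var
  S: TStock;
begin
  S := TStock.Create;
  try
    S.Quantity := -5;
    WriteLn(S.Quantity);      // 0
    S.Quantity := 4;
    WriteLn(S.ReorderAmount); // 6
  finally
    S.Free;
  end;
end.
```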

Events and Delegates

One reason the Win32 API has survived so long is that at its lowest levels it is based on fundamental concepts, such as using the address of a function as a callback mechanism. The entire Windows user interface event system is based on callback functions (and some of the events in the VCL framework are built on top of that system). The callback mechanism is so powerful that it was bound to make its way into the CTS. Using callbacks in a type-safe, language-neutral way relies on a reference type called a delegate.

CTS delegates are different from ordinary function pointers, in that they can reference both static and instance methods of a class. The declaration of a delegate must match the signature of the methods the delegate will reference. In Delphi for .NET, usage of delegates is similar to that of familiar procedural types:

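type
  TMyClass = class
  public
    procedure myMethod;
  end;

// placeholder body, added only so the fragment is complete
procedure TMyClass.myMethod;
begin
  WriteLn('Running in the secondary thread');
end;

var
  threadDelegate: System.Threading.ThreadStart;
  tmc: TMyClass;
  aThread: System.Threading.Thread;

begin
  tmc := TMyClass.Create;
  // assign an instance method to the delegate variable
  threadDelegate := @tmc.myMethod;
  // fully qualified class name, matching the declarations above
  aThread := System.Threading.Thread.Create(threadDelegate);
  aThread.Start;
end.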

The threadDelegate variable is of type System.Threading.ThreadStart, which is a CLR delegate class. Methods you assign to a delegate must have a signature matching the delegate's, which in this case is a procedure taking no parameters. (You can find this code snippet in the Delegate sample folder.)

The compiler is hiding a lot of complexity here. Behind the scenes, the compiler must create an instance of the class System.MulticastDelegate. The delegate—the function being encapsulated (myMethod in this case)—is invoked using methods on the MulticastDelegate class. This class supports the simultaneous encapsulation of multiple functions in a single delegate. In a user interface event model, this translates to having multiple listeners for an event.
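To make the idea of several functions behind a single delegate more concrete, here is a rough emulation in plain Delphi 7 code, using an ordinary method-pointer type and a dynamic array. This is only a sketch of the concept, not how System.MulticastDelegate is implemented; the names TNotifyProc, TListener, and TMulticast are hypothetical:

```pascal
program MulticastSketch;

{$APPTYPE CONSOLE}

type
  // Method-pointer type: a procedure of an object, taking no parameters.
  TNotifyProc = procedure of object;

  // A listener that reports its name when notified.
  TListener = class
  private
    FName: string;
  public
    constructor Create(const AName: string);
    procedure Notify;
  end;

  // Hypothetical helper holding several handlers at once.
  TMulticast = class
  private
    FHandlers: array of TNotifyProc;
  public
    procedure Add(Handler: TNotifyProc);
    procedure Invoke;
  end;

constructor TListener.Create(const AName: string);
begin
  inherited Create;
  FName := AName;
end;

procedure TListener.Notify;
begin
  WriteLn(FName, ' notified');
end;

procedure TMulticast.Add(Handler: TNotifyProc);
begin
  // Grow the list by one slot and store the new handler in it.
  SetLength(FHandlers, Length(FHandlers) + 1);
  FHandlers[High(FHandlers)] := Handler;
end;

procedure TMulticast.Invoke;
var
  I: Integer;
begin
  // Call every registered handler, in the order it was added.
  for I := 0 to High(FHandlers) do
    FHandlers[I]();
end;

var
  M: TMulticast;
  A, B: TListener;
begin
  M := TMulticast.Create;
  A := TListener.Create('first');
  B := TListener.Create('second');
  try
    M.Add(A.Notify);
    M.Add(B.Notify);
    M.Invoke; // prints "first notified", then "second notified"
  finally
    B.Free;
    A.Free;
    M.Free;
  end;
end.
```

A single call to Invoke reaches both listeners, which is the behavior an event with multiple listeners relies on.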

Note

Incidentally, the fact that delegates are classes tells you why you can declare a delegate outside the scope of a class. Because delegates are instances of System.MulticastDelegate (or a compiler-generated descendant class), you can declare them anywhere you can declare a class.

Each specific language compiler implements some form of syntactic sugar to make the creation of events and the addition and removal of event listeners less painful. For example, in C#, Microsoft used the += and -= operators to add and remove functions from the underlying delegate. Delphi for .NET uses set semantics for the same purpose. In Chapter 25 we'll explore this topic, showing how the functions Include and Exclude are used to assign event handlers. We'll also look at how Delphi's := assignment operator works with regard to assigning event handlers in the .NET universe.