2.4 Assemblies and Manifests

As we just saw, types must expose their metadata to allow tools and programs to access them and benefit from their services. Metadata for types alone is not enough. To simplify software plug-and-play and configuration or installation of the component or software, we also need metadata about the component that hosts the types. Now we'll talk about .NET assemblies (deployable units) and manifests (the metadata that describes the assemblies).

2.4.1 Assemblies Versus Components

During the COM era, Microsoft documentation inconsistently used the term component to mean a COM class or a COM module (DLLs or EXEs), forcing readers or developers to consider the context of the term each time they encountered it. In .NET, Microsoft has addressed this confusion by introducing a new concept, assembly, which is a software component that supports plug-and-play, much like a hardware component. Theoretically, a .NET assembly is approximately equivalent to a compiled COM module. In practice, an assembly can contain or refer to a number of types and physical files (including bitmap files, .NET PE files, and so forth) that are needed at runtime for successful execution. In addition to hosting IL code, an assembly is a basic unit of versioning, deployment, security management, side-by-side execution, sharing, and reuse, as we discuss next.

To review: an assembly is a logical DLL or EXE, and a manifest is a detailed description (metadata) of an assembly, including its version, what other assemblies it uses, and so on.

2.4.2 Unique Identities

Type uniqueness is important in RPC, COM, and .NET. Given the vast number of GUIDs in COM (application, library, class, and interface identifiers), development and deployment can be tedious because you must use these magic numbers in your code and elsewhere all the time. In .NET, you refer to a specific type by its readable name and its namespace. Since a readable name and its namespace are not enough to be globally unique, .NET guarantees uniqueness by using unique public/private key pairs. All assemblies that are shared (called shared assemblies) by multiple applications must be built with a public/private key pair. Public/private key pairs are used in public-key cryptography. Since public-key cryptography uses asymmetrical encryption, an assembly creator can sign an assembly with a private key, and anyone can verify that digital signature using the assembly creator's public key. However, because no one else will have the private key, no other individual can create a similarly signed assembly.

To sign an assembly digitally, you must use a public/private key pair to build your assembly. At build time, the compiler generates a hash of the assembly files, signs the hash with the private key, and stores the resulting digital signature in a reserved section of the PE file. The public key is also stored in the assembly.

To verify the assembly's digital signature, the CLR uses the assembly's public key to decrypt the assembly's digital signature, resulting in the original, calculated hash. In addition, the CLR uses the information in the assembly's manifest to dynamically generate a hash. This hash value is then compared with the original hash value. These values must match, or we must assume that someone has tampered with the assembly.
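
To make the sign-and-verify round trip concrete, here is a minimal C# sketch that uses the general-purpose RSA class from System.Security.Cryptography. It is not the CLR's strong-name implementation: the class name SigningSketch, the sample bytes, and the choice of SHA-256 as the hash are assumptions made only for illustration.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper; it mimics the idea of signing, not the strong-name file format.
public static class SigningSketch
{
    // Hashes the content and signs the hash with the private half of the key pair.
    public static byte[] Sign(RSA keyPair, byte[] content)
    {
        return keyPair.SignData(content, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
    }

    // Rehashes the content and checks the signature using only the public key.
    public static bool Verify(RSAParameters publicKey, byte[] content, byte[] signature)
    {
        using (RSA verifier = RSA.Create())
        {
            verifier.ImportParameters(publicKey);
            return verifier.VerifyData(content, signature, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);
        }
    }

    // Signs sample bytes, optionally alters them afterward, and reports whether verification passes.
    public static bool RoundTrip(bool tamper)
    {
        using (RSA keyPair = RSA.Create())
        {
            byte[] content = Encoding.UTF8.GetBytes("pretend these are the assembly's bytes");
            byte[] signature = Sign(keyPair, content);
            if (tamper)
            {
                content[0] ^= 0xFF; // simulate someone modifying the file after it was signed
            }
            return Verify(keyPair.ExportParameters(false), content, signature);
        }
    }
}
```

Calling SigningSketch.RoundTrip(false) should report a valid signature, whereas RoundTrip(true), which alters one byte after signing, should not.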

Now that we know how to sign and verify an assembly in .NET, let's talk about how the CLR ensures that a given application loads the trusted assembly with which it was built. When you or someone else builds an application that uses a shared assembly, the application's assembly manifest will include an 8-byte hash of the shared assembly's public key. When you run your application, the CLR dynamically derives the 8-byte hash from the shared assembly's public key and compares this value with the hash value stored in your application's assembly manifest. If these values match, the CLR assumes that it has loaded the correct assembly for you.[6]

[6] You can use the .NET Strong (a.k.a. Shared) Name (sn.exe) utility to generate a new key pair for a shared assembly. Before you can share your assembly, you must register it in the Global Assembly Cache, or GAC; you can do this by using the .NET Global Assembly Cache Utility (gacutil.exe). The GAC is simply a directory called Assembly located under the Windows (%windir%) directory.
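
The 8-byte value that the CLR compares can be reproduced by hand. The sketch below assumes the commonly documented rule that the token is the last 8 bytes of the SHA-1 hash of the public key, in reverse order; PublicKeyTokenSketch and its method names are hypothetical.

```csharp
using System;
using System.Reflection;
using System.Security.Cryptography;

// Hypothetical helper that derives a public key token from a full public key.
public static class PublicKeyTokenSketch
{
    // Assumed rule: take the SHA-1 hash of the key and return its last 8 bytes, reversed.
    public static byte[] ComputeToken(byte[] publicKey)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(publicKey); // SHA-1 yields 20 bytes
            byte[] token = new byte[8];
            for (int i = 0; i < token.Length; i++)
            {
                token[i] = hash[hash.Length - 1 - i];
            }
            return token;
        }
    }

    // Formats bytes as lowercase hex without separators.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "").ToLowerInvariant();
    }

    // Compares the computed token with the one the runtime reports for an assembly.
    public static bool MatchesReportedToken(Assembly assembly)
    {
        AssemblyName name = assembly.GetName();
        byte[] key = name.GetPublicKey();
        byte[] reported = name.GetPublicKeyToken();
        if (key == null || key.Length == 0 || reported == null)
        {
            return false; // the assembly has no strong name to compare against
        }
        return ToHex(ComputeToken(key)) == ToHex(reported);
    }
}
```

For a strong-named assembly such as the core library, PublicKeyTokenSketch.MatchesReportedToken(typeof(object).Assembly) should return true.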

2.4.3 IL Code

An assembly contains the IL code that the CLR executes at runtime (see Section 2.5 later in this chapter). The IL code typically uses types defined within the same assembly, but it also may use or refer to types in other assemblies. Although nothing special is required to take advantage of the former, the assembly must define references to other assemblies to do the latter, as we will see in a moment. There is one caveat: each assembly can have at most one entry point, such as DllMain( ), WinMain( ), or Main( ). You must follow this rule because when the CLR loads an assembly, it searches for one of these entry points to start assembly execution.

2.4.4 Versioning

There are four types of assemblies in .NET:

Static assemblies

These are the .NET PE files that you create at compile time. You can create static assemblies using your favorite compiler: csc, cl, vjc, or vbc.

Dynamic assemblies

These are PE-formatted, in-memory assemblies that you dynamically create at runtime using the classes in the System.Reflection.Emit namespace.

Private assemblies

These are static assemblies used by a specific application.

Shared assemblies

These are static assemblies that must have a unique shared name and can be used by any application.

An application uses a private assembly by referring to the assembly using a static path or through an XML-based application configuration file. Although the CLR doesn't enforce versioning policies (checking whether the correct version is used) for private assemblies, it ensures that an application uses the correct shared assemblies with which the application was built. Thus, an application uses a specific shared assembly by referring to the specific shared assembly, and the CLR ensures that the correct version is loaded at runtime.
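
Of the four kinds listed above, dynamic assemblies are the least familiar, so a small sketch may help. It uses System.Reflection.Emit to build an in-memory assembly containing one static method and then invokes that method. The names DynamicAssemblySketch, DemoDynamic, Answer, and Get are invented for this example.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical demo: emits a tiny assembly at runtime and calls into it.
public static class DynamicAssemblySketch
{
    public static int BuildAndInvoke()
    {
        // A dynamic assembly still has an identity: a name and a version.
        AssemblyName name = new AssemblyName("DemoDynamic");
        name.Version = new Version(1, 0, 0, 0);

        // Run access keeps the assembly in memory only; nothing is written to disk.
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(name, AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("MainModule");
        TypeBuilder type = module.DefineType("Answer", TypeAttributes.Public);

        // public static int Get() { return 42; }
        MethodBuilder method = type.DefineMethod(
            "Get",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(int),
            Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldc_I4, 42); // push the constant 42
        il.Emit(OpCodes.Ret);        // return it

        // Bake the type, then call the emitted method through reflection.
        Type built = type.CreateType();
        return (int)built.GetMethod("Get").Invoke(null, null);
    }
}
```

DynamicAssemblySketch.BuildAndInvoke() returns 42, the constant that the emitted IL pushes before returning.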

In .NET, an assembly is the smallest unit to which you can associate a version number; it has the following format:

<major version>.<minor version>.<build number>.<revision>
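
The four parts of a version number (major, minor, build, and revision) can be read and compared with the System.Version class. The following is only a sketch; VersionSketch and its method names are hypothetical. Note that System.Version writes the parts with dots, whereas ildasm.exe prints them with colons.

```csharp
using System;

// Hypothetical helper for inspecting four-part version numbers.
public static class VersionSketch
{
    // Splits a dotted version string into its four named parts.
    public static string Describe(string text)
    {
        Version v = new Version(text);
        return "major=" + v.Major + " minor=" + v.Minor +
               " build=" + v.Build + " revision=" + v.Revision;
    }

    // Versions compare part by part, from major down to revision.
    public static bool IsNewer(string candidate, string baseline)
    {
        return new Version(candidate) > new Version(baseline);
    }
}
```

VersionSketch.Describe("1.0.5000.0") yields "major=1 minor=0 build=5000 revision=0", and VersionSketch.IsNewer("1.0.5000.0", "1.0.3300.0") is true because the build numbers differ.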


2.4.5 Deployment

Since a client application's assembly manifest (to be discussed shortly) contains information on external references, including the assembly name and version the application uses, you no longer have to use the registry to store activation and marshaling hints as in COM. Using the version and security information recorded in your application's manifest, the CLR will load the correct shared assembly for you. The CLR does lazy loading of external assemblies and will retrieve them on demand when you use their types. Because of this, you can create downloadable applications that are small, with many small external assemblies. When a particular external assembly is needed, the runtime downloads it automatically without involving registration or computer restarts.

2.4.6 Security

The concept of a user identity is common in all development and operating platforms, but the concept of a code identity, in which even a piece of code has an identity, is new to the commercial software industry. In .NET, an assembly itself has a code identity, which includes information such as the assembly's shared name, version number, culture, public key, and where the code came from (local, intranet, or Internet). This information is also referred to as the assembly's evidence, and it helps to identify and grant permissions to code, particularly mobile code.

To coincide with the concept of a code identity, the CLR supports the concept of code access. Whether code can access resources or use other code is entirely dependent on security policy, which is a set of rules that an administrator configures and the CLR enforces. The CLR inspects the assembly's evidence and uses security policy to grant the target assembly a set of permissions to be examined during its execution. The CLR checks these permissions and determines whether the assembly has access to resources or to other code. When you create an assembly, you can declaratively specify a set of permissions that the client application must have in order to use your assembly. At runtime, if the client application has code access to your assembly, it can make calls to your assembly's objects; otherwise, a security exception will ensue. You can also imperatively demand that all code on the call stack has the appropriate permissions to access a particular resource.

2.4.7 Side-by-Side Execution

We have said that an assembly is a unit of versioning and deployment, and we've talked briefly about DLL Hell, something that .NET intends to minimize. The CLR allows multiple versions of the same shared DLL (shared assembly) to execute at the same time, on the same system, and even in the same process. This concept is known as side-by-side execution. Microsoft .NET accomplishes side-by-side execution by using the versioning and deployment features that are innate to all shared assemblies. This concept allows you to install multiple versions of the same shared assembly on the same machine, without versioning conflicts or DLL Hell. The only caveat is that your assemblies must be public or shared assemblies, meaning that you must register them in the GAC using a tool such as the .NET Global Assembly Cache Utility (gacutil.exe). Once you have registered different versions of the same shared assembly in the GAC, the human-readable name of the assembly no longer matters; what's important is the information provided by .NET's versioning and deployment features.

Recall that when you build an application that uses a particular shared assembly, the shared assembly's version information is attached to your application's manifest. In addition, an 8-byte hash of the shared assembly's public key is also attached to your application's manifest. Using these two pieces of information, the CLR can find the exact shared assembly that your application uses, and it will even verify that your 8-byte hash is indeed equivalent to that of the shared assembly. Given that the CLR can identify and load the exact assembly, the end of DLL Hell is in sight.
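
The two pieces of information just described, version and public key token, travel together with the assembly's name and culture in its full display name. As a sketch, the AssemblyName class can parse such a name without loading anything; AssemblyIdentitySketch and the display name passed to it are illustrative only.

```csharp
using System;
using System.Reflection;

// Hypothetical helper that pulls the identity parts out of a full assembly display name.
public static class AssemblyIdentitySketch
{
    // Parses a display name; this only parses the text and does not load the assembly.
    public static AssemblyName Parse(string displayName)
    {
        return new AssemblyName(displayName);
    }

    // Renders the 8-byte token as uppercase hex separated by spaces.
    public static string TokenHex(AssemblyName name)
    {
        byte[] token = name.GetPublicKeyToken();
        return token == null ? "" : BitConverter.ToString(token).Replace("-", " ");
    }
}
```

For example, parsing "mscorlib, Version=1.0.5000.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" gives a Name of mscorlib, a Version of 1.0.5000.0, and a TokenHex of "B7 7A 5C 56 19 34 E0 89".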

2.4.8 Sharing and Reuse

When you want to share your assembly with the rest of the world, your assembly must have a shared or strong name, and you must register it in the GAC. Likewise, if you want to use or extend a particular class that is hosted by a particular shared assembly, you don't just import that specific class, but you import the whole assembly into your application. Therefore, the whole assembly is a unit of sharing.

Assemblies turn out to be an extremely important feature in .NET because they are an essential part of the runtime. An assembly encapsulates all types that are defined within the assembly. For example, although two different assemblies, Personal and Company, can define and expose the same type, Car, Car by itself has no meaning unless you qualify it as [Personal]Car or [Company]Car. Given this, all types are scoped to their containing assembly, and for this reason, the CLR cannot make use of a specific type unless the CLR knows the type's assembly. In fact, if you don't have an assembly manifest, which describes the assembly, the CLR will not execute your program.
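
Reflection shows this scoping directly: every Type knows its containing assembly, and a type can be looked up by its assembly-qualified name. The sketch below uses hypothetical names (TypeScopeSketch and its methods) and only standard reflection calls.

```csharp
using System;
using System.Reflection;

// Hypothetical helper showing that a type is always scoped to an assembly.
public static class TypeScopeSketch
{
    // The simple name of the assembly that defines the given type.
    public static string OwningAssembly(Type type)
    {
        return type.Assembly.GetName().Name;
    }

    // The type's full name followed by the full name of its assembly.
    public static string QualifiedName(Type type)
    {
        return type.AssemblyQualifiedName;
    }

    // Resolves a type from its assembly-qualified name; returns null if it cannot be found.
    public static Type Resolve(string assemblyQualifiedName)
    {
        return Type.GetType(assemblyQualifiedName);
    }
}
```

TypeScopeSketch.QualifiedName(typeof(string)) begins with "System.String, " and continues with the identity of the core library, whose exact name depends on the runtime in use. Passing that string to Resolve returns typeof(string) again.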

2.4.9 Manifests: Assembly Metadata

An assembly manifest is metadata that describes everything about the assembly, including its identity, a list of files belonging to the assembly, references to external assemblies, exported types, exported resources, and permission requests. In short, it describes all the details that are required for component plug-and-play. Since an assembly contains all these details, there's no need for storing this type of information in the registry, as in the COM world.

In COM, when you use a particular COM class, you give the COM library a class identifier. The COM library looks up in the registry to find the COM component that exposes that class, loads the component, tells the component to give it an instance of that class, and returns a reference to this instance. In .NET, instead of looking into the registry, the CLR peers right into the assembly manifest, determines which external assembly is needed, loads the exact assembly that's required by your application, and creates an instance of the target class.

Let's examine the manifest for the hello.exe application that we built earlier. Recall that we used the ildasm.exe tool to pick up this information.

.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
  .ver 1:0:5000:0
}

.assembly hello
{
  .hash algorithm 0x00008004
  .ver 0:0:0:0
}
.module hello.exe
// MVID: {F828835E-3705-4238-BCD7-637ACDD33B78}

You'll notice that this manifest starts off identifying an external or referenced assembly, with mscorlib as the assembly name, which this particular application references. The keywords .assembly extern tell the CLR that this application doesn't implement mscorlib, but makes use of it instead. This external assembly is one that all .NET applications will use, so you will see this external assembly defined in the manifest of all assemblies. You'll notice that, inside this assembly definition, the compiler has inserted a special value called the publickeytoken, which is basic information about the publisher of mscorlib. The compiler generates the value for .publickeytoken by hashing the public key associated with the mscorlib assembly. Another thing to note in the mscorlib block is the version number of mscorlib.[7]

[7] The fascinating details are explained in Partition II Metadata.doc and Partition III CIL.doc, which come with the .NET SDK. If you really want to understand metadata and IL, read these documents.

Now that we've covered the first .assembly block, let's examine the second, which describes this particular assembly. You can tell that this is a manifest block that describes our application's assembly because there's no extern keyword. The identity of this assembly is made up of a readable assembly name, hello, its version information, 0:0:0:0, and an optional culture, which is missing. Within this block, the first line indicates the hash algorithm that is used to hash selected contents of this assembly, the result of which will be encrypted using the private key. However, since we are not sharing this simple assembly, there's no encryption and there's no .publickey value.

The last thing to discuss is .module, which simply identifies the output filename of this assembly, hello.exe. You'll notice that a module is associated with a GUID, shown in the MVID comment, and that you get a different GUID each time you build the module. Given this, a rudimentary test for exact module equivalence is to compare the GUIDs of two modules.

Because this example is so simple, that's all we get for our manifest. In a more complicated assembly, you get all this, along with much more in-depth detail about the makeup of your assembly.
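
Much of what ildasm.exe displays is also available programmatically. As a rough sketch, the following code uses reflection to print an assembly's own identity, its external references, and its module GUID. ManifestSketch is a hypothetical name, and the output only loosely imitates the ildasm.exe listing.

```csharp
using System;
using System.Reflection;
using System.Text;

// Hypothetical helper that summarizes manifest information through reflection.
public static class ManifestSketch
{
    public static string Describe(Assembly assembly)
    {
        StringBuilder text = new StringBuilder();

        // This assembly's own identity (compare with the second .assembly block).
        AssemblyName self = assembly.GetName();
        text.AppendLine("assembly " + self.Name + " ver " + self.Version);

        // External references (compare with the .assembly extern blocks).
        foreach (AssemblyName reference in assembly.GetReferencedAssemblies())
        {
            text.AppendLine("  extern " + reference.Name + " ver " + reference.Version);
        }

        // The module that holds the manifest, with its per-build GUID (the MVID).
        Module manifestModule = assembly.ManifestModule;
        text.AppendLine("module " + manifestModule.Name + " MVID " + manifestModule.ModuleVersionId);
        return text.ToString();
    }
}
```

A call such as Console.Write(ManifestSketch.Describe(typeof(ManifestSketch).Assembly)) prints the summary for the assembly that contains the sketch itself. The names and versions that appear depend on how and where the code is compiled.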

2.4.10 Creating Assemblies

An assembly can be a single-module assembly or a multi-module assembly. In a single-module assembly, everything in a build is clumped into one EXE or DLL, an example of which is the hello.exe application that we developed earlier. This is easy to create because a compiler takes care of creating the single-module assembly for you.

If you wanted to create a multi-module assembly, one that contains many modules and resource files, you have a few choices. One option is to use the Assembly Linker (al.exe) that is provided by the .NET SDK. This tool takes one or more IL or resource files and spits out a file with an assembly manifest.

2.4.11 Using Assemblies

To use an assembly, first import the namespace of the types that you need from it into your code, the syntax of which depends on the language that you use. For example, this is how we import the System namespace in C#, as we have seen previously in the chapter:

using System;

When you build your assembly, you must tell the compiler that you are referencing an external assembly. Again, how you do this is different depending on the compiler that you use. If you use the C# compiler, here's how it's done:

csc /r:mscorlib.dll hello.cs

Earlier, we showed you how to compile hello.cs without the /r: option, but both techniques are equivalent. The reference to mscorlib.dll is implicitly assumed because it contains the core framework classes.