Using Unsafe Code

The C# language doesn’t generally allow you to perform certain operations considered to be unsafe. For example, accessing memory locations directly and performing pointer arithmetic are considered to be unsafe operations. Rather than completely barring these operations, C# permits them, but it requires that such code be explicitly marked as unsafe.

Code in an unsafe context can’t be verified by the common language runtime. Therefore, any executable code that includes an unsafe context requires a higher level of trust, and this requirement might prevent your code from being executed in some scenarios. This section discusses when unsafe code is needed and how to declare and use unsafe contexts.

Examining the Need for Unsafe Code

Do you need unsafe code? Unsafe code is rarely necessary, but here are a few cases when unsafe code is useful:

  • You’re using platform invoke (P/Invoke) to interoperate with existing COM or Microsoft Win32 code, and you have data structures with embedded pointers. In this case, using unsafe code might be the simplest and safest way to manage the legacy data structures.

  • Similarly, if you’re dealing with legacy data structures that are written directly to a disk or a socket stream, the simplest course of action might be to use unsafe code.

  • Occasionally, you can write more efficient code using direct memory access instead of using purely managed code.

When you’re working exclusively with the classes in the .NET Framework, there’s little need for unsafe code. The capability is included in the language for those rare situations in which you must manipulate memory directly, such as when you’re working with legacy code and data structures. For example, when you’re working with files that use an existing file format, you might need to read the contents of the file into a buffer and access the memory directly.
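
As a minimal sketch of the file-format case, the following code overlays a structure on a byte buffer and reads it through a pointer. The LegacyHeader layout, the LegacyReader class, and the sample bytes are hypothetical and were invented for this illustration; the code assumes a little-endian machine and must be compiled with unsafe code enabled.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical on-disk header layout, invented for this illustration.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct LegacyHeader
{
    public int Signature;
    public short Version;
    public short RecordCount;
}

public class LegacyReader
{
    // Interprets the first bytes of a buffer as a LegacyHeader.
    public static unsafe LegacyHeader ReadHeader(byte[] buffer)
    {
        if (buffer == null || buffer.Length < sizeof(LegacyHeader))
            throw new ArgumentException("Buffer is too small for a header.");

        // Pin the buffer so the garbage collector cannot move it
        // while a raw pointer refers to it.
        fixed (byte* pBuffer = buffer)
        {
            // Reinterpret the bytes as a header and copy the value out.
            return *(LegacyHeader*)pBuffer;
        }
    }

    public static void Main()
    {
        // Eight bytes as they might appear in a file (little-endian).
        byte[] raw = { 0x4C, 0x45, 0x47, 0x31, 0x02, 0x00, 0x0A, 0x00 };
        LegacyHeader header = ReadHeader(raw);
        Console.WriteLine("Version {0}, {1} records",
            header.Version, header.RecordCount);
    }
}
```

The header is returned by value, so no pointer outlives the block in which the buffer is pinned.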

Declaring Unsafe Contexts

An unsafe code region is declared with the unsafe keyword. The unsafe region consists of code enclosed in curly braces that follows the unsafe keyword. Unsafe code is permitted within this code region, as shown here:

unsafe
{
    // Pointer operations are permitted here.
}

The unsafe keyword can be used only when you’re compiling with the /unsafe compiler switch. If you don’t set the /unsafe switch, the compiler will generate an error and refuse to compile your code. This restriction is intended to make your decision to move to unsafe code an explicit one. The /unsafe switch is enabled on the project’s Build property page in the Configuration Properties folder, as shown in Figure 10-4.

Figure 10-4. Enabling the /unsafe compiler switch on the project’s Build property page.

In addition to declaring code regions as unsafe, you can also declare types as unsafe. Declaring a type as unsafe enables you to use pointer declarations as fields or parameters. Any use of unsafe types in externally visible portions of your code makes it noncompliant with the Common Language Specification (CLS).

To declare a type as unsafe, simply add the unsafe keyword to the type declaration, as shown here:

unsafe public struct IntNode
{
    public IntNode* pNext;
    public int  x;
    // Remaining members omitted.
}

Declaring a class or a structure as unsafe enables the use of pointers as fields. In addition, pointers can be declared in method parameters, as follows:

unsafe public class Tree
{
    public void CopyNode(TreeNode* node)
    {
        // Method body omitted.
    }
    // Remaining members omitted.
}

Individual methods can be declared as unsafe, marking the method, but not the enclosing type, as unsafe:

public class Tree
{
    unsafe public void CopyNode(TreeNode* node)
    {
        // Method body omitted.
    }
    // Remaining members omitted.
}

An interface can be marked as unsafe, enabling you to use pointers in the interface declaration as shown here:

unsafe interface INode
{
    void CopyNode(TreeNode* node);
    // Remaining members omitted.
}

Operators Used in Unsafe Code

C# offers a number of operators that can be used only in an unsafe context, as described in Table 10-2. These operators enable you to directly access objects and memory using pointers, to determine the addresses of objects, and to use pointer arithmetic to calculate addresses. Although they can be useful, remember that these operations are potentially dangerous and can be used only in an unsafe context.

Reflecting the heritage of the C# language, the operators listed in Table 10-2 are shared with the C and C++ languages. The C# versions of these operators work almost exactly like their C and C++ counterparts. For example, the following code declares a pointer and assigns an address to it:

int n = 42;
int *pn = &n; 

Although you can take the address of a value type variable, you can never use the & operator to take the address of a reference type object. To dereference a pointer, use the * operator, as shown here:

Console.WriteLine("{0}", *pn);
int k = *pn;
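
The two operators can be combined in a small complete program. The following is a sketch only; the PointerBasics class and the DoubleThroughPointer method are hypothetical names chosen for this illustration.

```csharp
using System;

public class PointerBasics
{
    // Doubles a value by writing through a pointer to a local variable.
    public static unsafe int DoubleThroughPointer(int value)
    {
        int n = value;      // a local value type variable
        int* pn = &n;       // & yields the address of n
        *pn = *pn * 2;      // * reads and writes the int that pn points to
        return n;           // n reflects the write made through pn
    }

    public static void Main()
    {
        Console.WriteLine("{0}", DoubleThroughPointer(21));   // prints 42
    }
}
```

Because n is a local variable, its address can be taken directly, without pinning.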

To access members of a structure via a pointer, use the -> operator, as shown here:

public void RemoveNode()
{
    pNext->pPrev = pPrev;
    // Remaining statements omitted.
}

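A fuller sketch of the -> operator might look like the following, which links three structures through pointers and then unlinks the middle one. The ListNode structure and the ArrowDemo class are hypothetical and aren't taken from any sample project; the nodes are local variables, so their addresses can be taken without pinning.

```csharp
using System;

public unsafe struct ListNode
{
    public ListNode* pPrev;
    public ListNode* pNext;
    public int value;

    // Unlinks this node from its neighbors.
    public void RemoveNode()
    {
        if (pNext != null) pNext->pPrev = pPrev;
        if (pPrev != null) pPrev->pNext = pNext;
        pPrev = null;
        pNext = null;
    }
}

public class ArrowDemo
{
    // Builds the chain a <-> b <-> c, removes b, and returns the value
    // of the node that now follows a.
    public static unsafe int ValueAfterMiddleRemoved()
    {
        ListNode a = new ListNode();
        ListNode b = new ListNode();
        ListNode c = new ListNode();
        a.value = 1; b.value = 2; c.value = 3;

        a.pNext = &b; b.pPrev = &a;
        b.pNext = &c; c.pPrev = &b;

        b.RemoveNode();
        return a.pNext->value;   // a now points directly at c
    }

    public static void Main()
    {
        Console.WriteLine("{0}", ValueAfterMiddleRemoved());   // prints 3
    }
}
```
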
Locking Memory Blocks

Memory that’s allocated on the managed heap can be relocated as part of the heap’s garbage collection process. Before you manipulate a block of managed memory through a pointer, you must lock it in place using the fixed statement, as shown here:

MyType mt = new MyType();
fixed(int *pmt = &mt.x)
{
    // Use pmt here; mt can't be relocated inside this block.
}

The fixed statement consists of two parts: a declaration of a pointer variable, and an expression yielding an address that’s assigned to that variable. The address is locked until the statement or statement block that follows the fixed statement has finished executing.

The fixed statement can target any managed memory block, including instance or static class fields, but the variable that’s the target of the fixed statement must be a value type, an array, or a string. The following code shows how a single fixed statement can lock multiple memory blocks, as long as they are of the same type:

fixed(int *pmt = &mt.x, pmy = &mt.y)
{
    // Both pmt and pmy remain valid inside this block.
}

Locking a memory block potentially degrades heap performance because it interferes with the heap compacting mechanism. Any blocks that are locked can’t be moved and must remain locked in place until they’re unlocked. When you lock a memory block, take care to lock it for as short a time as possible.
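
The following sketch shows the fixed statement applied to an array and to a string, keeping each block pinned only for the duration of a short loop. The FixedDemo class and its Sum and Count methods are hypothetical names used for illustration, and both methods assume non-null arguments.

```csharp
using System;

public class FixedDemo
{
    // Sums an int array by walking it with a pointer.
    public static unsafe int Sum(int[] values)
    {
        int total = 0;
        // Pinning the array yields a pointer to its first element.
        fixed (int* pFirst = values)
        {
            // The pointer declared by fixed is read-only, so walk a copy.
            int* p = pFirst;
            for (int i = 0; i < values.Length; i++)
                total += *p++;
        }
        return total;
    }

    // Counts occurrences of a character by pinning a string.
    public static unsafe int Count(string text, char target)
    {
        int count = 0;
        fixed (char* pText = text)
        {
            for (int i = 0; i < text.Length; i++)
                if (pText[i] == target) count++;
        }
        return count;
    }

    public static void Main()
    {
        Console.WriteLine("{0}", Sum(new int[] { 1, 2, 3, 4 }));   // prints 10
        Console.WriteLine("{0}", Count("banana", 'a'));            // prints 3
    }
}
```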

Allocating Temporary Memory

When running code in an unsafe context, you can allocate memory from the call stack. The stackalloc keyword allocates enough storage space from the stack to accommodate a specified number of objects of a specific type, as shown here:

int* pn = stackalloc int[42];

In the preceding code, the stackalloc keyword allocates an array of 42 integers. The memory allocated by stackalloc isn’t initialized, so you must explicitly initialize it to a default value, as follows:

int* pn = stackalloc int[42];
for(int n = 0; n < 42; n++)
{
    pn[n] = 0;
} 

The space allocated from the stack is allocated along with automatic variables for the current method. When the current method returns, the call stack unwinds and the memory allocated by stackalloc is reclaimed.
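
As a sketch of how a stack-allocated buffer might serve as scratch space, the following method stages values in stackalloc memory and sums them before returning. The StackAllocDemo class, the SumOfSquares method, and the 1024-element limit are assumptions made for this illustration.

```csharp
using System;

public class StackAllocDemo
{
    // Returns 0*0 + 1*1 + ... + (count-1)*(count-1), staging the squares
    // in a temporary buffer allocated from the call stack.
    public static unsafe int SumOfSquares(int count)
    {
        // Keep the request small, because stack space is limited.
        if (count < 1 || count > 1024)
            throw new ArgumentOutOfRangeException("count");

        int* squares = stackalloc int[count];
        for (int i = 0; i < count; i++)
            squares[i] = i * i;     // write every slot before reading it

        int total = 0;
        for (int i = 0; i < count; i++)
            total += squares[i];
        return total;               // the buffer is reclaimed on return
    }

    public static void Main()
    {
        Console.WriteLine("{0}", SumOfSquares(5));   // prints 30
    }
}
```

No fixed statement is needed here, because stack memory isn’t managed by the garbage collector and is never relocated.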

Using an Unsafe Code Block

The UnsafeSort project, included on the companion CD, provides an example of how you can use unsafe code blocks. UnsafeSort implements the QuickSort sorting algorithm using direct memory manipulation and pointer arithmetic. QuickSort is a recursive algorithm that progressively divides an array into smaller subarrays, refining the order as the array is subdivided. The basic algorithm works like this:

  1. Take the value of the array’s midpoint element.

  2. Starting at the lower bound of the array, look for the first element that has a value larger than the midpoint element. If the element is discovered prior to reaching the midpoint, this element is out of position.

  3. Starting at the upper end of the array, look for the first element that has a value smaller than the midpoint element. If the element is discovered prior to reaching the midpoint, this element is out of position.

  4. Exchange the elements found in steps 2 and 3.

  5. Repeat steps 2 through 4 until every element in the array is tested.

  6. Restart the algorithm twice, using the array below the midpoint for the first iteration and the array above the midpoint for the second iteration. Continue recursively until the array subsection that’s tested contains fewer than two elements.
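
For comparison, the steps above can be sketched in safe code using ordinary array indexes. This isn’t the code from the UnsafeSort project; the ManagedSorter class is a hypothetical name, and the sketch uses a temporary variable to exchange elements.

```csharp
using System;

public class ManagedSorter
{
    // Sorts a[lower..upper] (inclusive bounds) in place.
    public static void QuickSort(int[] a, int lower, int upper)
    {
        if (lower >= upper)
            return;

        int originalLower = lower;
        int originalUpper = upper;
        int midpointValue = a[lower + (upper - lower) / 2];   // step 1

        while (lower <= upper)
        {
            while (a[lower] < midpointValue) lower++;          // step 2
            while (a[upper] > midpointValue) upper--;          // step 3
            if (lower <= upper)
            {
                int temp = a[lower];                           // step 4
                a[lower] = a[upper];
                a[upper] = temp;
                lower++;
                upper--;
            }
        }

        // Step 6: sort the two remaining subarrays.
        if (originalLower < upper) QuickSort(a, originalLower, upper);
        if (lower < originalUpper) QuickSort(a, lower, originalUpper);
    }

    public static void Main()
    {
        int[] ar = { 3, 7, 2, 1, 6, 8, 5, 4, 9 };
        QuickSort(ar, 0, ar.Length - 1);
        foreach (int n in ar)
            Console.Write("{0} ", n);
        Console.WriteLine();
    }
}
```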

The QuickSort algorithm is implemented in the UnsafeQuickSort method, which is part of the Sorter class, shown in the following code:

public unsafe void UnsafeQuickSort(int* pLower, int* pUpper)
{
    if(pLower < pUpper)
    {
        int* pOriginalLower = pLower;
        int* pOriginalUpper = pUpper;
        long elements = pUpper - pLower;
        int  midpointValue = *(pLower + (elements/2));
        while(pLower <= pUpper)
        {
            // Find the first unsorted element below the midpoint.
            while((pLower < pOriginalUpper)&&(*pLower < midpointValue))
                ++pLower;

            // Find the last unsorted element above the midpoint.
            while((pUpper > pOriginalLower) && (*pUpper > midpointValue))
                --pUpper;

            if(pLower < pUpper)
            {
                // Sneaky trick that quickly swaps two scalar 
                // values in place
                *pLower ^= *pUpper;
                *pUpper ^= *pLower;
                *pLower ^= *pUpper;
            }
            if(pLower <= pUpper)
            {
                ++pLower;
                --pUpper;
            }
        }
        // Recursively call method again to sort the remaining
        // unsorted portion of the array.
        if(pLower < pOriginalUpper)
            UnsafeQuickSort(pLower, pOriginalUpper);
        if(pUpper > pOriginalLower)
            UnsafeQuickSort(pOriginalLower, pUpper);
    }
}

As you can see, this method uses a lot of pointer arithmetic, beginning on the first line, where the addresses passed to the method are compared. The pLower and pUpper addresses mark the range of addresses in the array that are to be sorted. If these addresses are equal or pLower is greater than pUpper, no sorting is performed.

If the array range appears to be valid, the midpoint for the array is calculated and the value of the midpoint element is cached in midpointValue. Next the UnsafeQuickSort method looks for unsorted elements on the wrong side of the midpoint and exchanges them. After UnsafeQuickSort iterates over the entire array, the array is roughly sorted into the following subsections:

  • Elements smaller than the midpoint element are located below the midpoint.

  • Elements larger than the midpoint element are located above the midpoint.

To complete the sorting process, the array is split into two subarrays, which are recursively sorted. Because the pLower and pUpper pointers have either met or crossed, pLower is used as the lower bound of the subarray with higher values and pUpper is used as the upper bound of the subarray with lower values for the next sorting iteration.

The UnsafeQuickSort method is invoked by the Sorter.Sort method shown in the following code. The Sort method exposes a managed signature that’s callable by managed code that isn’t executing in an unsafe context.

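public void Sort(int [] numberArray)
{
    // An array with fewer than two elements is already sorted. Returning
    // early also avoids computing an address before the start of an
    // empty array.
    if(numberArray.Length < 2)
        return;

    unsafe
    {
        fixed(int* p = numberArray)
        {
            UnsafeQuickSort(p, &p[numberArray.Length-1]);
        }
    }
}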

The Sort method establishes an unsafe code block, pins the array to prevent relocation by the garbage collector, and invokes the UnsafeQuickSort method. The following Main method shows how Sort is called:

static void Main(string[] args)
{
    int [] ar = {3,7,2,1,6,8,5,4,9,12,6,3,14,7,2,2,13,10,11,1,1,1,11,2};
    Sorter s = new Sorter();
    s.Sort(ar);

    Console.WriteLine("Array contents:");
    foreach(int n in ar)
    {
        Console.WriteLine(n.ToString());
    }
    Console.WriteLine("Done");
}

This code creates an array of integers and an instance of the Sorter class. After the Sort method is called to sort the array, the array contents are written to the console.


