When you compile source code into an assembly, the compiler interprets your C# or Visual Basic .NET statements, and creates a series of MSIL statements that will be executed by the .NET Framework. A decompiler is an application that analyses the MSIL statements in order to recreate the original Visual Basic .NET or C# statements written by the programmer.
Unfortunately, the .NET compilers embed human-readable information from our source code in the assembly, including the names of types, methods, and fields. A decompiler can use this information to create source code that is very similar to the original. Some information, such as comments and blocks of code excluded by conditional compiler statements, is not included in assemblies by the compiler and cannot be restored by a decompiler.
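To get a feel for how much naming information survives compilation, the following minimal sketch uses reflection to list the members that a type declares. The `SumNumbers` class here is a stand-in modelled on the class used in this chapter, and `MetadataDump` and `DeclaredMemberNames` are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Stand-in for the SumNumbers class used in this chapter.
public class SumNumbers
{
    private int o_total;

    public SumNumbers() { o_total = 0; }

    public void AddNumber(int p_number)
    {
        // A comment like this one is discarded by the compiler.
        o_total += p_number;
    }

    public int GetTotal() { return o_total; }
}

// Hypothetical helper, not part of the chapter's sample code.
public static class MetadataDump
{
    // Returns "<kind> <name>" for every member the type declares itself,
    // including private members, as recorded in the assembly.
    public static List<string> DeclaredMemberNames(Type type)
    {
        const BindingFlags flags =
            BindingFlags.Public | BindingFlags.NonPublic |
            BindingFlags.Instance | BindingFlags.Static |
            BindingFlags.DeclaredOnly;

        var names = new List<string>();
        foreach (MemberInfo member in type.GetMembers(flags))
        {
            names.Add(member.MemberType + " " + member.Name);
        }
        return names;
    }

    public static void Main()
    {
        foreach (string name in DeclaredMemberNames(typeof(SumNumbers)))
        {
            Console.WriteLine(name);
        }
    }
}
```

The private field `o_total` appears in the listing alongside the public methods, which is why a decompiler can reproduce field names as well as method names.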
The nature of MSIL makes it easier to decompile .NET assemblies than native Windows applications, which are compiled into instructions that are targeted at a specific CPU, such as an Intel Pentium. Lower-level instructions are more difficult to reconstruct into code statements than the relatively abstract MSIL statements.
Decompilers are more widely available, and more widely used, than you might think. There are three main reasons why an assembly is decompiled:
The most benign reason for decompiling an assembly is simply to gain an understanding of how an application or library is written; the person who decompiles the assembly has no malicious intent, and simply seeks to improve her knowledge of .NET programming.
Your assemblies can be decompiled to reveal your business secrets, which can be used commercially by your competitors. In this context, business secrets may be proprietary algorithms, or the inner workings of your complete application.
Decompiling an assembly can provide details of how to subvert an application, exposing licensing details and allowing license codes to be generated that allow illegal copies of your application to be activated.
The scope for intellectual property theft through decompilation has been lessened by the increased use of thinner clients to connect to network services (this includes the move towards XML web services); there is less complexity to the client application, and the sophisticated logic is deployed within a remote network. In contrast, the prevalence of network services increases the scope for application subversion. Any network service that grants trust to clients based on data that is included in an assembly is subject to subversion through decompilation. Analysis of a client application can provide a wealth of information on network protocols and security configuration, which can be used to manipulate the network components of an application against the wishes and expectations of the developer.
Decompiling into Another Language
MSIL is common to all .NET languages, meaning that an assembly written in Visual Basic .NET can be decompiled to produce C# statements, or statements in any other .NET-compliant language. There is no additional risk posed by this feature, but it does lower the barrier to understanding your code; not only can the source code statements be reconstructed, but they can be reconstructed in a language with which a potential attacker is familiar.
In this section, we demonstrate how much detail a decompiler exposes from an assembly. You will use the open source Anakrino/Exemplar decompiler to decompile the single-file assembly you created in the previous section; at the time of writing, the decompiler is available at http://www.saurik.com/net/exemplar/.
The decompiled versions of the SumNumbers and SumArray classes are below; the decompiler we have selected generates only C# source code. We do not explain how to install or use the decompiler in this book; we present the decompiled output so that you can understand what kind of information can be obtained from an assembly:
# C#
using System;

public class SumNumbers {
    private int o_total;

    public SumNumbers( ) {
        o_total = 0;
    }

    public void AddNumber(int p_number) {
        o_total += p_number;
    }

    public int GetTotal( ) {
        return o_total;
    }
}
The efficacy of a decompiler is measured by the accuracy of the source code that it generates; the better a decompiler is, the more closely the decompiled source code resembles the original statements. Our decompilation has produced a rendition of the SumNumbers class that is very close to the original; the names of the fields are preserved, and the structure and function of the class are clear.
# C#
public class SumArray {
    public static int SumArrayOfIntegers(int[] p_arr) {
        SumNumbers sumNumbers = new SumNumbers( );
        int[] nums = p_arr;
        for (int k = 0; k < (int)nums.Length; k++) {
            int j = nums[k];
            sumNumbers.AddNumber(j);
        }
        return sumNumbers.GetTotal( );
    }
}
The decompiled version of the SumArray class is less like the original but still clearly demonstrates the implementation. Our simple assembly is easily decompiled, and the workings of our data types are clearly exposed; logic that is more complex can cause difficulties for decompilers, but in general, an unprotected assembly will yield its secrets easily.
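One plausible explanation for the difference is that the C# compiler lowers a foreach loop over an array into an indexed loop over a temporary copy of the array reference, which is the shape that appears in the decompiled output. The original source is not reproduced in this section, so the foreach version below is an assumption, and `SumArrayVariants`, `WithForeach`, and `WithIndexedLoop` are hypothetical names used only to compare the two shapes.

```csharp
using System;

// Stand-in for the SumNumbers class used in this chapter.
public class SumNumbers
{
    private int o_total;

    public void AddNumber(int p_number) { o_total += p_number; }

    public int GetTotal() { return o_total; }
}

// Hypothetical comparison class, not part of the chapter's sample code.
public static class SumArrayVariants
{
    // Assumed shape of the original source: a foreach loop over the array.
    public static int WithForeach(int[] p_arr)
    {
        SumNumbers sum = new SumNumbers();
        foreach (int value in p_arr)
        {
            sum.AddNumber(value);
        }
        return sum.GetTotal();
    }

    // Shape of the decompiled output: a temporary array reference
    // and an explicit index variable.
    public static int WithIndexedLoop(int[] p_arr)
    {
        SumNumbers sumNumbers = new SumNumbers();
        int[] nums = p_arr;
        for (int k = 0; k < nums.Length; k++)
        {
            int j = nums[k];
            sumNumbers.AddNumber(j);
        }
        return sumNumbers.GetTotal();
    }

    public static void Main()
    {
        int[] data = { 1, 2, 3, 4 };
        Console.WriteLine(WithForeach(data));      // prints 10
        Console.WriteLine(WithIndexedLoop(data));  // prints 10
    }
}
```

Both methods compute the same total, so a reader of the decompiled form loses nothing about what the code does, only the local variable names and the exact loop syntax the programmer chose.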
If your assemblies contain no proprietary data, and no information that can be used to subvert your application, then you are in a position to distribute the assemblies freely; otherwise, you should consider protecting against decompilation with one of the techniques discussed below.
Obfuscation is the technique of altering the MSIL statements so that the application executes in the same way, but the output of a decompiler is unreadable. Obfuscation is such an important technique that Microsoft has included a copy of a limited-functionality obfuscator in Visual Studio .NET 2003. Obfuscators differ in how they obscure decompiler output; the most common approaches are summarized below:
Obfuscators rename the nonpublic methods and fields defined by your data types in a way that makes it difficult to read the decompiled output. A common technique is to use very long strings that differ by a single character or to use non-printing characters accepted by the .NET runtime, but which text editors do not display correctly.
Obfuscators make application logic more difficult to follow by creating complex sequences of method calls that do not do anything. This is more effective than it sounds, because it is hard to establish if these methods are related to the logic of the application. This technique is especially effective when combined with method and field renaming, creating especially complex decompiled output.
Encryption is applied to the literal strings defined within your assemblies. The purpose is to slow down searches that might reveal what a section of code does; for example, searching for the word "license" may reveal which parts of your code deal with application license control.
Effective obfuscators combine these approaches and often apply proprietary techniques. There is a kind of "arms race" between the developers of obfuscators and the developers of decompilers, where each new feature added by an obfuscator is eventually compromised by a decompiler.
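Two of the approaches described above, renaming and string encryption, can be sketched by hand. The code below is a hand-written imitation, not the output of any real obfuscator: `LicenseCheck`, `LicenseCheckObfuscated`, and their members are hypothetical, and the single-byte XOR is only a stand-in for the stronger encryption that a commercial tool would be expected to apply.

```csharp
using System;
using System.Text;

// Before: descriptive private names and a searchable string literal.
public class LicenseCheck
{
    private bool HasLicensePrefix(string p_key)
    {
        return p_key.StartsWith("license-", StringComparison.Ordinal);
    }

    public bool IsValid(string p_key)
    {
        return HasLicensePrefix(p_key);
    }
}

// After: the public surface is unchanged, but nonpublic members have
// meaningless names and the literal is stored in encoded form.
public class LicenseCheckObfuscated
{
    // The bytes of "license-", each XORed with 0x5A.
    private static readonly byte[] a =
        { 0x36, 0x33, 0x39, 0x3F, 0x34, 0x29, 0x3F, 0x77 };

    // Decodes the bytes at runtime; the plain text never appears
    // as a literal in the assembly.
    private static string b(byte[] c)
    {
        byte[] d = new byte[c.Length];
        for (int e = 0; e < c.Length; e++)
        {
            d[e] = (byte)(c[e] ^ 0x5A);
        }
        return Encoding.ASCII.GetString(d);
    }

    private bool aa(string c)
    {
        return c.StartsWith(b(a), StringComparison.Ordinal);
    }

    public bool IsValid(string p_key)
    {
        return aa(p_key);
    }
}

public static class ObfuscationDemo
{
    public static void Main()
    {
        Console.WriteLine(new LicenseCheck().IsValid("license-42"));
        Console.WriteLine(new LicenseCheckObfuscated().IsValid("license-42"));
    }
}
```

Both classes behave identically, but a search of the second for the word "license" in its private members or string literals finds nothing, and the names `a`, `b`, and `aa` give no hint of their purpose.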
The biggest problem with obfuscation is that it alters the MSIL within your assembly; when problems arise, you will find that the obfuscation process can seriously hamper the debugging process. As a general guideline, do not obfuscate your assemblies unless you have to, and always select an obfuscator from a reputable company that will be able to support you if you encounter problems.
As you will see in Chapter 4, the .NET Framework runtime compiles your MSIL statements into native commands for the CPU before the code is executed. An alternative to obfuscation is to perform this compilation yourself and to create native instructions that cannot be processed by an MSIL decompiler.
Native compilation is a relatively new technique as applied to .NET assemblies, and the tools available at the time of writing are immature; the principal risk with native compilation is that the output can differ from that produced by the normal .NET compilation process, which can hamper the debugging process.