DotNetPELib 3.0 Documentation

DotNetPELib is an unmanaged library written in C++ that allows C++ programs to generate .NET assemblies. It has full support for creating namespaces, classes, fields, methods, and method bodies, along with some advanced features such as explicit classes and properties. Full support for PInvoke is available for calling unmanaged DLL entry points. The library uses the ECMA-335 standard as the reference for its implementation, but also supports later versions of .NET assemblies.

DotNetPELib also natively supports the 'argument array' and 'enumeration' features of C#.

Code generated by DotNetPELib may be accessed from other .NET assemblies, and there is also full support for importing other assemblies in order to access their fields and methods. In the simplest case, one can import an assembly and then search for declarations of interest using various search functions. The library later uses those declarations when generating the references needed by your program. In a more advanced case, one can iterate through all the definitions in an assembly and put them in a separate symbol table. For example, the occil compiler copies all the static functions in referenced assemblies into its own symbol table, then allows one to access them in C code using standard C++ semantics.

DotNetPELib 3.0 also allows signing an assembly, if one has a strong name key file describing the signing keys.

As an output format, DotNetPELib currently supports .NET assemblies in both EXE and DLL format. It will both read and write them. It will also generate a .IL formatted file, which can be further compiled with the standard .NET ILASM program to generate an assembly.

There is also support for a simple object file format, and the library comes with a linker called 'netlink' which will read object files and create an output file, similar to how module-based languages such as C link object files into an executable.

A reference implementation of a C compiler called 'occil' makes use of this library to generate managed assemblies.

This documentation will consider the available APIs in DotNetPELib 3.0.

Overview


The DotNetPELib API is wrapped in the C++ namespace DotNetPELib for isolation from other libraries.

The main header file for DotNetPELib is "DotNetPELib.h".

DotNetPELib will manage memory if that is desirable; most of the classes described in this documentation have their constructors wrapped by an object of class Allocator.   When this object is destroyed, it will call the destructor for each allocated item then release its memory.  In this way the user is freed from keeping track of every created object.
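The ownership scheme described above can be pictured with a small stand-alone sketch. Everything in it (ArenaSketch, Make, Tracked, LiveAfterScope) is a hypothetical name invented for illustration and is not DotNetPELib's actual Allocator interface; it only shows the general idea of a factory object that destroys whatever it created when it is itself destroyed.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an owning factory; NOT the library's Allocator class.
class ArenaSketch {
public:
    // Construct a T, record ownership in the arena, and return a non-owning pointer.
    template <typename T, typename... Args>
    T* Make(Args&&... args) {
        std::shared_ptr<T> owned = std::make_shared<T>(std::forward<Args>(args)...);
        T* raw = owned.get();
        items_.push_back(std::move(owned));  // shared_ptr<void> remembers T's destructor
        return raw;
    }

private:
    std::vector<std::shared_ptr<void>> items_;  // destroyed together with the arena
};

// Illustrative payload that counts how many instances are currently alive.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Usage: objects are created freely and never deleted by hand.
int LiveAfterScope() {
    {
        ArenaSketch arena;
        arena.Make<Tracked>();
        arena.Make<Tracked>();
        assert(Tracked::live == 2);
    }  // arena goes out of scope here and releases both objects
    return Tracked::live;
}
```

The caller only ever holds raw pointers, so the pointers must not outlive the owning object; the same caution would apply to any factory of this style.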

An object of type PELibError may be thrown during validation of MSIL code.

The main API is an object derived from the PELib class. PELib inherits from Allocator, which exposes all the constructors for the other elements of the API. Between that and the various utility functions it exposes, PELib is usually the main entry point for creating things with the API or probing existing values.

AssemblyDef objects are the high-level objects that hold the data for each assembly. An AssemblyDef can be either internally generated or loaded from an external source. With DotNetPELib there will be one 'public' AssemblyDef object that describes the assembly being generated, and one or more external AssemblyDef objects that describe other assemblies. It is possible to load an external assembly into an AssemblyDef object, or one can explicitly define the data for an external assembly in code. For example, 'mscorlib' could be loaded and would be considered an external assembly.

To be compatible with C#, an AssemblyDef object would usually hold one or more Namespace objects; however, applications that don't need to be compatible can put fields and methods directly into the main AssemblyDef. One cannot put Properties in an AssemblyDef, though.

A Namespace object will normally hold one or more Class or Enum objects.   A Class object can hold other Class and Enum objects, and it can also hold various other types of endpoint objects such as a Method, a Field, or a Property.   An Enum object just holds Field objects that describe the enumerated values.
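The containment rules stated so far can be summarized as a small lookup. The sketch below is stand-alone and hypothetical: the Kind enumeration and the function names are invented here and are not part of the library. It encodes only the relationships this overview explicitly mentions, so a false result means "not described here", not necessarily "rejected by the library".

```cpp
#include <cassert>

// Hypothetical tags for the object kinds named in the overview.
enum class Kind { AssemblyDef, Namespace, Class, Enum, Method, Field, Property };

// True when the overview text describes `parent` as holding `child`.
bool DescribedAsHolding(Kind parent, Kind child) {
    switch (parent) {
        case Kind::AssemblyDef:
            // Namespaces for C# compatibility, or fields and methods directly.
            return child == Kind::Namespace || child == Kind::Field ||
                   child == Kind::Method;
        case Kind::Namespace:
            return child == Kind::Class || child == Kind::Enum;
        case Kind::Class:
            return child == Kind::Class || child == Kind::Enum ||
                   child == Kind::Method || child == Kind::Field ||
                   child == Kind::Property;
        case Kind::Enum:
            return child == Kind::Field;  // the enumerated values
        default:
            return false;  // Method, Field and Property are endpoints
    }
}

// Usage: a Property belongs in a Class, not directly in an AssemblyDef.
bool PropertyPlacementExample() {
    assert(DescribedAsHolding(Kind::Class, Kind::Property));
    return !DescribedAsHolding(Kind::AssemblyDef, Kind::Property);
}
```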

The AssemblyDef, Class, Enum, and Namespace classes all inherit from a base class DataContainer, which holds the functionality that is common to all those classes. A related class Qualifier holds qualifier flags for various containers and other objects, such as whether the container defines an object or a value type, whether an object is static, etc.

The Field object holds a Type object describing the field type, and possibly initialization data.   Fields are also used when describing enumerated values.

The Method object holds a MethodSignature which describes the way the method looks to other code, and it also holds a list of Instruction objects which describe the runtime behavior of the method.  

A base class CodeContainer actually holds most of the functionality related to MSIL instructions.  The MSIL instruction capability is somewhat advanced.  It optimizes which instructions get used in various cases where shorter instructions can be chosen, and checks stack balancing as a sanity check on the generated code.   It also minimizes the size of the locals area.   Live variable analysis is also performed, as an aid to the stack checking (dead regions might be unbalanced).
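Two of the ideas above, choosing a shorter instruction form and checking stack balance, can be illustrated in isolation. The sketch below is stand-alone and does not use or reproduce the library's CodeContainer code; StackEffect, ShortestLdcI4, and IsBalanced are hypothetical names. The opcode mnemonics are the standard ECMA-335 short forms for loading a 32-bit constant, and the balance check handles only straight-line code, with no branches and no live variable analysis.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Pick the most compact ECMA-335 mnemonic for loading a 32-bit constant.
std::string ShortestLdcI4(std::int32_t v) {
    if (v == -1) return "ldc.i4.m1";                      // dedicated opcode for -1
    if (v >= 0 && v <= 8) return "ldc.i4." + std::to_string(v);
    if (v >= -128 && v <= 127) return "ldc.i4.s " + std::to_string(v);  // int8 operand
    return "ldc.i4 " + std::to_string(v);                 // full int32 operand
}

// Hypothetical instruction model: how many values an instruction pops and pushes.
struct StackEffect {
    int pops;
    int pushes;
};

// Walk a straight-line sequence; fail on underflow or on leftover values at the end.
bool IsBalanced(const std::vector<StackEffect>& code, int* maxDepth = nullptr) {
    int depth = 0;
    int peak = 0;
    for (const StackEffect& e : code) {
        depth -= e.pops;
        if (depth < 0) return false;  // popped more than was available
        depth += e.pushes;
        if (depth > peak) peak = depth;
    }
    if (maxDepth) *maxDepth = peak;
    return depth == 0;
}

// Usage: ldc.i4.2, ldc.i4.3, add, pop leaves the stack empty with a peak depth of 2.
bool ExampleIsBalanced() {
    int peak = 0;
    std::vector<StackEffect> code = {{0, 1}, {0, 1}, {2, 1}, {1, 0}};
    bool ok = IsBalanced(code, &peak);
    assert(peak == 2);
    return ok;
}
```

A real checker also has to follow branches and merge stack depths at labels, which is where knowing that a region is dead becomes useful.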

The Instruction objects uniquely define MSIL instructions.   An Instruction object can hold an Operand object, which can hold a native object such as a number, string or label, or a reference to a variable, type, or method signature.

A special type of instruction object is used to create boundaries for regions to be considered for SEH. A try block can be defined with a beginning and an ending, and immediately following that would be a catch block. Blocks of type finally, fault, and filter are also supported.

Many Operand objects hold an instance of something derived from the Value class. The base Value object usually gets rendered as a type (e.g. a class instance), but the derivations get rendered in different ways. For example, a Local object describes a local variable, a Param object describes a parameter, a FieldName object references a Field object, and a MethodName object references a MethodSignature object.
 
The MethodSignature object holds a Type object for the return type, a list of Param objects for the main parameter list, and optionally a second list of Param objects.   This second list is only there to support unmanaged functions which utilize C-style variable length argument lists.
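The two-list layout can be sketched with a simplified model. SignatureSketch and Render are hypothetical names, types are represented as plain strings rather than Type objects, and the rendered text is only loosely modelled on the ILASM convention of separating fixed and variable arguments with "..." at a call site. It is not the library's MethodSignature class or its output.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of a signature with an optional second list.
struct SignatureSketch {
    std::string returnType;
    std::string name;
    std::vector<std::string> params;        // main parameter list
    std::vector<std::string> varargParams;  // extra arguments for C-style varargs
};

// Render the signature; the "..." marker appears only when the second list is used.
std::string Render(const SignatureSketch& s) {
    std::string out = s.returnType + " " + s.name + "(";
    for (std::size_t i = 0; i < s.params.size(); ++i) {
        if (i) out += ", ";
        out += s.params[i];
    }
    if (!s.varargParams.empty()) {
        if (!s.params.empty()) out += ", ";
        out += "...";
        for (const std::string& p : s.varargParams) out += ", " + p;
    }
    return out + ")";
}

// Usage: a printf-style unmanaged function called with two extra arguments.
std::string PrintfCallSiteExample() {
    SignatureSketch s{"int32", "printf", {"string"}, {"int32", "float64"}};
    assert(!s.varargParams.empty());
    return Render(s);
}
```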

An instance of an auxiliary class CustomAttributeContainer holds custom attributes read in from an assembly. In the current library implementation one can't add custom attributes to the generated code, with the exception that the library will automatically generate the custom attribute required for the parameter array, i.e. the C# version of variable length argument lists.

A special object BoxedType is used as an aid for boxing; it effectively transforms basic types into their boxed version.
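As a rough picture of what such a transformation involves, the stand-alone sketch below maps a few IL basic type names to the standard .NET value type classes that represent them when boxed. BoxedClassFor is a hypothetical helper and is not the library's BoxedType class; the table is deliberately partial and covers only common primitive types.

```cpp
#include <cassert>
#include <map>
#include <string>

// Map an IL basic type name to the .NET class used for its boxed form.
// Returns an empty string for names this partial table does not cover.
std::string BoxedClassFor(const std::string& ilType) {
    static const std::map<std::string, std::string> table = {
        {"bool", "System.Boolean"},
        {"char", "System.Char"},
        {"int8", "System.SByte"},
        {"unsigned int8", "System.Byte"},
        {"int16", "System.Int16"},
        {"int32", "System.Int32"},
        {"int64", "System.Int64"},
        {"float32", "System.Single"},
        {"float64", "System.Double"},
    };
    auto it = table.find(ilType);
    return it == table.end() ? std::string() : it->second;
}

// Usage: build the operand text of a box instruction for an int32 value.
std::string BoxInt32Example() {
    std::string cls = BoxedClassFor("int32");
    assert(!cls.empty());
    return "box [mscorlib]" + cls;
}
```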

There are also two internal APIs used by the library; one is used for generating the binary version of .NET assemblies, and the other is used to load the binary version of .NET assemblies into internal memory.   These APIs will not normally be directly used when utilizing the library to generate .NET assemblies, and are beyond the scope of this documentation.   These APIs are described in the file PEFILE.h for those who would like to consider the implementation.