Inside .NET Data Providers

Inside .NET Data Providers

When compared with OLE DB providers, .NET data providers have a lot to offer. Their simplified data access architecture often results in improved performance without the loss of functionality. Furthermore, .NET managed data providers directly expose provider-specific behavior to consumers by using methods and properties. They also involve a much smaller set of interfaces than OLE DB providers. Last but not least, .NET data providers work within the boundaries of the CLR and require no COM interaction. For example, the data provider for SQL Server 7 and SQL Server 2000 hooks up directly to the wire level, enabling it to gain a substantial performance advantage. This advantage is shown in Figure 10-7.

Figure 10-7
Comparing the SQL Server data provider and the OLE DB data provider.

The data provider for SQL Server does not have to pass through an intermediate module—that is, the OLE DB provider for SQL Server—and uses instead the RPC to issue commands to the database. When compared with its corresponding OLE DB provider, the same advantages apply to any native managed provider.

The Architecture of .NET Data Providers

The functionalities supplied by a .NET data provider fall into a couple of categories:

  • Support for the DataSet class through an implementation of the methods of the IDataAdapter interface

  • Support for connected data access, which includes classes representing connections, commands, and parameters

The simplest data provider interacts with callers only by using the DataSet object for reading and writing. The other category of support allows you to control connections and transactions, and to execute direct commands, regardless of whether these are based on the SQL language. Figure 10-8 shows the class hierarchy for the two standard managed data providers in .NET: the OLE DB provider and the SQL Server provider.

Figure 10-8
.NET data providers connect, execute commands, and retrieve data mostly by accessing a specific data source.

The objects that wrap connections, commands, and readers are provider-specific and can result in different sets of properties and methods for each data provider. Any internal implementation is database-aware. The only class not included in this schema is DataSet because it is common to all providers and works as a generic container for disconnected data.

Table 10-1 describes the interfaces of a .NET data provider.

Of all the interfaces listed in Table 10-1, only IDataAdapter must be present in every .NET data provider. If you don’t plan to implement an interface listed in the table or a method of an interface, expose the interface anyway but throw a NotSupportedException exception for each of its methods. Wherever possible, avoid a no-op implementation of methods and interfaces because doing so might result in data corruption, particularly when committing and rolling back transactions. In the rest of this chapter, I’ll review the main architectural features of a managed data provider and then offer you sample code for a simple provider that implements the same functionality as the DirectoryListing class. In doing so, I’ll also implement some update functions.

Make sure you assign your own data provider a unique namespace. In addition, prefix classes with a nickname that identifies the provider throughout any internal and client code. For example, use class names such as SqlConnection, SqlCommand, and SqlDataReader. In this example, the nickname is Sql. Try to use distinct files to compile individual functionalities.

Implementing a Connection

The provider connection class inherits from IDbConnection and must expose at least the ConnectionString, State, Database, and ConnectionTimeout properties. The following methods are mandatory: Open, Close, BeginTransaction, ChangeDatabase, and CreateCommand. You are not required to implement transactions. Likewise your data provider is not requested to support nested transactions, although IDbTransaction allows for that. The following code snippet gives you an idea of how to implement a connection:

namespace SampleDataProvider 
    public class MyConnection : IDbConnection
        private ConnectionState m_state;
        private String m_sConnString;

        public MyConnection () 
            m_state = ConnectionState.Closed;
            m_sConnString = "";

        public MyConnection (String connString) 
            m_state = ConnectionState.Closed;
            m_sConnString = connString;

        public IDbTransaction BeginTransaction() 
            throw new NotSupportedException();

        public IDbTransaction BeginTransaction(IsolationLevel level) 
            throw new NotSupportedException();

You should provide at least two constructors, one being the default, which takes no argument. The other recommended constructor accepts only the connection string. When returning the connection string by using the ConnectionString property, always return exactly what the user set unless security-sensitive information dictates otherwise. For example, you might want to remove a password.

The items you recognize and support in the connection string are up to you, but standard names should be used whenever it makes sense. The Open method is responsible for opening the physical channel of communication with the data source. This channel should not be opened before the Open method is called. Consider using some sort of connection pooling when opening a connection turns out to be an expensive operation. When the provider is expected to provide automatic enlistment in distributed transactions, the enlistment should occur during Open.

An important difference between ADO.NET connections and other connections such as ADO is that with ADO.NET, you are requested to guarantee that a connection is created and opened before any command is executed. Clients have to explicitly open and close connections, and no method will open and close connections implicitly for the client. This approach leads to a centralization of security checks. Checks are performed only when the connection is obtained, but the benefits apply to all other classes in the provider that work with connection objects.

You close the connection by using the Close method. In general, Close should simply detach the connection and return any objects to the pool. You can also implement a Dispose method to customize the destruction of the object.

The state of a connection is identified via the ConnectionState enumeration. While the client performs operations over the connection, you ensure that the internal state of the connection matches the contents of the State property. So, for instance, when you are retrieving data, set the connection’s State property to ConnectionState.Fetching.

Implementing a Command

The command object formulates a request for some actions and passes this request to the data source. If results are returned, the command object is responsible for packaging and returning results as a tailored data reader object, a scalar value, or via output parameters. According to the special features of your data provider, you can arrange for results to appear in other formats. For example, the data provider for SQL Server lets you obtain results in XML format when the command text includes the FOR XML clause.

The command class must support at least the CommandText property and the text command type. The provider parses and executes the command. This is the key aspect that enables a provider to accept any text or information as a command. Supporting command behaviors by using the CommandBehavior enumeration is not mandatory and, if needed, you can support more custom behaviors.

Within a command, the connection can be associated with a transaction. If you reset the connection (users should be able to change the connection at any time), first null out the corresponding transaction object. If you support transactions, when setting the Transaction property of the command object, consider additional steps to ensure that the transaction you’re using is already associated with the connection used by the command.

A command object works in conjunction with two classes representing parameters: xxxParameterCollection, which is accessed through the Parameters property; and xxxParameter, which represents a single command parameter stored in the collection. Of course, the xxx stands for the provider-specific nickname. For SQL Server, the two classes are SqlParameterCollection and SqlParameter.

You create a provider-specific command parameter by using the new operator on the parameter class or by using the CreateParameter method of the command object. Newly created parameters are populated and added to the command’s collection by using the methods of the Parameters collection. The module that allows command execution is then responsible for collecting data that is set via the parameters. You have the choice of using either named parameters (as the SQL Server provider does) or the ? placeholder (similar to the OLE DB provider).

You must have a valid and open connection to execute a command. You execute a command using any of the standard command types: ExecuteNonQuery, ExecuteReader, or ExecuteScalar. Consider providing an implementation for the Cancel and Prepare methods.

Implementing a Data Reader

As you learned in previous chapters, a data reader is a kind of connected buffer that the provider creates to let clients read data in a forward-only manner. The actual implementation of the reader is up to you, but let me offer a few important guidelines.

When the DataReader object is returned to the user, it should always be open and positioned prior to the first record. Users should not be able to directly create a DataReader object. Only the command object must create and return a reader. For this reason, mark the constructors as internal. Use the keyword internal in C#:

internal MyDataReader(Object resultset)


Use the keyword friend in Visual Basic .NET:

Friend Sub New(ByRef resultset As Object)
End Sub

The DataReader class must have at least two constructors: one that takes the result set of the query and one that takes the connection object used to carry out the command. The connection is necessary only when the command must execute with the CommandBehavior.CloseConnection style. In this case, the connection must be closed automatically when the DataReader object closes. The result set can take any form that serves your needs, for example, an array or a dictionary.

A DataReader object should properly manage the RecordsAffected property. This property is applicable only to batch statements that include inserts, updates, or deletes. It normally does not apply to query commands. When the reader is closed, you might want to disallow certain operations and change the reader’s internal state, cleaning up internal resources such as the array used to store data.

The DataReader object’s Read method always moves forward to a new valid row, if any exist. More important, it only advances the internal data pointer to the row but does not actually read the row. The actual reading takes place with other reader-specific methods such as GetString and GetValues. Finally, NextResult moves to the next result set. NextResult copies a new internal structure into a common repository where methods such as GetValues read from.

Implementing a Data Adapter

A full-fledged .NET data provider supplies a data adapter class that inherits both DbDataAdapter and IDataAdapter. The DbDataAdapter class implements a data adapter designed for use with a relational database, but use IDataAdapter when all you really need is a class that implements the IDataAdapter interface and copies some disconnected data to an in-memory programmable buffer such as the DataSet object. Implementing the Fill method of the IDataAdapter interface is in most cases sufficient to return disconnected data by using a DataSet object. Typical constructors for the DataAdapter object are the following:

XxxDataAdapter(SqlCommand selectCommand) 
XxxDataAdapter(String selectCommandText, String selectConnectionString) 
XxxDataAdapter(String selectCommandText, SqlConnection selectConnection)

Classes that inherit from DbDataAdapter must implement all the members and define additional members for provider-specific functionality. This ends up requiring the implementation of the following methods:

Fill(DataSet ds)
FillSchema(DataSet ds, SchemaType st)
Update(DataSet ds)

The following are required properties: TableMappings, which defaults to an empty collection; MissingSchemaAction, which defaults to Add; and MissingMappingAction, which defaults to Passthrough. You can provide as many implementations of the Fill method as you need.

Table mappings govern the way in which source tables (that is, database tables) are mapped to DataTable objects in the parent DataSet object. Mappings take into account table names as well as column names and properties. Schema mapping refers to the way in which columns and tables are treated when adding new data to existing DataSet objects. The default value for the missing mapping property tells the adapter to create in-memory tables that look like source tables. The default value for the missing schema property handles possible issues raised when the DataTable objects are actually populated. If any of the mapped elements (tables and columns) are missing in the target DataSet object, the value of MissingSchemaAction suggests what to do. Both MissingXXX properties are a kind of exception handler. The value Add forces the adapter to add any missing table or column. No critical information is added unless another AddWithKey value is assigned to the property.

When an application calls the Update method, the class examines the RowState property for each row in the DataSet object and executes the required INSERT, UPDATE, or DELETE statement. When the class does not provide UpdateCommand, InsertCommand, or DeleteCommand properties but does implement IDbDataAdapter, you can try to generate commands on the fly or raise an exception. You can also provide a made-to-measure command builder class to help with the command generation.

Simple vs. Complex Data Providers

Just as in OLE DB, in .NET you have simple and complex providers. The difference between the two is in the number of implemented interfaces. If you fully support all the interfaces listed in Table 10-1, your data provider is complex. A fully loaded provider is needed only in a small number of cases—for example, when you have a proprietary wire format or are a database vendor. In such a case, the performance advantages are substantial.

More frequently, your ultimate goal is making proprietary data publicly available, for both reading and writing functions, by using a common programming interface. In this case, you don’t need command objects and parameters, nor do you need to pool connections or manage transactions. All you do need is the ability to read from and write to your data, which is the .NET counterpart of simple OLE DB providers. You can create a class for this purpose, but your class will not necessarily fulfill the requirements of a common interface. That’s why simple .NET data providers are definitely worth a closer look.