Thursday, July 5, 2007

Application Development: What's new in ADO.NET version 2.0

With the first public alpha version of the coming release of Visual Studio .NET (christened "Whidbey") now in the hands of developers, it's time to start thinking about your applications and how they might be affected as you move to this new version. Although the move from version 1.0 to 1.1 of the .NET Framework was trivial and involved mostly bug fixes, performance enhancements, and the integration of previously separate technologies such as the ODBC and Oracle .NET Data Providers, version 2.0 changes the story for data access. It includes a host of new features, some of which may cause you to rethink how data is accessed and manipulated in your applications.

In this article, I'll give you a brief rundown of what I see as the most significant new features in ADO.NET and how you might take advantage of them in your implementations.

Providing a wider view of data
Before delving into the specific features of ADO.NET v2.0, let me preface the discussion by noting that one of the overall design goals of this version was to allow a higher degree of interoperability between data accessed relationally, accessed as XML, and accessed as custom objects. Since these three make up the "ruling triumvirate" in representing data in an application, ADO.NET v2.0 was designed to make it easier for developers to use the appropriate model when desired within and across applications.

For example, in applications built using a service-oriented architecture (SOA), persistent business data will often be manipulated relationally; data that represents the process and encapsulates business rules will be manipulated as objects; and message and lookup data that must be serialized for transport will be handled as XML. To present the new features, I've factored them into two broad buckets: the new features that provide this wider view of data and the features that enhance or extend the relational paradigm.

Widening the .NET
There are two primary new features you'll want to explore in the area of extending your ability to handle data. Let's take a look at each.

ObjectSpaces
This technology was previewed several years ago at PDC and will now be released in Whidbey. Simply put, ObjectSpaces provides an object-relational mapping layer in the System.Data.ObjectSpaces namespace, which instantiates and populates custom objects from a relational database. This works through XML metadata stored in a mappings file that is passed to the constructor of the ObjectSpace class, which maps relational objects to .NET objects and relational types to .NET types.

The programming model supports queries (ObjectQuery), in-memory sets of objects (ObjectSet), streams of objects (ObjectReader), and even lazy loading of objects to improve performance (ObjectList and ObjectHolder). Following is an example of how the programming model looks:

// Create the ObjectSpace from the mapping file and connection
// (mappingsFile and con are assumed to be defined elsewhere)
ObjectSpace os = new ObjectSpace(mappingsFile, con);
// Query the data
ObjectQuery oq = new ObjectQuery(typeof(Product), "category='Equipment'");
ObjectReader or = os.GetObjectReader(oq);
// Traverse the resulting Product objects
while (or.Read())
{
   Product p = (Product)or.Current;
   Console.WriteLine(p.Name);
}

Although in the current release, ObjectSpaces works only with SQL Server 2000 and SQL Server "Yukon" (the release of SQL Server more or less synchronized with the release of Whidbey), this technology will be extended to access other relational stores in the future. ObjectSpaces is ideal when you want to represent your data using a domain model and encapsulate business logic as methods on your custom objects, since it will save you from writing the tiresome code needed to load from and persist your objects to a relational data store.

SQLXML and XmlAdapter
Although the ADO.NET DataSet has always included the ability to load data as XML and serialize its contents as XML, the translation between the two ways of representing data always included some tension. For example, in order for the XML to load into a DataSet, its schema couldn't be overly complex, and it needed to map well into the relational DataTables of the DataSet.

Although DataSet support of XML has been enhanced in version 2 to allow loading of XML with multiple in-line schemas, loading schemas with repeated element names in different namespaces, and loading/serializing directly from DataTable objects, the data must still be relational in nature to work with the DataSet. To overcome this, version 2 includes the System.Xml.XmlAdapter class. This class is analogous to the DataAdapter classes in that it is a liaison between a data source and a representation of the data, but is used to query and load XML from an XML View into an XPathDocument object (called XPathDocument2 in the alpha; however, that will be renamed to XPathDocument before release).

XML Views allow relational tables (in SQL Server only) to be mapped to an XML schema via mappings files; they are the core component of the SQLXML 3.0 technology, once provided separately from the .NET Framework but now migrated into ADO.NET v2 (including the ability to bulk-load XML into SQL Server) in the System.Data.SqlXml namespace. Using this approach, you can provide a set of XML Views for your SQL Server data, query the data with the XQuery language using the new XQueryProcessor class and the Fill method of the XmlAdapter, and manipulate the data using the XPathDocument, XPathEditor, XPathNavigator, and XPathChangeNavigator classes.

The changes are written to SQL Server by calling the Update method of the XmlAdapter, which relies on the XML View to write the SQL statements to execute using a mappings file. The advantage of this approach is that you can treat your SQL Server data no differently than other XML data stores and can take advantage of the full fidelity of the XML when making changes. Following is a simple example of the programming model:

// Set up the connection and query (connectionString is assumed to be defined elsewhere)
SqlConnection con = new SqlConnection(connectionString);
XQueryProcessor xq = new XQueryProcessor();
xq.XmlViewSchemaDictionary.Add("name", new XmlTextReader("mappings-file"));
xq.Compile(…);
// Set up the datasource
XmlDataSourceResolver xd = new XmlDataSourceResolver();
xd.Add("MyDB", con);
// Configure the XmlAdapter
XmlAdapter xa = new XmlAdapter(xd);
XPathDocument xp = new XPathDocument();
// Execute the query and populate the document
xa.Fill(xp, xq.XmlCommand);
// Navigate the document…
XPathNavigator xn = xp.CreateXPathNavigator();
// Or edit the document and change the data
XPathEditor xe = xp.CreateXPathEditor();
// Set the schema and update the database
MappingSchema ms = new MappingSchema("mappings-file");
xa.Update(xp, ms);

Of course, XML Views simply provide the mapping of the data to and from SQL Server. If you're not using SQL Server, you can still take advantage of the substantial changes to XPathDocument (which will supersede and make obsolete the XmlDocument class) and its related classes to more easily query, navigate, and edit XML that you load from other sources.

For example, you can use a new XmlFactory class to create a related set of XmlReader, XmlWriter, and XPathNavigator classes for an XML document. These classes now support the ability to read and write .NET types to and from XML documents. And, of course, performance has improved for reading and writing with XmlTextReader and XmlTextWriter, and when using XSLT.
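
To illustrate the query-and-navigate portion of that model over XML loaded from an arbitrary source, here is a minimal sketch using XPathDocument and XPathNavigator (shown with the existing CreateNavigator method; the file name and XPath expression are only placeholders):

// Load XML from any source into an XPathDocument and query it with XPath
XPathDocument doc = new XPathDocument("products.xml");
XPathNavigator nav = doc.CreateNavigator();
XPathNodeIterator it = nav.Select("/catalog/product[@category='Equipment']");
while (it.MoveNext())
{
   // Write out the name attribute of each matching product
   Console.WriteLine(it.Current.GetAttribute("name", ""));
}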

Extending the relational paradigm
The second broad set of changes relates to those made in ADO.NET v2.0 to enhance relational database access. I've organized these into changes that all developers can take advantage of regardless of the underlying database, and changes that require SQL Server 2000 or the next version of SQL Server, Yukon.

Provider factory
Although the design of .NET Data Providers is based on a common set of interfaces and base classes, in v1.0 and v1.1 Microsoft did not ship factory classes to help developers write polymorphic data access code. As a result, developers who needed provider-independent code had to write their own factories.

In version 2, ADO.NET includes factory classes inherited from System.Data.Common.DbProviderFactory to create the standard connection, command, data reader, table, parameter, permissions, and data adapter classes; these help you write code that targets multiple databases. A factory is accessed using the GetFactory method of the DbProviderFactories class and can be configured in the application's configuration file using the DbProviderConfigurationHandler.
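
As a minimal sketch (assuming the SqlClient provider's invariant name and a connectionString variable defined elsewhere), the factory model looks something like this:

// Obtain a factory for the configured provider; the invariant name could
// come from a configuration file rather than being hard-coded
DbProviderFactory factory = DbProviderFactories.GetFactory("System.Data.SqlClient");
// Create provider-agnostic objects from the factory
DbConnection con = factory.CreateConnection();
con.ConnectionString = connectionString;
DbCommand cmd = factory.CreateCommand();
cmd.Connection = con;
cmd.CommandText = "SELECT COUNT(*) FROM Products";
con.Open();
Console.WriteLine(cmd.ExecuteScalar());
con.Close();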

Asynchronous data access
Commands executed by ADO.NET in version 1.0 using the ExecuteNonQuery, ExecuteReader, and ExecuteXmlReader methods of SqlCommand were synchronous and would block the current thread until the results were returned by the server. In v2.0, each of these methods includes both Begin and End versions to support asynchronous execution from the client's perspective.

This technique employs the familiar asynchronous programming model using the AsyncCallback delegate in .NET, and so includes the SqlAsyncResult class to implement the IAsyncResult interface. While this feature works only with SqlClient at the moment, it may be extended to other providers before the release. Following is an example of setting up an asynchronous command. (Note that the SqlAsyncResult class is not included in the alpha at this time, so the code will not execute.)

// Set up the connection and command
// (connectionString and commandText are assumed to be defined elsewhere)
SqlConnection con = new SqlConnection(connectionString);
SqlCommand cm = new SqlCommand(commandText, con);
con.Open();
cm.BeginExecuteNonQuery(new AsyncCallback(DoneExecuting), null);
// Thread is free, do other things
// Callback method
private void DoneExecuting(SqlAsyncResult ar)
{
   int numRows = ar.EndExecuteNonQuery(ar);
   // print the number of rows affected
}

Batch updates
In version 1.0, a DataAdapter always sent changes to rows one at a time to the server. In version 2.0, the DataAdapter exposes an UpdateBatchSize property that, if supported by the data provider, allows changed rows to be sent to the server in groups. This cuts down on the number of round-trips to the server and therefore increases performance.
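
As a brief sketch (assuming da is a SqlDataAdapter whose insert, update, and delete commands are already configured, and ds is a DataSet holding changed rows), enabling batching looks like this:

// Group up to 100 changed rows into each round-trip to the server
da.UpdateBatchSize = 100;
da.Update(ds, "Products");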

Data paging
In both SqlClient and OracleClient, the command object now exposes an ExecutePageReader method that allows you to pass in the starting row and the number of rows to return from the server. This allows for more efficient data access by retrieving only the rows you need to display. However, this feature reads the rows currently in the table, so subsequent calls may contain rows from the previous page because of inserts, or rows from later pages because of deletes. It therefore works best with relatively static data.
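
As a rough sketch only (the parameter order shown here, a CommandBehavior followed by the starting row and page size, is an assumption and may differ in the final release; con and the Products table are placeholders), paging might look like this:

// Retrieve rows 21 through 30 of the result set
SqlCommand cm = new SqlCommand("SELECT ProductID, Name FROM Products", con);
con.Open();
SqlDataReader dr = cm.ExecutePageReader(CommandBehavior.CloseConnection, 20, 10);
while (dr.Read())
{
   Console.WriteLine(dr["Name"]);
}
dr.Close();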

Binary DataSet remoting
Version 2.0 now allows DataSets to be serialized using a binary format when employing .NET remoting. This both increases the performance of remoting data between .NET applications and reduces the number of bytes transferred.
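
As a minimal sketch (assuming a filled DataSet named ds and the RemotingFormat property and SerializationFormat enumeration used in later builds; BinaryFormatter lives in System.Runtime.Serialization.Formatters.Binary), opting into binary serialization looks like this:

// Ask the DataSet to serialize itself in binary rather than XML form
ds.RemotingFormat = SerializationFormat.Binary;
// Any subsequent serialization, such as across a remoting channel, uses the binary format
BinaryFormatter bf = new BinaryFormatter();
using (MemoryStream ms = new MemoryStream())
{
   bf.Serialize(ms, ds);
   Console.WriteLine(ms.Length); // inspect the size of the binary payload
}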

DataSet and DataReader transfer
In version 1.1, you could only load a DataSet from a DataAdapter. But in version 2.0, you can also load one directly using a DataReader and the Load method. Conversely, you can now generate a DataTableReader (inherited from DbDataReader) with the GetDataReader method in order to traverse the contents of a DataSet. This feature makes it easy to load a DataSet and view its data.
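
A minimal sketch of both directions, using the Load and GetDataReader members described above (con and the Products table are assumed; member names may shift before release):

// Load a DataSet directly from a DataReader
SqlCommand cm = new SqlCommand("SELECT ProductID, Name FROM Products", con);
con.Open();
SqlDataReader dr = cm.ExecuteReader();
DataSet ds = new DataSet();
ds.Load(dr, LoadOption.OverwriteChanges, "Products");
// Traverse the DataSet's contents through a DataTableReader
DataTableReader dtr = ds.GetDataReader();
while (dtr.Read())
{
   Console.WriteLine(dtr["Name"]);
}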

Climbing Yukon
In this category are the new features of ADO.NET v2.0 that relate directly to the new release of SQL Server, code-named Yukon, due out in the same time frame:

MARS
Multiple active result sets (MARS) allows you to work with more than one concurrent result set on a single connection to Yukon. This can be efficient if you need to open a SqlDataReader and, during the traversal, execute a command against a particular row. MARS allows both commands to share the same SqlConnection object so that a second connection to SQL Server is not required.
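
A minimal sketch (assuming a MARS-enabled connection con and hypothetical Products and ProductReviews tables) of issuing a command per row while a reader remains open:

SqlCommand cmSelect = new SqlCommand("SELECT ProductID FROM Products WHERE Discontinued = 1", con);
SqlCommand cmDelete = new SqlCommand("DELETE FROM ProductReviews WHERE ProductID = @id", con);
cmDelete.Parameters.Add("@id", SqlDbType.Int);
con.Open();
SqlDataReader dr = cmSelect.ExecuteReader();
while (dr.Read())
{
   // The second command shares the same open connection as the reader
   cmDelete.Parameters["@id"].Value = dr["ProductID"];
   cmDelete.ExecuteNonQuery();
}
dr.Close();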

Change notification
One of the most interesting new features of Yukon is its ability to support notifications. ADO.NET v2.0 includes programmatic support for this feature by including a SqlNotificationRequest object that can be bound to a SqlCommand.

When data returned from the command changes in the database, a message is sent to the specified notification queue. ADO.NET code can then query the queue either by using an asynchronous query that blocks until a message is sent or by periodically checking the queue using new Transact-SQL syntax.

To make this feature even easier to work with, a SqlDependency class is included that sets up an asynchronous delegate to be called when the data changes; like other dependencies, it can be used in conjunction with the ASP.NET caching engine. Following is an example of using a SqlDependency object:

// Set up the connection and command
// (connectionString and selectStatement are assumed to be defined elsewhere)
SqlConnection con = new SqlConnection(connectionString);
SqlCommand cm = new SqlCommand(selectStatement, con);
SqlDependency dep = new SqlDependency(cm);
dep.OnChanged += new OnChangedEventHandler(HandleChange);
con.Open();
SqlDataReader dr = cm.ExecuteReader();
// Process the data
private void HandleChange(object sender, SqlNotificationEventArgs e)
{
  // A change has been made to the data
  // Inspect the type of change using e.Type
}

Yukon types
ADO.NET v2.0 supports the full set of Yukon data types, including XML and User Defined Types (UDTs). This means that columns in Yukon defined as XML can be retrieved as XmlReader objects, and that UDTs can be passed to stored procedures and returned from queries as standard .NET types. This allows your applications to work with the data as fully formed objects both in client code and in the database. This feature can be used profitably when writing managed code that runs in-process in SQL Server, allowing both the managed stored procedure and the client code to use the same .NET type.
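
As a minimal sketch (the Spec XML column, the Location UDT column, the Point type, and con are all hypothetical placeholders), retrieving both kinds of data might look like this:

// Retrieve an XML column as an XmlReader
SqlCommand cmXml = new SqlCommand("SELECT Spec FROM Products WHERE ProductID = 1", con);
con.Open();
XmlReader xr = cmXml.ExecuteXmlReader();
while (xr.Read())
{
   // Process the XML content of the column
}
xr.Close();
// Retrieve a UDT column as a .NET type
SqlCommand cmUdt = new SqlCommand("SELECT Location FROM Warehouses", con);
SqlDataReader dr = cmUdt.ExecuteReader();
while (dr.Read())
{
   Point p = (Point)dr[0]; // Point is a hypothetical UDT authored in managed code
   Console.WriteLine(p.X);
}
dr.Close();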

Server-side cursors
Because they often caused applications to perform poorly, ADO.NET v1.0 and v1.1 did away with the server-side cursors found in ADO 2.x. ADO.NET v2.0 now reintroduces the concept in Yukon using the ExecuteResultset and ExecuteRow methods of the SqlCommand object and the SqlResultset class.

The SqlResultset class offers a fully scrollable and updateable cursor that can be useful for applications that need to traverse a large amount of data and update only a few rows. Although this feature can be used from client applications such as ASP.NET, it is mainly intended for use when writing managed code that runs in-process with Yukon in the form of stored procedures.

Bulk copy
Although not restricted to Yukon, ADO.NET v2.0 now allows programmatic access to the BCP or bulk copy API exposed by SQL Server. This is done using the SqlBulkCopyOperation and SqlBulkCopyColumnAssociator classes in the System.Data.SqlClient namespace.
