Sunday, January 7, 2007

Improve ADO.NET performance

Here are a few tips to improve ADO.NET performance.

Poor performance is frustrating to an end user and can lead to users not using your applications in the intended manner. Take advantage of these five tips to accelerate the performance of your database applications.

Although IT organizations invest significant resources in optimizing the network topology and database design of their applications, many overlook the performance aspects of the database middleware. Developers likewise overlook the impact that the ADO.NET provider has on an application, even though a significant percentage of response time is spent requesting and receiving data from the database.

Performance and scalability problems can be debilitating to the overall success of an application and ultimately to the success of the development team. If an application suffers from poor response time, user productivity suffers, service-level agreements are violated, and the reputation of the development organization is maligned. For critical systems, application performance issues can be tied directly to business success in the form of increased cost, decreased revenue, and assumption of additional risk.

Some organizations deal with performance and scalability issues in a reactive fashion, because they lack the development and testing procedures necessary to optimize the application. They simply develop the application and deal with performance issues as they arise in the development environment.

A much better solution is to take the requisite steps up front to make sure your application meets your users' expectations, both from a functional and a performance standpoint. There are quite a few steps you can take on your own to make sure your application processes tasks in the most efficient manner before production deployment.

One of the key reasons for user performance complaints in database apps is that developing .NET data access code that performs fast isn't easy. The ADO.NET documentation includes only basic guidelines and interface definitions to help programmers develop apps using ADO.NET, and it provides next to nothing in terms of prescriptive guidance for developers who want to write code that performs well.

That said, you’re not entirely on your own. You can take quite a few steps that will result in code that performs faster and more reliably. I’ll walk you through several of the common performance pitfalls that I see made on a regular basis, as well as how to avoid them.

Tip 1

Fast to Code != Fast Code

Many programmers use the DbCommandBuilder object because it can save time when coding a new application that uses DataSets. However, this shortcut can have a negative effect on performance. Built-in concurrency restrictions can lead to the DbCommandBuilder generating highly inefficient SQL statements. For example, suppose you have an eight-column table called EMP that contains employee records. The DbCommandBuilder object generates this UPDATE statement:

CommandText: "UPDATE EMP SET EMPNO = ?, ENAME = ?, JOB = ?, MGR = ?, HIREDATE = ?, SAL = ?, COMM = ?, DEPT = ? WHERE ( (EMPNO = ?) AND (ENAME = ?) AND (JOB = ?) AND ((MGR IS NULL AND ? IS NULL) OR (MGR = ?)) AND (HIREDATE = ?) AND (SAL = ?) AND ((COMM IS NULL AND ? IS NULL) OR (COMM = ?)) AND (DEPT = ?) )"

You can write much more efficient UPDATE and DELETE statements than the ones the DbCommandBuilder generates. For example, assume you’re working with the previous example, and you know the underlying database schema. Also, assume that you know the EMPNO column of the EMP table is the primary key for the table. You can create a much simpler UPDATE statement that retrieves the same results:

UPDATE EMP SET EMPNO = ?, ENAME = ?, JOB = ?, MGR = ?, HIREDATE = ?, SAL = ?, COMM = ?, DEPT = ? WHERE EMPNO = ?

This statement runs much more efficiently on the database server than the statement the DbCommandBuilder generated.

Another drawback of the DbCommandBuilder object is that it generates statements at runtime. Each time a DataAdapter.Update method is called, the DbCommandBuilder analyzes the contents of the result set and generates UPDATE, INSERT, and DELETE statements for the DataAdapter. The programmer can avoid this extra processing time by specifying the UPDATE, INSERT, and DELETE statements for the DataAdapter explicitly.
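To make this concrete, here is a minimal sketch using the SqlClient provider; the parameter names, column types, and the 'connection' and 'dataSet' variables are assumptions for illustration, and the key column is used only in the WHERE clause:

SqlDataAdapter adapter = new SqlDataAdapter(
    "SELECT EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPT FROM EMP", connection);
// Supply the UPDATE command explicitly instead of letting a CommandBuilder generate it.
SqlCommand update = new SqlCommand(
    "UPDATE EMP SET ENAME = @ename, JOB = @job, MGR = @mgr, HIREDATE = @hiredate, " +
    "SAL = @sal, COMM = @comm, DEPT = @dept WHERE EMPNO = @empno", connection);
update.Parameters.Add("@ename", SqlDbType.VarChar, 10, "ENAME");
update.Parameters.Add("@job", SqlDbType.VarChar, 9, "JOB");
update.Parameters.Add("@mgr", SqlDbType.Int, 0, "MGR");
update.Parameters.Add("@hiredate", SqlDbType.DateTime, 0, "HIREDATE");
update.Parameters.Add("@sal", SqlDbType.Money, 0, "SAL");
update.Parameters.Add("@comm", SqlDbType.Money, 0, "COMM");
update.Parameters.Add("@dept", SqlDbType.Int, 0, "DEPT");
// Match the WHERE clause against the row's original key value.
update.Parameters.Add("@empno", SqlDbType.Int, 0, "EMPNO").SourceVersion = DataRowVersion.Original;
adapter.UpdateCommand = update;
adapter.Update(dataSet, "EMP"); // no statement generation at runtime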

Tip 2

Avoid retrieving long data if you don’t need it. Retrieving long data across a network is slow and resource-intensive. Remember that when you use a DataSet, all data in the select list is retrieved from the data source, even if you never use it. Yet some applications don’t formulate the select list before sending the query to the .NET data provider; instead, they send:

SELECT * FROM ...

If the select list contains long data, most .NET data providers must retrieve that data at fetch time, even if the application never binds the long data columns for display to the user. Whenever possible, limit the select list to the columns you actually need.

Users Don’t Want Long Data

It also helps to remember that most users don’t want to see long data. If the user does want to process these result items, the application can query the database again, specifying only the long columns in the select list. This method allows the average user to retrieve the result set without paying a high performance penalty for network traffic. Consider this query:

SELECT * from EMPLOYEES WHERE SSID = '999-99-2222'

An application might want to retrieve only this employee’s name and address. Unfortunately, a .NET data provider doesn’t know which result columns an application intends to use when the query is executed; it knows only that the application can request any of them. When the .NET data provider processes the fetch request, it will most likely return one or more result rows across the network from the database server. In this case, a result row contains all the column values for each row, including an employee photograph if the EMPLOYEES table contains such a column. Limiting the select list to contain only the columns you need results in decreased network traffic and a faster-performing query at runtime.
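For example, if the application needs only the employee's name and address, a query like this (the column names are illustrative) avoids shipping a photograph column across the network:

SELECT FIRST_NAME, LAST_NAME, ADDRESS FROM EMPLOYEES WHERE SSID = '999-99-2222'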

Tip 3
Another common performance pitfall concerns how you handle commits. Committing transactions is slow because of disk I/O and, potentially, network I/O. Always start a transaction after connecting; otherwise, you remain in Autocommit mode.

A commit involves several actions. The database server must flush back to disk every data page that contains updated or new data. This is usually a sequential write to a journal file. By default, Autocommit is on when connecting to a data source, and Autocommit mode usually impairs performance because of the amount of disk I/O needed to commit every operation.

Also, some database servers do not provide an Autocommit mode natively. For this type of server, the .NET data provider must explicitly issue a BEGIN TRANSACTION and a COMMIT for every operation sent to the server. In that case you pay a performance penalty of up to three network requests for every statement the application issues, in addition to the large amount of disk I/O required to support Autocommit mode.

Consider this code fragment that starts an Oracle transaction:

DbProviderFactory f = DbProviderFactories.GetFactory("DDTek.Oracle");
DbConnection conn = null;
try {
conn = f.CreateConnection();
conn.ConnectionString = ("Connection String info");
conn.Open();
// Begin an explicit transaction; the connection is no longer in Autocommit mode.
DbTransaction transId = conn.BeginTransaction();
DbCommand cmd = conn.CreateCommand();
cmd.CommandText = "select * from users";
cmd.Transaction = transId;
DbDataReader reader = cmd.ExecuteReader();
// Continue to work within the transaction boundary;
// call transId.Commit() when the unit of work is complete.
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}

This approach can make an enormous difference. I once had a customer who performed 5,000,000 inserts to DB2. He heard me give this tip during a talk and made the change to his application. The insert went from taking five hours to taking ten minutes!

Using transactions can help application performance tremendously, but don’t take this technique too far. Leaving transactions active too long can reduce throughput by holding row locks for extended periods, preventing other users from accessing those rows. You should commit transactions at intervals that allow maximum concurrency.
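For example, a bulk-insert loop might commit every 1,000 rows instead of once per row or only once at the very end. This is a sketch; the table, the 'rows' array, the parameter binding, and the batch size are illustrative:

DbTransaction tran = conn.BeginTransaction();
DbCommand cmd = conn.CreateCommand();
cmd.CommandText = "INSERT INTO EMP (EMPNO, ENAME) VALUES (?, ?)";
cmd.Transaction = tran;
for (int i = 0; i < rows.Length; i++)
{
    // ... bind the parameter values for rows[i] here ...
    cmd.ExecuteNonQuery();
    if ((i + 1) % 1000 == 0)
    {
        tran.Commit(); // commit in intervals, not per row
        tran = conn.BeginTransaction();
        cmd.Transaction = tran;
    }
}
tran.Commit(); // commit any remaining rows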

Tip 4

Use DbCommand.Prepare() Appropriately

Using the DbCommand.Prepare method can have a significant positive (or negative) effect on query execution performance. The DbCommand.Prepare method tells the underlying data provider to optimize for multiple executions of statements that use parameter markers. Note that you can Prepare any command regardless of the execution method used (ExecuteReader, ExecuteNonQuery, or ExecuteScalar).

Consider a .NET data provider that implements DbCommand.Prepare by creating a stored procedure on the server that contains the prepared statement. Creating stored procedures involves substantial overhead, but you can execute the statement multiple times. Doing so minimizes the cost of executing that statement because the query is parsed and optimization paths are stored at create procedure time. Applications that execute the same statement multiples times can benefit greatly from calling DbCommand.Prepare and then executing that Command as needed.

However, using DbCommand.Prepare for a statement that is executed only once results in unnecessary overhead. Furthermore, applications that use DbCommand.Prepare for large, single-execution query batches exhibit poor performance. Similarly, applications that either always use DbCommand.Prepare, or never use DbCommand.Prepare, do not perform as well as those that use a logical combination of prepared and unprepared statements.
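For example, a parameterized INSERT that runs many times is a good candidate for Prepare. This is a sketch; the table and the 'ids' and 'names' arrays are assumptions:

DbCommand cmd = conn.CreateCommand();
cmd.CommandText = "INSERT INTO EMP (EMPNO, ENAME) VALUES (?, ?)";
DbParameter empno = cmd.CreateParameter();
DbParameter ename = cmd.CreateParameter();
cmd.Parameters.Add(empno);
cmd.Parameters.Add(ename);
cmd.Prepare(); // worthwhile because the statement runs many times
for (int i = 0; i < ids.Length; i++)
{
    empno.Value = ids[i];
    ename.Value = names[i];
    cmd.ExecuteNonQuery(); // each call reuses the prepared statement
}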

Tip 5
Much has been written about when to choose DataReaders over DataSets. However, it is a critical choice when it comes to performance, so I will add my own advice to the significant amount that is already out there.

Let me sum it up as bluntly as I can: The DataReader will always be faster at fetching data. Period.

The DataSet uses a DataAdapter to retrieve data from the database, and the DataAdapter itself uses a DbDataReader when it is reading that data. Given this, you might wonder why the DataReader is always faster. The answer is that the DataSet performs additional processing once the data is fetched from the DataReader, converting it to the DataSet's internal format and storing it in memory.

Once it has the data in memory, the DataSet maintains both the original and any changed data, leading to even higher memory usage. This can also lead to a scalability problem, depending on the size and number of copies that you keep around.

All that said, the DataSet is much more functional. It allows for random fetching (the DataReader is forward-only), it gives XML capabilities to relational data, it is disconnected from the database so it does not use up resources on the server, and it is updateable. My bottom line recommendation for performance is to use a DataReader when you don’t need the additional functionality of the DataSet.
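For example, a simple read-only display loop needs nothing more than a DataReader (a sketch; 'conn' is an open connection and the table is illustrative):

DbCommand cmd = conn.CreateCommand();
cmd.CommandText = "SELECT EMPNO, ENAME FROM EMP";
using (DbDataReader reader = cmd.ExecuteReader())
{
    while (reader.Read())
    {
        // Each row is read once, forward-only; no in-memory copy is built.
        Console.WriteLine("{0}: {1}", reader.GetInt32(0), reader.GetString(1));
    }
}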

Each one of the tips I’ve mentioned can improve the performance of your applications. Taken as a whole, these tips can have a considerable impact, not just on raw application speed, but in user satisfaction and in the ability of your company to meet its business goals.

Friday, January 5, 2007

DBA recommendations for writing Stored Procedures

1. When possible, use derived tables instead of temporary tables to gain better performance.
ex:
SELECT MIN(Salary)
FROM (SELECT TOP 5 Salary FROM Employees ORDER BY Salary DESC) AS TopSalaries

2. Do not use 'SELECT * from table'. Return only the columns you want/need. SELECT * brings more data back over the network than necessary and can prevent the use of some indexes (covering indexes). It can also break applications that expect a fixed number of columns when a column is later added.

3. Use the @@error global variable for error handling, but copy its value into your own local variable immediately, because it is reset after the next statement executes. Any global variable should be stored into a local variable if it will be used later in processing.
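ex (a sketch; the table and column names are illustrative):
DECLARE @err int
UPDATE dbo.Employees SET Salary = Salary * 1.10
SELECT @err = @@ERROR -- capture immediately; the next statement resets it
IF @err <> 0
BEGIN
    -- handle the error using the saved value
    RETURN @err
END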

4. Always 'SET NOCOUNT ON' at the beginning of stored procedures. This suppresses the 'rows affected' messages and reduces the network traffic your SQL generates.

5. Avoid using 'print' in SPs unless it is needed. Print statements cause extra network trips and acknowledgments from the client. If the client does not need the message, or nothing is receiving it, this traffic is wasted.

6. Develop and stick with a standard naming convention for your area/project.

7. Never use a wildcard character (%) at the beginning of a search string; it prevents index seeks, so the query will typically result in a scan (e.g., LIKE '%smith' forces a scan, while LIKE 'smith%' can use an index).

8. When doing an insert in an SP, always specify the column names, to avoid issues when a column is added to the table.

9. ALWAYS specify the two-part name when executing SPs.
ex: EXEC dbo.sp_proc. When the two-part name is not used, SQL Server first checks for an SP owned by the caller, then one owned by dbo.

10. Keep Transact-SQL transactions as short as possible within a stored procedure. This helps to reduce the number of locks.

11. Don't use the prefix "sp_" in a stored procedure name. Whenever a stored procedure whose name starts with 'sp_' is executed, SQL Server first looks for it in the master database; only if it is not found there does it look in the current database.

12. If you use input parameters in your stored procedures, you should validate all of them at the beginning of your stored procedure.

13. Avoid nesting transactions in stored procedures. If you need transaction support from SP to SP, use a savepoint.
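ex (a sketch; the savepoint, table, and parameter names are illustrative):
-- Inside the called procedure:
SAVE TRANSACTION ProcSave
UPDATE dbo.Orders SET Status = 'Shipped' WHERE OrderID = @OrderID
IF @@ERROR <> 0
    ROLLBACK TRANSACTION ProcSave -- undo only this procedure's work, not the caller's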

14. If your SP does not require 'transaction safety', use the 'NOLOCK' table hint. The NOLOCK hint tells the engine not to take shared locks (and to ignore existing locks) for the operation, at the cost of possibly reading uncommitted data.

15. Try to avoid using temporary tables inside your stored procedures; use table variables if possible. Temp tables reduce the chance of plan reuse, and table variables have less locking overhead, making them faster.
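ex (a sketch):
-- Instead of CREATE TABLE #TopSalaries ...
DECLARE @TopSalaries TABLE (Salary money)
INSERT INTO @TopSalaries (Salary)
SELECT TOP 5 Salary FROM Employees ORDER BY Salary DESC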

16. Place all DDL statements at the top of the stored procedure, with all DML following. This will increase the chance of plan reuse.

17. Use the 'TOP' operator rather than the 'SET ROWCOUNT' command to limit the number of rows returned, as there is less overhead (and SET ROWCOUNT is deprecated after SQL 2000).
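ex (a sketch; the column names are illustrative):
-- Prefer:
SELECT TOP 10 EmployeeID, LastName FROM Employees ORDER BY HireDate DESC
-- over:
SET ROWCOUNT 10
SELECT EmployeeID, LastName FROM Employees ORDER BY HireDate DESC
SET ROWCOUNT 0 -- and you must remember to reset it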

18. Use Global temp tables only if you absolutely have to.

19. Clearly document and mark any optimizer hints.

20. Limit your joins to 4 tables or fewer. The SQL Server optimizer is only so smart (and so is your DBA).

21. Do not use a function in the WHERE clause. For example, if you are looking for records with a timestamp from the last 7 days, don't say:
WHERE ts > GETDATE() - 7
Instead, set a variable first:
DECLARE @date datetime
SELECT @date = GETDATE() - 7
and then use:
WHERE ts > @date

22. Do not use 'WHERE NOT EXISTS'.
Instead, use a left outer join and test the right (joined) table's key for NULL.
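ex (a sketch; the table and column names are illustrative):
-- Instead of:
SELECT c.CustomerID FROM Customers c
WHERE NOT EXISTS (SELECT 1 FROM Orders o WHERE o.CustomerID = c.CustomerID)
-- Use:
SELECT c.CustomerID
FROM Customers c
LEFT OUTER JOIN Orders o ON o.CustomerID = c.CustomerID
WHERE o.CustomerID IS NULL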