How To Get High Quality Information About Query Performance

Thanks to Elisabeth Redei.

Copied from http://sqlblog.com/blogs/elisabeth_redei/archive/2009/03/01/how-to-get-high-quality-information-about-query-performance.aspx

 

 


There are a zillion reasons why Profiler and its friends are by far the most potent tuning tools available for SQL Server DBAs. This post focuses on one of them, but I will mention a few others at the end as well (download details for the tools mentioned are also listed at the end).

With SQL Server 2005, a new world opened to SQL Server DBAs with the introduction of DMVs, and there are even more available in SQL Server 2008. Two groups of DMVs have, rightfully, gained enormous popularity: the sys.dm_db_index_* DMVs and the sys.dm_exec_* DMVs.

With the introduction of the DMVs, I have noticed that a lot of people are not inviting Profiler to their tuning parties anymore – I will show you why he (she?) will always be on my guest list.

In this post, I am going to talk about the limitations of sys.dm_exec_query_stats and sys.dm_exec_cached_plans. I will then show you how you can get aggregated performance data with much higher quality using Profiler trace data and some of the easy-to-use lightweight tools that Microsoft and the community provide for free.

Needless to say, there are situations when running a trace is not feasible and the sys.dm_exec_* DMVs are a very welcome option. It is never going to hurt anyone to tune whatever TOP queries you identify with these DMVs. For real-time troubleshooting, they are probably also your only option, but then you need to be aware that sys.dm_exec_query_stats only contains aggregated data about query executions that have actually finished executing. sys.dm_exec_requests – or some combination of DMVs – is a better option for real-time troubleshooting, as in the sketch below.
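A minimal sketch of such a real-time query (using only the documented sys.dm_exec_requests and sys.dm_exec_sql_text DMVs):

SELECT
    r.session_id
  , r.status
  , r.wait_type
  , r.blocking_session_id
  , r.cpu_time AS [CPU in ms]
  , r.total_elapsed_time AS [Elapsed in ms]
  , st.text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) st -- statements still executing; not yet in sys.dm_exec_query_stats
WHERE r.session_id <> @@SPID -- exclude this query itself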

However and whenever you use sys.dm_exec_query_stats, you need to be aware of its limitations.

There are numerous scripts out there that query sys.dm_exec_query_stats from different angles, and they will typically look something like this:

SELECT
    (total_elapsed_time/execution_count)/1000 AS [Avg Exec Time in ms]
  , max_elapsed_time/1000 AS [MaxExecTime in ms]
  , min_elapsed_time/1000 AS [MinExecTime in ms]
  , (total_worker_time/execution_count)/1000 AS [Avg CPU Time in ms]
  , qs.execution_count AS NumberOfExecs
  , (total_logical_writes + total_logical_reads)/execution_count AS [Avg Logical IOs]
  , max_logical_reads AS MaxLogicalReads
  , min_logical_reads AS MinLogicalReads
  , max_logical_writes AS MaxLogicalWrites
  , min_logical_writes AS MinLogicalWrites
  , (SELECT SUBSTRING(text, statement_start_offset/2 + 1, -- offsets are in bytes; +1 because SUBSTRING is 1-based
        (CASE WHEN statement_end_offset = -1              -- -1 means "to the end of the batch"
              THEN LEN(CONVERT(nvarchar(max), text)) * 2
              ELSE statement_end_offset
         END - statement_start_offset)/2 + 1)
     FROM sys.dm_exec_sql_text(sql_handle)) AS query_text
FROM sys.dm_exec_query_stats qs
ORDER BY [Avg Exec Time in ms] DESC

The first time you ever used this DMV, you probably thought the gates of heaven had just opened up in front of you. No more getting approval for running traces, no more copying and wading around in MBs or GBs of trace data. For a few chosen ones this might be true, but for most of us it is not – especially if you, like me, tend to look at many different systems and have a limited amount of time to assess the workload.

Depending on the access methods used on the particular server you are looking at, you might see only half of the truth; if you are lucky, the output shows you something very close to the whole truth. But it will never show you the whole truth.

Why Not The Whole Truth?

The inherent problem with sys.dm_exec_query_stats is that it shows you aggregations over sys.dm_exec_cached_plans. And contrary to what you might believe, not all query plans go into the procedure cache. Even if you knew this, you may not realize the full implication.

Some of the things that will NEVER be cached are:

1. DBCC and BACKUP statements

2. Any query that was issued with the RECOMPILE option (see the sketch after this list)

3. So-called "zero-cost" plan queries – i.e. queries that are very cheap to compile but NOT necessarily cheap to execute.
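As a quick illustration of point 2 (a hypothetical example; any table would do), a query executed with the OPTION (RECOMPILE) hint runs normally but its plan is never cached, so it leaves no row in sys.dm_exec_query_stats:

USE Adventureworks
GO
SELECT so.salesorderid
FROM sales.salesorderheader so
WHERE so.salesorderid = 43860
OPTION (RECOMPILE) -- plan is compiled, used and thrown away; it is never cached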

In addition to the list above:

1. You cannot see the accumulated numbers for a query whose plan is never re-used (one that uses inline literals and is not parameterized implicitly or explicitly).

2. Execution time for things like EXEC extended_stored_procedure or CREATE INDEX will not be included in sys.dm_exec_query_stats.total_worker_time.

On top of this, a plan can get thrown out of cache at any point in time and once it is out, it will not show up in sys.dm_exec_query_stats.

Unless you are certain that your ENTIRE WORKLOAD consists of queries that are parameterized, and that none of your procedures contain "funky stuff" such as EXEC ext_stored_procedure, you absolutely have to use Profiler (or the corresponding sp_trace_* procedures) to get an accurate view of your workload.
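If you want to see what actually is in the cache at a given moment, a minimal sketch is to query sys.dm_exec_cached_plans directly (roughly speaking, anything missing from this list will not have a row in sys.dm_exec_query_stats either):

SELECT
    cp.usecounts     -- how many times the plan has been re-used
  , cp.objtype       -- e.g. Adhoc, Prepared, Proc
  , cp.cacheobjtype
  , st.text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
ORDER BY cp.usecounts DESC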

An example of an Ad-hoc query that would “escape” sys.dm_exec_query_stats is the following:

USE Adventureworks
GO
SELECT
    sp.salespersonid
  , so.salesorderid
FROM sales.salesorderheader so
JOIN sales.salesperson sp
  ON sp.salespersonid = so.salespersonid
WHERE so.salesorderid = 43860

The above query will generate a new plan for each execution, provided that the salesorderid is different, and you will find a separate entry in sys.dm_exec_query_stats and sys.dm_exec_cached_plans for each execution:

SELECT
    total_worker_time AS CPU
  , total_elapsed_time AS Duration
  , total_logical_writes AS TotWrites
  , total_logical_reads AS TotReads
  , qs.execution_count AS NoOfExecs
  , text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(sql_handle)


Fig 1. The two executions show up as separate queries in sys.dm_exec_query_stats.

For my second scenario, I created two stored procedures whose only difference is that I added a WAITFOR (to simulate a call to e.g. an extended stored procedure) in one of them.

CREATE PROCEDURE uspGetSalesPerson2 (@p1 int)
AS
BEGIN
    WAITFOR DELAY '00:00:10' -- uspGetSalesPerson is identical but WITHOUT the WAITFOR
    SELECT
        sp.salespersonid
      , so.salesorderid
    FROM sales.salesorderheader so
    JOIN sales.salesperson sp
      ON sp.salespersonid = so.salespersonid
    WHERE so.salesorderid = @p1
END

And then ran them:

uspGetSalesperson2 44501
GO
uspGetSalesperson 44501


Fig 2. As you can see, the 10 WAITFOR seconds in uspGetSalesperson2 are not accounted for in total_elapsed_time (Duration) at all.

What About SQL Server 2008?

In SQL Server 2008, there is something called query fingerprints/query hashes which lets you see aggregates for non-parameterized queries such as the above. It can also be used to identify which queries in your workload would benefit from parameterization.

However, it still requires that the plan was put in cache in the first place. The Activity Monitor uses this new feature, but the Management Data Warehouse does not in the RTM version (you can read more about this new feature in Bart Duncan's blog, "Query Fingerprints and Plan Fingerprints: The Best New SQL 2008 Feature You've Never Heard Of", http://blogs.msdn.com/bartd/archive/2008/09/03/Query-Fingerprints-and-Plan-Fingerprints_3A00_-The-Best-New-SQL-2008-Feature-You_2700_ve-Never-Heard-Of.aspx).
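On 2008, a minimal sketch of aggregating by query fingerprint is to group on the query_hash column that was added to sys.dm_exec_query_stats (the CONVERT is there only because MIN does not accept nvarchar(max)):

SELECT
    qs.query_hash -- same value for textually different queries that only vary in their literals
  , SUM(qs.execution_count) AS TotalExecs
  , SUM(qs.total_worker_time)/SUM(qs.execution_count) AS [Avg CPU]
  , SUM(qs.total_elapsed_time)/SUM(qs.execution_count) AS [Avg Duration]
  , MIN(CONVERT(nvarchar(4000), st.text)) AS SampleText -- one arbitrary representative of the group
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
GROUP BY qs.query_hash
ORDER BY TotalExecs DESC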

To address the problem with duration for statements other than SELECT/INSERT/UPDATE/DELETE, there is a new DMV in SQL Server 2008 called sys.dm_exec_procedure_stats, which aggregates consumption for ALL statements executed within a stored procedure.
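Again a minimal sketch (using only the documented columns of sys.dm_exec_procedure_stats); on 2008, this would have caught the WAITFOR time in uspGetSalesperson2:

SELECT
    OBJECT_NAME(ps.object_id, ps.database_id) AS ProcName
  , ps.execution_count
  , ps.total_worker_time/ps.execution_count AS [Avg CPU]
  , ps.total_elapsed_time/ps.execution_count AS [Avg Duration] -- elapsed time of the whole procedure, waits included
FROM sys.dm_exec_procedure_stats ps
ORDER BY [Avg Duration] DESC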

Enter Profiler And Friends

Anyone who has used Profiler knows that it, in itself, can do nothing about the above – in fact, it does anything BUT aggregate data. This is when the friends should join the party. For the above query (first executed with so.salesorderid = 44501 and then with so.salesorderid = 43860), the equivalent output from my personal favorite friends, SQLNexus and Reporter, looks like this:


Fig 3. Notice the number of executes and how ReadTrace has parsed the inline literal in the Query Template in order to be able to aggregate the performance data.

And for my second scenario with the missing WAITFOR duration:


Fig 4. Notice the difference in Duration for uspGetSalesperson2 compared to Fig 2.

So what did I do? I first ran a Profiler trace with the default template, adding just the StmtStarting and StmtCompleted events (although in a real-world scenario you would want to add some more events, and you might also save the trace definition as a script and use the sp_trace_* procedures to avoid the overhead of the Profiler GUI).
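For reference, a scripted trace along those lines might look like the sketch below (a minimal sketch: event ID 41 is SQL:StmtCompleted, and column IDs 1, 12, 13, 16, 17 and 18 are TextData, SPID, Duration, Reads, Writes and CPU; the file path is hypothetical, and normally you would script this out from the Profiler GUI):

DECLARE @TraceID int, @maxfilesize bigint, @on bit
SET @maxfilesize = 512
SET @on = 1
-- Create a stopped trace writing to a file (path is just an example)
EXEC sp_trace_create @TraceID OUTPUT, 0, N'C:\Traces\MyWorkload', @maxfilesize
-- Add the SQL:StmtCompleted event (ID 41) with a few useful columns
EXEC sp_trace_setevent @TraceID, 41, 1, @on   -- TextData
EXEC sp_trace_setevent @TraceID, 41, 12, @on  -- SPID
EXEC sp_trace_setevent @TraceID, 41, 13, @on  -- Duration
EXEC sp_trace_setevent @TraceID, 41, 16, @on  -- Reads
EXEC sp_trace_setevent @TraceID, 41, 17, @on  -- Writes
EXEC sp_trace_setevent @TraceID, 41, 18, @on  -- CPU
-- Start the trace
EXEC sp_trace_setstatus @TraceID, 1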

I then ran SQLNexus, connected to one of the SQL Server instances on my machine, and imported the trace file I had just created. SQLNexus will at this point create a database called SQLNexus (a default that can be changed) where the crunched trace data goes, and then generate a number of reports. The number of reports available depends on what events you included in the trace.


Fig 5

If you switch to "ReadTrace_Main" you will find the output from Fig 3 and Fig 4 at the bottom of the screen. As you can see in Fig 3 and 4, the information is somewhat limited and there is certainly some vital information missing. For instance, I like to know whether there are variations in execution times, to pinpoint intermittent bad execution plans, resource or blocking issues.


Fig 6.

By clicking the statement in the grid, you will get to the "Unique Batch Details" report, at the end of which you will find the following data:


Fig 7.

To get that information, you can also query the data in the SQLNexus database directly. In addition to all the data necessary to produce the reports in the SQLNexus GUI, it contains a number of views you can query (or you can query the base tables directly). You can also build your own custom reports.

To get the data in one single report, I ran the following SELECT statement:

USE sqlnexus
GO
SELECT
    vwU.NormText
  , vwB.CompletedEvents
  , vwB.Duration
  , vwB.AvgDuration
  , vwB.Reads
  , vwB.AvgReads
  , vwB.Writes
  , vwB.AvgWrites
  , vwB.CPU
  , vwB.AvgCPU
  , vwB.HashID
FROM readtrace.vwBatchUtilization vwB
INNER JOIN readtrace.vwUniqueBatches vwU
  ON vwB.hashid = vwU.hashid


Fig 8. Notice the HashID column to the far right – that is basically a “query fingerprint” for this particular query.

Using Reporter, Baseline and Comparison Data

One really cool feature of Reporter is that you can load two different trace data databases and look at comparative data. So you could have one baseline database and then, at any point in time, capture a new trace and compare the results in Reporter:


Fig 9. Comparative data from Reporter.

It will also identify which queries are found in which trace data set (i.e. if there are new TOPn queries on the scene or if any of your old TOPn queries are gone). Be aware that you can change the TOP parameter to show anything from TOP 3 to TOP 100.


Fig 10. Identifying new queries with Reporter.

More Information About the Tools

As mentioned, SQLNexus has a number of drill-downable reports available, and it can also import Performance Monitor output and WAITSTATS (sys.dm_os_wait_stats output) data. Reporter, on the other hand, has the same drill-down reports but works only with trace data (plus some extra features, such as the comparison feature mentioned above, that you will not find in SQLNexus).

Both utilities require ReadTrace, one of the RML Utilities released by Microsoft, and the data format for both Reporter and SQLNexus is the same. By default, ReadTrace creates a database named PerfAnalysis, but if you run ReadTrace via SQLNexus, the name will, as mentioned earlier, be SQLNexus by default.

In SQLNexus, you can point to the trace files and it will import them for you, whereas Reporter requires you to first run ReadTrace to create the database. If you are a little bit lazy like me (or "productivity focused" as I like to call it), you can use the GUI in SQLNexus to create the trace data database and then point Reporter to that database.

Another perk is that once you have the trace data, you can easily feed it to the Database Engine Tuning Advisor (which is a good idea considering the limitations of the sys.dm_db_index_* DMVs… but that's another story… and an upcoming blog post).

My strong recommendation is that you at least every now and then try to find a window where you can capture trace data to make a full assessment of the workload of your SQL Server(s).

If the above is not enough to convince you to bring along Profiler and his cool friends, I will mention a few other reasons:

1. The ability to replay trace data. For instance, before applying a service pack, capture a trace of your critical workload and replay it against an upgraded test server. (With OStress, another RML Utility, or the built-in "Replay events using multiple threads" feature of Profiler, you can simulate multiple users.)

2. Profiler will also catch the number of rows returned to the client, so you can identify queries that return over-the-top amounts of rows. These queries are not necessarily expensive to execute.

3. You can easily detect if cursors are used, and in particular cursors with a low row fetch size.

4. It will give you information about runtime errors and performance-related warnings (such as HASH and SORT warnings).

5. If you have a Performance Monitor log from the same time window, you can correlate the data from within the Profiler GUI.

In essence, there is nothing else available that will give you a complete picture about the workload on the server. The best method to get to know a system is to gather and analyze trace data (with whatever tool you are comfortable with) and maybe even spend a bit of time just browsing through a trace of some of your more critical workload(s).

More Information and Download Areas

You can find (a lot) more information about the tools in the following article. At the end of the article you will find links to download areas:

“Precision Performance for Microsoft SQL Server using RML Utilities 9.0”

http://sqlcat.com/technicalnotes/archive/2008/02/01/precision-performance-for-microsoft-sitql-server-using-rml-utilities-9-0.aspx
