If you are going through the Sitecore log of your installation, you may have noticed some log entries reporting of exceeded thresholds. The message looks something like this:
2188 20:59:06 WARN Timing threshold exceeded for web page. Milliseconds: 38622,88. Threshold: 1000. Page URL: /Default.aspx
These are very common, but should not appear in larger numbers, as this might mean that you have some performing issues.
This article describes why they appear and how to troubleshoot them.
What is the threshold exceeds?
There are three different thresholds in Sitecore: The timing, memory and item thresholds. They all work by using a counter. The counter is initialized when an end user requests a page by entering a URL. More specific the counter is initialized in the httpRequestBegin pipeline in the StartMeasurement processor. When the request ends, the counter is measured and if the number exceeds a threshold set in the web.config, it is logged in the Sitecore log. The log entry includes an indication of what type of exceed it is (time, memory or item), how much the counter value is, the threshold and the URL which were requested.
The timing threshold
The timing threshold indicates if a request takes too long. A timer is started upon request and when the request ends, the time is measured. By default the threshold is set to 1000ms which corresponds to 1 second. So when you see a log entry like this:
2188 20:59:06 WARN Timing threshold exceeded for web page. Milliseconds: 38622,88. Threshold: 1000. Page URL: /Default.aspx
It indicates that someone has requested the URL /default.aspx and Sitecore spent over 38 seconds processing the request. As a system administrator I know this sounds a bit alarming, and it may be. In this incident I know it is because the .Net process restarted.
What you need to be aware of, is that the timer is not CPU time, but total time for the request. This means that the reason for the long response time, may be caused by other things than the request itself. So you cannot use a single entry to point out anything. What you can do however, is getting statistics over timing threshold exceeds. By parsing the log you should be able to identify if a certain URL repeatedly results in an exceed. If this is the case, the page requested may have a performance issue.
Timing threshold exceeds can be caused by a lot of things. Generally you need to consider if threshold exceeds are revolving around a single or few URL’s or if it is spread out on a lot of URL’s. If the entries report the same URL a lot, this often mean that you have a rendering or sublayout on the page, which has some bad performing logic. If the URL’s are spread out it could be that you’re servers aren’t able to keep up with requests or you’re network between your SQL server and web server may be having problems. It could be a lot of things, but it is probably more of a general performance issue.
The memory threshold
The memory threshold works similar to the timing threshold. When a request is handled by Sitecore the StartMeasurement processor looks at the total amount of memory currently allocated. When the request ends the delta in total memory allocated is measured. If the delta is larger than the threshold specified in the web.config, it is reported with a log entry like this:
2188 20:59:06 WARN Memory threshold exceeded for web page. Memory used: 21399KB. Threshold: 10000KB. Page URL: /Default.aspx
This entry says that more than 21mb of memory has been allocated during the time this request has taken.
Again you need to be careful how you interpret this information. First of all you can’t be sure that it is the given request, which allocated the memory. Also the longer the request has taken, the longer is the period for measuring the delta and hence more natural with the increase. The cause of the memory exceed is therefore difficult to determine and I would generally recommend using the standard .net performance counters instead.
The item threshold
The item threshold is probably the most important counter, as it gives some really good indications of what is slowing your site down. You can’t find this type of information in any other .net debugging tool. The counter works by initiating a counter in the beginning of the request, indicating how many items in Sitecore has been accessed during the period of the request.
In the same way as the other counters it works globally and isn’t bound to the single request. So you can’t use a single log entry to analyze anything. Instead you should draw statistics and you will probably see a clear indication which pages iterates over a large amount of items. This is usually the number 1 reason, why a site won’t perform. So if you see a lot of log entries looking like this:
2912 21:10:28 WARN Item threshold exceeded for web page. Items accessed: 2633. Threshold: 1000. Page URL: /Default.aspx
It is probably caused by a rendering using a //item or descendant or similar. This can completely kill the performance of your site, and should be handled. The solution is often to use an index rather then iterating over a complete tree of items.
Should I adjust the default thresholds?
Often when I visit a client with performance problems, they have either increased the thresholds dramatically or completely removed them. They claim that the massive amount of log entries was slowing down the site a lot. Now this is a really bad argument. Not only is it fixing the symptoms and not the problem, you also disable a future consultant from using the information for debugging. My experiences are that the default threshold values are very accurate. If you see too many entries, it is probably because, you have a rendering or sublayout which isn’t performing well enough for production. So my advice is… Don’t adjust the settings.