Well, "Performance Optimization" in NAV & SQL is a veritable "cornucopia of issues" – starting with Hardware Sizing, Platform Configuration, Index Tuning, SIFT Optimization, etc. … and – last but not least – the question: "How to prevent and solve blocking conflicts?"

At the "Directions EMEA 2008" event in Paris I tried to give some feasible, practical advice to answer this question. The following should give some further explanations, in addition to the "Session Package" one could download from my website: http://www.stryk.info/directionsemea_blockdetection.zip

(Based on NAV 4.00 and SQL Server 2005)

### Changes 12.05.2008 ### >>>

Due to the great response I got on this blog I could further improve the scripts, so the processes are more reliable, especially when it comes to "translating" a "waitresource" into a "table name". Additionally, the relevant index names are now shown, too. (Many thanks to Legosz (Mark) for his inspiration and support with this!)

### Changes 12.05.2008 ### <<< 

### Changes 03.11.2008 ### >>>

I'm still getting a lot of – thoroughly positive – response on this, as well as lots of recommendations on how to further improve the process. So I have updated the code snippets, providing a further improved way of "Block Detection": e.g. retrieving the "waitresource" was enhanced, and finding the "block originator" was added. (Thanks a lot to Rama Mathanmohan for his support!)

### Changes 03.11.2008 ### <<< 

General

"Locking" is actually a sort of reservation of a system resource such as a record, table, etc. "Locking" is crucial to guarantee the consistency of data, and every database MUST lock at certain times to a certain extent. When we talk about "locking problems" we actually mean "blocking problems": two (or more) processes want to lock the same resource and a conflict arises. Thus, we encounter problems when a "lock" results in a "block". Hence, it is a matter of probability if/how/when such a conflict occurs. To reduce "blocking conflicts" it is necessary to reduce the probability of "locking conflicts" …
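To get a feeling for what is locked at any given moment, the DMV sys.dm_tran_locks can be queried – a minimal sketch (assuming SQL Server 2005 or later and VIEW SERVER STATE permission), not part of the toolset described below:

```sql
-- List the locks currently held or requested, per session and resource:
SELECT request_session_id   AS spid,
       resource_type,             -- e.g. OBJECT, PAGE, KEY, RID
       resource_database_id,
       request_mode,              -- e.g. S, U, X, IX
       request_status             -- GRANT, WAIT, CONVERT
FROM sys.dm_tran_locks
ORDER BY request_session_id;
```

Rows with request_status = 'WAIT' are the ones currently being blocked.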

Lock Granularity

To minimize the risk of blocks it is necessary to keep the "lock granularity" (locking level) as small as possible. Lock granules are, for example, Row, Range, Page, Index, Key, Table, etc. The lower the granularity, the lower the probability of encountering blocks. For example, the probability that two processes want to work with the "Sales Header" table at the same time is much higher than the probability that these two processes want to work with the same record (the same document). Thus, if the "Sales Header" table were locked, the risk of getting blocked is far higher than if just a single record were locked!

Usually SQL Server sets the optimal lock granularity, but with NAV this can be overruled (which is mostly a bad idea). It is a common misunderstanding that the "Always Rowlock" feature in NAV is required to lock on row level. Regardless of this setting, SQL Server will ALWAYS start locking at the lowest feasible granularity, e.g. row level. If "Always Rowlock" is enabled, NAV adds the query hint ROWLOCK, which prevents "Lock Escalation": if SQL Server recognizes that administering many row locks is costly (in SQL Server 2005 escalation is attempted once a statement holds roughly 5,000 locks), it escalates the locking level, i.e. it replaces the many row locks with e.g. a few range locks or page locks.

The ROWLOCK hint prevents this escalation and forces SQL Server to keep the row-lock granularity. Hence, "Always Rowlock" does indeed reduce blocking conflicts – due to the forced low granularity – but this can come at the cost of overall performance: all these locks have to be maintained and require a sufficient amount of RAM. I recommend disabling "Always Rowlock" and investigating and solving the arising blocking conflicts specifically, not wasting precious hardware resources.
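A sketch of what such a hinted query looks like on the wire (the table name and filter values are made up for illustration – actual NAV-generated statements differ in detail):

```sql
-- Illustration: the kind of hint NAV appends when "Always Rowlock" is enabled.
-- ROWLOCK keeps row granularity and suppresses lock escalation for this query.
SELECT *
FROM [CRONUS$Sales Header] WITH (ROWLOCK, UPDLOCK)
WHERE [Document Type] = 1 AND [No_] = '1001';
```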

With NAV versions before 4.00 the lock granularity could also be set like this:

CREATE TABLE [$ndo$dbconfig] (config VARCHAR(1024))
GRANT SELECT ON [$ndo$dbconfig] TO [public]
GO
INSERT INTO [$ndo$dbconfig] VALUES ('DefaultLockGranularity=Yes')

(See also the "Application Designer's Guide")

The "lock granularity" is also influenced by the so-called "Transaction Isolation Level" – the degree to which a transaction is isolated from other transactions. With NAV the default level is READ UNCOMMITTED, meaning that even data written by transactions that have not yet committed can be read by other transactions (aka "Dirty Reads").

With READ UNCOMMITTED the risk of encountering blocks is minimal. The highest isolation level is SERIALIZABLE – here a transaction is completely isolated from the others, hence it has the highest risk of generating blocking conflicts. With the NAV commands Record.LOCKTABLE or Record.FINDSET(TRUE) the TSQL statement SET TRANSACTION ISOLATION LEVEL SERIALIZABLE is sent, thus the transaction is serialized. Further, after serialization NAV adds an UPDLOCK hint, which replaces the "Shared Locks" (S) that would otherwise be taken with "Update Locks" (U) (see "Books Online" for details).

Hence, serializing transactions can cause trouble, so it should only be done where it is really necessary, e.g. to avoid "Dirty Reads" (which is the case in all standard posting routines when Ledger Entries are created).
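Sketched in TSQL, the sequence NAV sends for Record.LOCKTABLE followed by a FINDSET(TRUE) looks roughly like this (table and filter values are illustrative):

```sql
-- Sketch: serialization plus UPDLOCK hint, as sent by NAV after LOCKTABLE.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT *
FROM [CRONUS$Sales Line] WITH (UPDLOCK)
WHERE [Document Type] = 1 AND [Document No_] = '1001';
-- From here on, other serialized readers of this range are blocked
-- until COMMIT/ROLLBACK releases the locks.
```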

Automatic Block Detection

Before blocks can be resolved it is necessary to know precisely the "Who is Who" of the conflicts: "Who is blocking whom?", "When?", "On which resource?" – and of course: "How often does a conflict occur?". Solving blocking conflicts basically has to be done within the C/AL source code – and this task can be very complex and time consuming! Thus it is crucial to know this "Who is Who" in order to fix the worst problems first, not spending time on singular events. The following describes one solution – there might be plenty – which I consider quite simple but smart:

Step 1 – Block Log Table

To collect the data about blocks we need a table ("ssi_BlockLog") for storage:

USE [Navision]
GO
CREATE TABLE [dbo].[ssi_BlockLog]
 (
  [entry_no] bigint identity constraint [ssi_BlockLog$pk_ci] primary key clustered,
  [timestamp] datetime,
  [db] varchar(128) collate database_default,
  [waitresource] varchar(128),
  [table_name] varchar(128) collate database_default,
  [index_name] varchar(128) collate database_default,
  [waittime] bigint,
  [lastwaittype] varchar(128),
  [spid] int,
  [loginame] varchar(128) collate database_default,
  [hostname] varchar(128) collate database_default,
  [program_name] varchar(128) collate database_default,
  [cmd] nvarchar(max) collate database_default,
  [status] varchar(128) collate database_default,
  [cpu] bigint,
  [lock_timeout] int,
  [blocked by] int,
  [loginame 2] varchar(128) collate database_default,
  [hostname 2] varchar(128) collate database_default,
  [program_name 2] varchar(128) collate database_default,
  [cmd 2] nvarchar(max) collate database_default,
  [status 2] varchar(128) collate database_default,
  [cpu 2] bigint,
  [block_orig_id] int,
  [block_orig_loginame] varchar(128) collate database_default
  )
GO

Step 2 – Stored Procedure to save the data 

A stored procedure "ssi_blockdetection" is used to extract the block information from various DMVs (Dynamic Management Views) and save it into table ssi_BlockLog (from Step 1).

IF OBJECT_ID(N'dbo.ssi_blockdetection', N'P') IS NOT NULL
  DROP PROCEDURE [dbo].[ssi_blockdetection]
GO

CREATE PROCEDURE [dbo].[ssi_blockdetection]
  @mode varchar(10) = 'loop',         -- "loop" or "once"
  @threshold int = 1000,              -- Block threshold in milliseconds
  @frequency int = 3,                 -- Check frequency in seconds
  @save tinyint = 0                   -- save output to table ssi_BlockLog (0 = no, 1 = yes)
AS
if @mode <> 'once' begin
  print '*********************************************************'
  print '***              STRYK System Improvement             ***'
  print '***    Performance Optimization & Troubleshooting     ***'
  print '***  (c) 2008, STRYK System Improvement, Jörg Stryk   ***'
  print '***                   www.stryk.info                  ***'
  print '*********************************************************'
  print '              Version 4.00, Date: 24.10.2008             '
  print ''
end

if (@mode not in ('loop', 'once')) begin
  raiserror ('ERROR: Invalid Parameter @mode: %s', 15, 1, @mode)
  return
end
if (@threshold < 1) begin
  raiserror ('ERROR: Invalid Parameter @threshold: %i', 15, 1, @threshold)
  return
end
if (@frequency < 1) begin
  raiserror ('ERROR: Invalid Parameter @frequency: %i', 15, 1, @frequency)
  return
end
if (@save not in (0,1)) begin
  raiserror ('ERROR: Invalid Parameter @save: %i', 15, 1, @save)
  return
end

if @mode <> 'once' begin
  print 'Block Detection Mode  : ' + @mode
  print 'Block Threshold (msec): ' + convert(varchar(15), @threshold)
  print 'Check Frequency (sec) : ' + convert(varchar(10), @frequency)
  print 'Save Output to table  : ' + convert(varchar(10), @save)
  print ''
  print 'Searching for blocked processes ...'
  print ''
end

set nocount on
set statistics io off
declare @spid int, @spid2 int, @loginame varchar(128), @blocked_by int, @blocked_by_name varchar(128), @orig_id int, @orig_name varchar(128), @timestmp datetime, @i int

if @mode = 'once'
  goto start_check

while 1 = 1 begin

  start_check:

  if exists (select * from sys.dm_exec_requests where [blocking_session_id] <> 0) begin
    print 'Checkpoint ' + convert(varchar(30), getdate())
      
    if @save = 0 begin

     select distinct
             [db] = db_name(s1.[database_id]),
             [waitresource] = ltrim(rtrim(s1.[wait_resource])),
             [table_name] = object_name(sl.rsc_objid),           
             [index_name] = si.[name],
             s1.[wait_time],
             s1.[last_wait_type],
             s1.[session_id],
             session1.[login_name],
             session1.[host_name],
             session1.[program_name],
             [cmd] = isnull(st1.[text], ''),
             session1.[status],
             session1.[cpu_time],
             s1.[lock_timeout],
             [blocked by] = s1.[blocking_session_id],            
             [login_name 2] = session2.[login_name],
             [hostname 2] = session2.[host_name],
             [program_name 2] = session2.[program_name],
             [cmd 2] = isnull(st2.[text], ''),
             session2.[status],
             session2.[cpu_time]         
       -- Process Requests
       from sys.dm_exec_requests (nolock) s1
       outer apply sys.dm_exec_sql_text(s1.sql_handle) st1
       left outer join sys.dm_exec_requests (nolock) s2 on s2.[session_id] = s1.[blocking_session_id]
       outer apply sys.dm_exec_sql_text(s2.sql_handle) st2
       -- Sessions
       left outer join sys.dm_exec_sessions (nolock) session1 on session1.[session_id] = s1.[session_id]
       left outer join sys.dm_exec_sessions (nolock) session2 on session2.[session_id] = s1.[blocking_session_id]
       -- Lock-Info
       left outer join  master.dbo.syslockinfo (nolock) sl on s1.[session_id] = sl.req_spid
       -- Indexes
       left outer join sys.indexes (nolock) si on sl.rsc_objid = si.[object_id] and sl.rsc_indid = si.[index_id]
       where s1.[blocking_session_id] <> 0
             and (sl.rsc_type in (4,5,6,7,8,9)) and sl.rsc_indid <> 0 and sl.req_status = 3
             and s1.[wait_time] >= @threshold

    end else begin

      set @timestmp = getdate()

      insert into [ssi_BlockLog]
      select distinct
             @timestmp,
             [db] = db_name(s1.[database_id]),
             [waitresource] = ltrim(rtrim(s1.[wait_resource])),
             [table_name] = object_name(sl.rsc_objid),           
             [index_name] = si.[name],
             s1.[wait_time],
             s1.[last_wait_type],
             s1.[session_id],
             session1.[login_name],
             session1.[host_name],
             session1.[program_name],
             [cmd] = isnull(st1.[text], ''),
             session1.[status],
             session1.[cpu_time],
             s1.[lock_timeout],
             [blocked by] = s1.[blocking_session_id],            
             [login_name 2] = session2.[login_name],
             [hostname 2] = session2.[host_name],
             [program_name 2] = session2.[program_name],
             [cmd 2] = isnull(st2.[text], ''),
             session2.[status],
             session2.[cpu_time],
             [block_orig_id] = null,
             [block_orig_loginame] = null
       -- Process Requests
       from sys.dm_exec_requests (nolock) s1
       outer apply sys.dm_exec_sql_text(s1.sql_handle) st1
       left outer join sys.dm_exec_requests (nolock) s2 on s2.[session_id] = s1.[blocking_session_id]
       outer apply sys.dm_exec_sql_text(s2.sql_handle) st2
       -- Sessions
       left outer join sys.dm_exec_sessions (nolock) session1 on session1.[session_id] = s1.[session_id]
       left outer join sys.dm_exec_sessions (nolock) session2 on session2.[session_id] = s1.[blocking_session_id]
       -- Lock-Info
       left outer join  master.dbo.syslockinfo (nolock) sl on s1.[session_id] = sl.req_spid
       -- Indexes
       left outer join sys.indexes (nolock) si on sl.rsc_objid = si.[object_id] and sl.rsc_indid = si.[index_id]
       where s1.[blocking_session_id] <> 0
             and (sl.rsc_type in (4,5,6,7,8,9)) and sl.rsc_indid <> 0 and sl.req_status = 3
             and s1.[wait_time] >= @threshold
   
      update [dbo].[ssi_BlockLog] set [table_name] = '- unknown -' where [table_name] is null and [timestamp] = @timestmp

      -- get block originator
      declare originator_cur cursor for select [blocked by], [loginame 2]
        from [dbo].[ssi_BlockLog]
        where [timestamp] = @timestmp
        for update
      open originator_cur
      fetch next from originator_cur into @blocked_by, @blocked_by_name
      while @@fetch_status = 0 begin
        set @i = 0
        set @orig_id = @blocked_by  
        set @orig_name = @blocked_by_name
        set @spid2 = @blocked_by
        while (@spid2 <> 0) and (@i < 100) begin
          if exists(select top 1 [blocked by] from [dbo].[ssi_BlockLog] where ([timestamp] = @timestmp) and ([spid] = @spid2)) begin
            select top 1 @spid = [blocked by], @loginame = [loginame 2] from [dbo].[ssi_BlockLog] where ([timestamp] = @timestmp) and ([spid] = @spid2)
            set @orig_id = @spid
            set @orig_name = @loginame                      
            set @spid2 = @spid        
          end else
            set @spid2 = 0
          set @i = @i + 1   -- "Emergency Exit", to avoid recursive loop
        end
        update [dbo].[ssi_BlockLog] set [block_orig_id] = @orig_id, [block_orig_loginame] = @orig_name where current of originator_cur
        fetch next from originator_cur into @blocked_by, @blocked_by_name
      end
      close originator_cur
      deallocate originator_cur

    end
  end

  end_check:

  if @mode = 'once'
    return

  declare @delay varchar(8)
  set @delay = convert(varchar(8), dateadd(second, @frequency, 0), 108)  -- 'hh:mm:ss'
  waitfor delay @delay
end

Step 3 – Job to save the data

A SQL Server Agent Job will then execute the procedure "ssi_blockdetection" (from Step 2).

USE [msdb]
GO
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'STRYK System Improvement' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'STRYK System Improvement'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'SSI: Block Detection',
        @enabled=1,
        @description=N'Automatic Block-Detection by STRYK System Improvement, http://www.stryk.info',
        @category_name=N'STRYK System Improvement',
        @owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'blockdetection',
        @step_id=1,
        @cmdexec_success_code=0,
        @on_success_action=1,
        @on_fail_action=2,
@subsystem=N'TSQL',
        @command=N'EXECUTE ssi_blockdetection @mode=''once'', @threshold=1, @frequency=1, @save=1',
        @database_name=N'Navision' -- Change Database Name on demand        
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:

(Remark: the TSQL that extracts the data from the system tables could surely be improved – it's just an example. Please feel free to improve it, I would appreciate your support!)

Step 4 – Alert to monitor "Processes Blocked"

To automatically keep track of the blocks, a SQL Server Agent Alert ("SSI: Block Detection") can monitor the SQL Server performance counter "SQLServer:General Statistics – Processes blocked"; it responds by executing the SQL Agent Job:

USE [msdb]
GO
EXEC msdb.dbo.sp_add_alert @name=N'SSI: Block Detection',
        @enabled=1,
        @delay_between_responses=10,
        @include_event_description_in=0,
        @performance_condition=N'SQLServer:General Statistics|Processes blocked||>|0',
        @job_name=N'SSI: Block Detection'
GO

Now, whenever the number of "Processes blocked" rises above 0 – i.e. a block was encountered – this Alert is triggered and executes the SQL Server Agent Job ("SSI: Block Detection" from Step 3), which logs the block information – the "Who is Who" of blocks!

THAT'S IT – AUTOMATIC BLOCK DETECTION ESTABLISHED!!!

Finally, we can easily analyze this data, e.g. finding out who is the cause or the victim of blocks, which resources are affected, how long the blocks last, etc. For example:

USE [Navision]
GO

SELECT [waitresource], [table_name], [index_name], [blocked_login] = [loginame], [blocking_login] = [loginame 2], [block_orig_loginame], [count] = COUNT([entry_no]),[max_duration] = MAX([waittime])
FROM dbo.ssi_BlockLog
--WHERE [timestamp] BETWEEN '17.04.2008 00:00' AND '17.04.2008 23:59'
GROUP BY [waitresource], [table_name], [index_name], [loginame], [loginame 2], [block_orig_loginame]
ORDER BY COUNT([entry_no]) DESC
GO

SELECT * FROM ssi_BlockLog ORDER BY [entry_no] DESC
GO

(Remark: these examples are created for a database called "Navision" – please change the name as needed)

Solving and Preventing Blocks

Well, we have to be aware that it is impossible to solve all kinds of potential blocking conflicts, as it is impossible to predict all probable – and improbable – circumstances of conflicting processes. This means: there will always be blocks; we can only reduce the problems to a certain degree. Due to the way certain processes are designed in NAV – e.g. the posting routines – the success is limited by the number of concurrent users running these transactions in parallel. To reduce blocks these optimizations are necessary:

  1. SIFT Optimization

(see my blog http://dynamicsuser.net/blogs/stryk/archive/2007/10/28/included-columns-vs-sift-tables.aspx for further details) When SIFT data is updated (aggregated), the system has to establish tight locks to make sure only committed data is included when summing up the figures. The more aggregation levels (buckets) exist, and the higher up this aggregation happens, the higher the risk of encountering blocks. (Remark: with NAV 5.0 SP1 – Indexed Views – this issue is void)

  2. Index Optimization

If the index structure is insufficient, transactions usually take much longer because "Index Scans" are performed instead of "Index Seeks". "Scanning" indexes – reading the leaf-node level – is time consuming and has a high risk of getting blocked, e.g. if a write transaction locks a record within the to-be-scanned range. Optimizing indexes speeds up transactions and reduces blocking conflicts.
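One way to spot scan-heavy tables is the DMV sys.dm_db_index_usage_stats – a sketch (run in the NAV database; counters reset with each SQL Server restart):

```sql
-- Rough indicator of "scanned" indexes since the last restart:
-- a high user_scans count on a big table hints at missing or unsuitable indexes.
SELECT object_name(s.[object_id]) AS table_name,
       i.[name]                   AS index_name,
       s.user_seeks,
       s.user_scans
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i ON i.[object_id] = s.[object_id] AND i.index_id = s.index_id
WHERE s.database_id = db_id()
ORDER BY s.user_scans DESC;
```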

  3. Process Optimization

It is important to optimize the "length"/duration of processes and transactions in NAV. The longer a transaction takes, the more locks are engaged (lock escalation), and the higher the risk of causing/encountering blocks. Also, the longer transactions take, the more system resources are occupied. Shortening transactions with a COMMIT – at the right time and place! – releases locks and resources.

  4. Increase Processing Speed

The faster a transaction can be processed, the lower the risk of encountering blocks. So speeding up the system in any way (CPU, RAM, disks, network, etc.) can be quite feasible, too. When it comes to "Index Optimization", especially the design of the Clustered Index can have a major impact on blocks! The following should give a few examples:

Example 1:

Table 357 "Document Dimension" Clustered Index (Standard): Table ID, Document Type, Document No., Line No., Dimension Code

Looking at the physical structure of the table, the records are actually stored in this way:

| Process A | Table ID | Document Type | Document No. | Line No. | Dimension Code | Process B |
|-----------|----------|---------------|--------------|----------|----------------|-----------|
|           | 36       | Order         | 100          | 0        | Dim1           |           |
|           | 36       | Order         | 100          | 0        | Dim2           |           |
|           | 36       | Order         | 200          | 0        | Dim1           |           |
|           | 36       | Order         | 200          | 0        | Dim2           |           |
|           | 36       | …             | …            | …        | …              |           |
|           | 37       | Order         | 100          | 10000    | Dim3           |           |
|           | 37       | Order         | 100          | 20000    | Dim4           |           |
|           | 37       | Order         | 200          | 10000    | Dim3           |           |
|           | 37       | Order         | 200          | 20000    | Dim4           |           |
|           | 37       | …             | …            | …        | …              |           |

In NAV, data from T357 is selected by querying the "Dimension" records, e.g. from the "Sales Header" and "Sales Line" tables.

Process A (simplified code):

DocDim.SETFILTER("Table ID", '36|37');
DocDim.SETRANGE("Document Type", DocDim."Document Type"::Order);
DocDim.SETRANGE("Document No.", '100');

Process B (simplified code):

DocDim.SETFILTER("Table ID", '36|37');
DocDim.SETRANGE("Document Type", DocDim."Document Type"::Order);
DocDim.SETRANGE("Document No.", '200');

In SQL this results in reading – and locking! – the whole range of records from "Header" to "Line". As soon as a write transaction starts, e.g. by posting the Order, "Exclusive Locks" are engaged. Hence, in this example Process A also locks the "Header" records of Order 200, even though it is a different document – a block occurs as soon as Process B tries to post Order 200! This problem gets worse the more documents are created, as the physical distance between "Header" and "Lines" gets bigger and bigger …
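Roughly, the SQL generated for Process A looks like this – a sketch only (table name and filter values are illustrative; actual NAV-generated statements differ in detail):

```sql
-- Process A, simplified: filters on the leading clustered-index fields.
-- With the standard clustered index (Table ID first), the qualifying rows for
-- Table ID 36 and 37 lie far apart, so the serialized read locks the whole
-- range in between – including the T36 rows of Document No. 200.
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

SELECT *
FROM [CRONUS$Document Dimension] WITH (UPDLOCK)
WHERE [Table ID] IN (36, 37)
  AND [Document Type] = 1
  AND [Document No_] = '100';
```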

By forcing an optimized physical order of the records – changing the "Clustered Index" – this kind of block can be prevented:

Clustered Index (Optimized): Document No., Document Type, Table ID, Line No., Dimension Code

| Process A | Document No. | Document Type | Table ID | Line No. | Dimension Code | Process B |
|-----------|--------------|---------------|----------|----------|----------------|-----------|
|           | 100          | Order         | 36       | 0        | Dim1           |           |
|           | 100          | Order         | 36       | 0        | Dim2           |           |
|           | 100          | Order         | 37       | 10000    | Dim3           |           |
|           | 100          | Order         | 37       | 20000    | Dim4           |           |
|           | 200          | Order         | 36       | 0        | Dim1           |           |
|           | 200          | Order         | 36       | 0        | Dim2           |           |
|           | 200          | Order         | 37       | 10000    | Dim3           |           |
|           | 200          | Order         | 37       | 20000    | Dim4           |           |

This optimization improves the overall performance of posting transactions: they'll perform faster and encounter far fewer blocks!

Example 2:

Whenever a "Ledger Entry" is written in NAV, the following algorithm is used:

LedgerEntry.LOCKTABLE;
LedgerEntry.FINDLAST;
NewEntryNo := LedgerEntry."Entry No." + 1;

All these tables have a Primary Key/Clustered Index on "Entry No.". The first problem here is that the transaction is serialized, i.e. "Dirty Reads" are prevented; thus, when a record is written it exclusively locks the last record – other processes are blocked and cannot add another record until the lock has been released. This behavior is intended to guarantee a consistent numbering sequence of the records.

FOR ALL KINDS OF FISCALLY RELEVANT TABLES (e.g. G/L Entry, VAT Entry, etc.) THIS MUST NOT BE CHANGED – else you would jeopardize the correctness of the book-keeping!

But there are tables where this "sequence" is not important (e.g. "Change Log Entry", "Warehouse Entry", etc.) where we could improve things. The second problem is the query itself – the FINDLAST – issued just to retrieve a record to find the next "Entry No." (thankfully FINDLAST fetches just one record; with the old FIND('+') it was even worse!). By enabling the "AutoIncrement" property of the field "Entry No." it is not necessary to query SQL Server for the number at all – SQL Server will generate the "Entry No." automatically on INSERT! (see table 405 "Change Log Entry" for an example) But the third – and most important – problem is the physical order of the records.
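On the SQL side, the "AutoIncrement" property corresponds to an IDENTITY column – a sketch with a made-up demo table:

```sql
-- Illustration: with "AutoIncrement" the Entry No. column is an IDENTITY,
-- so an INSERT needs no preceding FINDLAST round-trip to compute the number.
CREATE TABLE [dbo].[Demo Entry]
(
  [Entry No_]   int IDENTITY(1,1) NOT NULL PRIMARY KEY,
  [Description] varchar(50) NOT NULL
);

INSERT INTO [dbo].[Demo Entry] ([Description]) VALUES ('first');
INSERT INTO [dbo].[Demo Entry] ([Description]) VALUES ('second');
-- SQL Server assigns the entry numbers itself, in sequence.
```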

By default they are stored in sequence; new records are added at the end. When a new record is inserted – appended at the end of the table – an Exclusive Lock on this row is established (ROW X) plus an Intent-Exclusive Lock on the page (PAG IX). When a second process tries to add another record, the PAG IX prevents another ROW X being set on this page – the second process gets blocked!

| Process A      | Entry No.      | Process B |
|----------------|----------------|-----------|
|                | 1              |           |
|                | 2              |           |
|                | 3              |           |
| ROW X / PAG IX | 4 (New Record) | Blocked!  |

These conflicts can be reduced by introducing a different "Clustered Index" – using a new field of type GUID (Globally Unique Identifier), to be populated with the C/AL function CREATEGUID.

Primary Key: "Entry No." (type Integer)
Clustered Index: "UID" (type GUID)

| Process A      | Entry No.      | UID | Process B      |
|----------------|----------------|-----|----------------|
|                | 2              | {A} |                |
|                | 5 (New Record) | {B} | ROW X / PAG IX |
|                | 1              | {C} |                |
| ROW X / PAG IX | 4 (New Record) | {D} |                |
|                | 3              | {E} |                |

GUID values are not created in sequence, so records are actually not appended at the end of the table but inserted in between existing records. If the table is big enough – occupying lots of pages – the probability is very high that every insert happens on a different page, so the PAG IX does no harm anymore!
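A sketch of how such a structure could look on the SQL side (table and index names are made up; in practice the change is done via NAV's key designer, with the UID field filled by CREATEGUID):

```sql
-- Nonclustered primary key on "Entry No.", clustered index on the GUID column:
-- inserts then scatter across pages instead of contending for the last page.
CREATE TABLE [dbo].[Demo Ledger]
(
  [Entry No_] int NOT NULL,
  [UID]       uniqueidentifier NOT NULL DEFAULT NEWID(),
  CONSTRAINT [Demo Ledger$pk] PRIMARY KEY NONCLUSTERED ([Entry No_])
);

CREATE UNIQUE CLUSTERED INDEX [Demo Ledger$ci]
  ON [dbo].[Demo Ledger] ([UID]);
```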

These two examples should demonstrate the importance of the physical structure of a table! Please note that these solutions are actually quite contrary: in Example 1 a certain physical order was forced, while in Example 2 physical dis-order (chaos) was the goal. Two opposite solutions – both fixing blocking problems.

This should point out that it is "tricky business" to find the ideal physical structure for some tables.

(Remark: please keep in mind that all the examples above have advantages and disadvantages, so take care when implementing them – anyway, the change to T357 is quite failsafe)

[@Mark/Bert: I would appreciate if you could share your experiences on this specific issue! Thanks!]

Deadlocks

A sort of "first class" blocking situation arises when a so-called "Deadlock" occurs:

| Process A           | Process B           | What happens?                |
|---------------------|---------------------|------------------------------|
| Locking Resource #1 | Locking Resource #2 | A: Lock Grant, B: Lock Grant |
| Locking Resource #2 |                     | A: Lock Wait – Blocked       |
|                     | Locking Resource #1 | B: Lock Wait – Blocked       |

Now we have a situation where two (or more) processes are holding locks and are each trying to lock the resource held by the other process – resulting in both processes waiting for the other to release its lock.

While with the "native" C/SIDE server both processes would indeed wait for eternity (at least in older versions), SQL Server recognizes this situation and automatically resolves it: it chooses one process as the "victim" (considering CPU time, number of pages changed and – theoretically, but not with NAV – the DEADLOCK_PRIORITY) and "kills" that process (= cancels the transaction and rolls it back) so that the other(s) can continue. This usually takes 3 to 5 seconds. While with "normal" blocks a process is just forced to wait, a deadlock terminates the processing (as a LOCKTIMEOUT would do with a block), which can be a problem e.g. for unattended processes (NAS, Job Scheduler) if those cannot resume after the "kill".

Automatic Deadlock Detection

It is common knowledge that with the trace flags 1204 and 1222 (or 1204 [, 1205] and 3605 in SQL 2000) deadlock information can be written to the SQL Server Error Log. Unfortunately it is a real "pain" to analyze this log, especially if lots of deadlocks were encountered. With SQL 2005 it is much easier to gather this information!
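For completeness, enabling those trace flags server-wide looks like this (SQL Server 2005 syntax; the -1 argument makes the flags global until the next restart):

```sql
-- Write deadlock details to the SQL Server Error Log:
-- 1204 = deadlock chains per node, 1222 = XML-style deadlock report
DBCC TRACEON (1204, -1);
DBCC TRACEON (1222, -1);

-- Verify which trace flags are currently active:
DBCC TRACESTATUS (-1);
```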

With the SQL Profiler the "Deadlock Graph" event can be monitored. This DLG is based on an XML structure, which makes analysis simple.

Step 1 – Automatic Profiler Trace

Like any SQL Profiler trace, a "Deadlock Graph" trace can be scripted. Thus a TSQL script can be created to start such a trace automatically, e.g. when the SQL Agent service starts, writing the DLG into a file:

USE [msdb]
GO
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'STRYK System Improvement' AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'STRYK System Improvement'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'SSI: Deadlock Trace',
        @enabled=1,
        @notify_level_eventlog=2,
        @notify_level_email=0,
        @notify_level_netsend=0,
        @notify_level_page=0,
        @delete_level=0,
        @description=N'Automatic Deadlock-Trace by STRYK System Improvement,
http://www.stryk.info',
        @category_name=N'STRYK System Improvement',
        @owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'start trace',
        @step_id=1,
        @cmdexec_success_code=0,
        @on_success_action=1,
        @on_success_step_id=0,
        @on_fail_action=2,
        @on_fail_step_id=0,
        @retry_attempts=0,
        @retry_interval=0,
        @os_run_priority=0, @subsystem=N'TSQL',
        @command=N'declare @rc int
declare @TraceID int
declare @maxfilesize bigint
set @maxfilesize = 500
exec @rc = sp_trace_create @TraceID output, 0, N''C:\ssi_Deadlock_Trace'', @maxfilesize, NULL
if (@rc != 0) goto error
declare @on bit
set @on = 1
exec sp_trace_setevent @TraceID, 148, 11, @on
exec sp_trace_setevent @TraceID, 148, 12, @on
exec sp_trace_setevent @TraceID, 148, 14, @on
exec sp_trace_setevent @TraceID, 148, 1, @on
declare @intfilter int
declare @bigintfilter bigint
exec sp_trace_setstatus @TraceID, 1
select
TraceID=@TraceID
goto finish
error:
select
ErrorCode=@rc

finish:
go
',
        @database_name=N'master',
        @flags=4
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Auto',
        @enabled=1,
        @freq_type=64,
        @freq_interval=0,
        @freq_subday_type=0,
        @freq_subday_interval=0,
        @freq_relative_interval=0,
        @freq_recurrence_factor=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:

Here the DLGs are written to C:\ssi_Deadlock_Trace.trc.

THAT'S IT – AUTOMATIC DEADLOCK DETECTION ESTABLISHED!!!

To analyze the Deadlocks we could proceed like this:

Step 1 – Extract Deadlock Graphs

Once the deadlock information has been gathered, the XML data can be extracted from the TRC file: open the TRC file in SQL Profiler and select "File – Export – Extract SQL Server Events – Extract Deadlock Events" (hint: ideally write all DLG into one XDL file).

Step 2 – Analyze XML

The extracted "Deadlock Graphs" are now saved in an XDL file (XML DeadLock). By simply renaming it to .xml, the data can be fed to ANY application or program that can deal with XML – in the easiest case this could be MS Excel! Here, too, we can learn the "Who is Who" of deadlocks: we can determine the processes involved in deadlocks and the resources (tables, indexes, etc.) which were affected. And we can COUNT the problems!
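Alternatively the trace file can be queried directly from TSQL – a sketch using the file path from above (event class 148 is the "Deadlock graph" event):

```sql
-- Read the deadlock graphs straight out of the trace file:
SELECT CONVERT(xml, TextData) AS deadlock_graph,
       StartTime
FROM sys.fn_trace_gettable('C:\ssi_Deadlock_Trace.trc', default)
WHERE EventClass = 148;
```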

Solving and Preventing Deadlocks

In addition to the general advice for preventing/avoiding blocks (see above), here the sequence of locking is important, too. A deadlock is the result of a kind of "cross-over lock"; hence, if both (all) processes lock the resources in the same sequence, a deadlock cannot occur. This "sequence" can be defined in C/AL code – when to lock which records, tables, etc. – or organizationally, by avoiding potentially conflicting processes (workflow optimization).

(Remark: I strongly recommend checking out the "NAV Troubleshooting Tools & Guide" for detailed information about how to define and verify locking orders and deadlock potentials)

Finally, a piece of advice you'll find in some documentation from MS: "Avoid deadlocks by locking a common master resource". This means that at the very beginning of a transaction this "master resource" is hard-locked, so all other processes are queued – blocked. Of course this total serialization prevents deadlocks, but at the cost of any parallelism in the system – and high performance also means having a high degree of parallelism! This option should be the very last line of defense … NOT recommended!

Conclusion

Once Index and SIFT optimization is done, avoiding blocks and deadlocks is primarily a matter of the C/AL code. Here even the "standard" NAV code has a pretty high potential for conflicts, which is quite difficult – sometimes impossible – to improve. Due to the way especially the posting processes are designed in NAV, there will always be blocks; it is just a matter of the number of concurrent transactions and users, but sooner or later the limit will be reached …

Those issues which can be fixed may require a lot of "attention"; it can be quite time consuming to implement the changes and fixes. To spend this time as efficiently as possible it is crucial to know precisely where the blocks/deadlocks are coming from – the "Who is Who" – and especially to count how often a problem occurs, to determine the worst problems to be fixed first, where the solution promises the highest gain in performance …

Finally, some advertising: the solutions above are taken from the "NAV/SQL Performance Toolbox". Here just the "light" versions are published; the real tools are more sophisticated, of course!

Puuuuhhh … that was a long one … I hope I could give you some valuable inspiration for dealing with these matters … I appreciate your feedback, especially about how you're dealing with "Blocks & Deadlocks"!