First please keep in mind that I know almost nothing about Exchange or how it works. The parts about Exchange that I will be discussing I am just paraphrasing from my Exchange Server Admin.

Here is what happened:

For years we have been using Exchange 2003 with no problems. And for years I have had SQL Server send out emails using the sp_send_dbmail procedure in MSDB via the Exchange Server with no problems.

About two weeks ago we tested and then upgraded to Exchange 2010. According to all of my SQL logging and Histories all of my emails were working just fine.

Now jump ahead two weeks. Users start informing me that they are not getting all of their emails that they should. And after some research I discovered that they were in fact correct. Although there were no errors in any of my SQL scripts or the logs that they create that I could initially find the emails were simply not getting to the users.

Then I found an error in the event log that stated "Cannot send mails to mail server. (The operation has timed out.)

My first thought was that their was something wrong with the new mail server. I have not touched the SQL servers in months.

Ok, not that I have an error message let's start digging. And Digging. And Digging.

Ok, I finally found out that it is in fact not the Exchange server per say it is actually the SQL server. Not only that but the problem was fixed back in one of the service packs for SQL 2005. See KB968834 for details.

To make a long story short here is what's going on. With the older version of Exchange the SQL server talked directly with the mail server. And as the mail server trusted the SQL server the emails went out. However, with the new version of Exchange there is a another server called the "Client Access Server"(CAS) that the SQL server actually talks to. The CAS server validates the emails against the Domain Controller (DC) and if they are ok then sends the email on to the actual mail server.

However, as you can guess it does take some amount of time for the CAS server to validate the accounts. And in the KB article mentioned above the "timeout" field that is created in the sysmail_server table (at least on all of my SQL Servers) had a default of "NULL". And I am guessing that means do not wait at all seeing as when one of my procedures that send out around 2000 emails only between 20 to 100 of them actually got to the users.

But the great news is that is easily fixed by using the sysmail_update_sp stored procedure to set an actual value in this field.

Here is what to do:

First execute this "select * from msdb.dbo.sysmail_server"

Look for the "account_id" field that you want to change, and remember the value.

Then execute msdb.dbo.sysmail_update_account_sp @Account_id = <the number you got from the previous statement>, @Timeout= <some number , I used 30 myself>, @No_credential_change = 1 (do not change anything else)

And that you are finished. Everything now works fine.

I probably could have use a smaller number but this works great. In fact I have noticed that since this change not only do all of the emails get to the people that they are going to but they are actually getting there faster than ever before.

Note: this timeout is not to be confused with the time setting on how long SQL server should wait between retries, which can be set via the SSMS GUI. As far as I know the only way to set this value is via T-SQL and the aforementioned stored procedure.

 

Till later,