This is an addendum to the Getting Data from Point A to Point B series.
Wow, has it been that long since I first thought about writing this post? I wonder if work as many hours a week as I do has anything to do with that. J
Anyway, as I promised longer ago that I care to admit, in this post we will look at the differences between doing a single insert with a union statement versus just doing multiple insert statements.
If you remember this post, I was trying to get information out of a single table from four different company databases, and put that information into a single table in a single "Stage 1" database table in a single database. Then I could mix and match that data with other data from other sources. So, take a moment to go back and refresh your memory on what I was doing in that post so that the rest of this one makes more sense.
Ok, now that you are back, let us look at that code again, shall we.
insert into DYN.dbo.upr00100 select 'Jackson ' as Corp, * from gpjac.dbo.upr00100 as upr00100 with (nolock) union select 'Carthage ' as Corp, * from gpcar.dbo.upr00100 as upr00100 with (nolock) union select 'Greenwood' as Corp, * from gpgre.dbo.upr00100 as upr00100 with (nolock) union select 'Hospice ' as Corp, * from hmain.dbo.upr00100 as upr00100 with (nolock)
Now, this code works just fine. But is there another way to achieve the same result? In addition, would that other way be better? Let us look at this code and see what we get.
insert into DYN.dbo.upr00100 select 'Jackson ' as Corp, * from gpjac.dbo.upr00100 as upr00100 with (nolock)
insert into DYN.dbo.upr00100 select 'Carthage ' as Corp, * from gpcar.dbo.upr00100 as upr00100 with (nolock)
insert into DYN.dbo.upr00100 select 'Greenwood' as Corp, * from gpgre.dbo.upr00100 as upr00100 with (nolock)
insert into DYN.dbo.upr00100 select 'Hospice ' as Corp, * from hmain.dbo.upr00100 as upr00100 with (nolock)
Notice that in both sets of code I am using the "with (nolock)" while this does have the possibility of returning "dirty" results, it helps to lessen the possibility that some other process that could have a lock on the table from affecting the results. Normally I do not us this in production code but I have to admit that I use it a lot when testing. As long as both sets or code being tested are using the "with (nolock)" it should not change the results of the test. I should also point out that the destination table in this example has no triggers, restraints, indexes, anything. It just a heap of data, so to speak.
So, when you try out these two sets of code, first do not use the first execution of either one as a test reference. I always through the first one out because of the SQL Servers engine to buffer things very well, and the first time will always (at least in my experience always takes longer). We can dig into that, later or just go bing it. :)
When I first looked at the two scripts, I figured that they should run in about the same amount of time. Both are pulling data out of four tables and inserting that data into a single table. Should be the same, right? Boy was I wrong.
If I looked at "Wait time on server replies" under "Client Statitics" for the first script (the one that uses the union) I got an average of around 233.0000. However, the second script without the union had an average of around 62.3333! WOW! talk about improvement!
So where does that leave us? Is the UNION in SQL always bad. No, of course not. Sometimes using a UNION is actually better, however, like just about everything else in the world, there is no single best way of doing something. As someone that I really look up to in the SQL world always says, "It depends.".
So the moral of this story is to be open to try different things. Just because you have always done it one way, does not always mean that it is the best way in a specific case.