Often we need to use update statement in our relation database coding. If it is a small table containing several thousand or may be several hundred thousand rows, we may not face any problem with the time required for the update statement. But if it is a large table containing several million data and we try to update it, we will surely face a performance problem. To update a column of several million rows will take a long time. In such cases we have to think some alternative to reduce the time of the update operation. Sometimes, we may not be able to reduce the time of the update operation, but we can use some alternative approach which will give us the same result as we would get from our update operation. But of course, taking a much less time that the update operation. First we will see a typical case. We have several tables as below:
IMPORT_UK_IMS_SALES_24M (22.6 million)
UK_RXI_GEO_DIM_NF (1223) UK_RXI_PRODUCT_DIM_NF(181) UK_RXI_PRACTICE_DIM_NF(22 thousand) UK_RXI_PERIOD_DIM(24) What we need is to have some of the columns in dbo.IMPORT_UK_IMS_SALES_24M table updated with columns from the other 4 tables. PCO_ID column will be updated with GEO_CODE from UK_RXI_GEO_DIM_NF table matching PCO_ID = GEO_DESC Product_Pack_ID column will be updated with PROD_CODE from UK_RXI_PRODUCT_DIM_NF table matching Product_Pack_ID = PRODUCT_DESC Binleys_Practice_ID column will be updated with PRACTICE_CODE from UK_RXI_PRACTICE_DIM_NF table matching Binleys_Practice_ID = PRAC_DESC YearMonth column will be updated with PERIOD_CODE from UK_RXI_PERIOD_DIM table matching YearMonth = actual_Code
So a very obvious implementation will be as follows:
----------- update geo level ---------
update dbo.IMPORT_UK_IMS_SALES_24M
set PCO_ID = A.GEO_code
from UK_RXI_GEO_DIM_NF as A
where PCO_ID = A.GEO_desc
------------ update product/brand level ---------
update dbo.IMPORT_UK_IMS_SALES_24M
set Product_Pack_ID = B.prod_code
from UK_RXI_PRODUCT_DIM_NF as B
where Product_Pack_ID = B.product_desc
--------- update PRACTICE level-----------------
update dbo.IMPORT_UK_IMS_SALES_24M
set Binleys_Practice_ID = C.PRACTICE_CODE
from UK_RXI_PRACTICE_DIM_NF as C
where Binleys_Practice_ID = C.PRAC_DESC
------------- update period ------------------
update dbo.IMPORT_UK_IMS_SALES_24M
set YearMonth = D.PERIOD_CODE
from UK_RXI_PERIOD_DIM as D
where YearMonth = D.actual_CODE But a single update in this case takes around 8:37 minutes. All the 4 update statements take around 30 minutes. This is obviously not acceptable. A second alternative: The second alternative to this typical update statements will be to combine all the 4 updates into 1 update statement. We will have to use join with update statements.
update IMPORT_UK_IMS_SALES_24M
set IMPORT_UK_IMS_SALES_24M.PCO_ID = A.GEO_code,
IMPORT_UK_IMS_SALES_24M.Product_Pack_ID = B.prod_code,
IMPORT_UK_IMS_SALES_24M.Binleys_Practice_ID = C.PRACTICE_CODE,
IMPORT_UK_IMS_SALES_24M.YearMonth = D.PERIOD_CODE
from IMPORT_UK_IMS_SALES_24M as IM
leftjoin UK_RXI_GEO_DIM_NF as A
on IM.PCO_ID = A.GEO_desc
leftjoin UK_RXI_PRODUCT_DIM_NF as B
on IM.Product_Pack_ID = B.product_desc
leftjoin UK_RXI_PRACTICE_DIM_NF as C
on IM.Binleys_Practice_ID = C.PRAC_DESC
leftjoin UK_RXI_PERIOD_DIM as D
on IM.YearMonth = D.actual_CODE
Typically we think that 4 joins with the large table will take a long time. But although it has 4 joins, it will take the time of a single update statement. To optimize the time, we have used left joins instead of inner joins. The above update statement takes around 8 minutes. So definitely we have reduced the time to one forth. But still 8 minutes is too long. It would better if we could reduce the time even more. A Third alternative: Probably The Best The third and probably the best alternative is to avoid update statement on this large table. How can we avoid this update but still can get the same result? We can create a recordset having the values that would be the value after the update operation. We can use left joins in the same way as we have used in the second alternative. But this time we will not update the table. We will create a full record set and insert it in a temporary table. After that we can truncate table main large table and then reinsert the full record set from the temporary tbale to our main table. Although this process contains 2 very large insertion, but still it is faster than update statement.
select Per.PERIOD_CODE [YearMonth],
P.prod_code [Product_Pack_ID],
G.geo_code [PCO_ID] ,
Prac.PRACTICE_CODE [Binleys_Practice_ID],
A.Sales_value,
A.Rx_value,
A.Count_Units
into #temp
from IMPORT_UK_IMS_SALES_24M A
leftjoin UK_RXI_GEO_DIM_NF G
on A.PCO_ID = G.GEO_desc
leftjoin UK_RXI_PRODUCT_DIM_NF P
on A.Product_Pack_ID = P.product_desc
leftjoin UK_RXI_PRACTICE_DIM_NF Prac
on A.Binleys_Practice_ID = Prac.PRAC_DESC
leftjoin UK_RXI_PERIOD_DIM Per
on A.YearMonth = Per.actual_CODE
truncatetable IMPORT_UK_IMS_SALES_24M
insertinto IMPORT_UK_IMS_SALES_24M
select*
from #temp Doing this will give us the same result in the IMPORT_UK_IMS_SALES_24M table and we need around 2 minutes to run this process. So we are again reducing the time to one forth. So, to conclude, we can say that if we have to do many updates in a very large table (table with several million rows), we can avoid update statement and use select and insert.