High speed, multithreaded CRUD with 'Lookup' activity in Azure Data Factory

Wassup guys?
If you have voluminous data to be updated, inserted, or deleted and you are unsure which Azure component you can leverage, then look no further. Azure Data Factory has a wonderful 'Lookup' activity for its data pipelines, which can help you multithread the work and feed it into your target system without much difficulty. It's fast and, not to mention, completely code-less.
Lemme share the steps. But before that: What is a lookup activity?

Lookup activity can retrieve a dataset from any of the data sources supported by data factory and Synapse pipelines. You can use it to dynamically determine which objects to operate on in a subsequent activity, instead of hard coding the object name. Some object examples are files and tables.

Lookup activity reads and returns the content of a configuration file or table. It also returns the result of executing a query or stored procedure. The output can be a singleton value or an array of attributes, which can be consumed in subsequent copy, transformation, or control flow activities like the ForEach activity.

Uh, I borrowed that from Microsoft. In short: Lookups can be used to scoop data out of a dataset, which you can then manipulate for further operations.
For example, I have an Azure SQL table called CustBalance, which looks like this:


And I already have over 50k records in it. I need to update these records, setting the 'ModifiedDateTime' and 'Updated' columns to the necessary values once the update goes through.
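For reference, here's a minimal sketch of such a table; apart from CustomerCode, Updated, and ModifiedDateTime (which the stored proc below relies on), the column names and types are my assumptions:

CREATE TABLE CustBalance
(
    CustomerCode     NVARCHAR(6)   NOT NULL PRIMARY KEY, -- customer account, used as the key
    Balance          DECIMAL(18,2) NULL,                 -- assumed payload column
    Updated          BIT           NOT NULL DEFAULT 0,   -- flipped to 1 once processed
    ModifiedDateTime DATETIME      NULL                  -- stamped by the stored proc
);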
I have created a very simple stored proc in my Azure SQL database that does just that:
CREATE PROCEDURE sp_updateCustBalanceWithDates
(
    @custAccount NVARCHAR(6)
)
AS
BEGIN
    SET NOCOUNT ON;

    -- Flag the matching record and stamp it with the current date/time
    UPDATE CustBalance
    SET Updated = 1, ModifiedDateTime = GETDATE()
    WHERE CustomerCode = @custAccount;
END
GO
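You can sanity-check the proc straight from SQL before wiring it into ADF (the account value here is just a made-up example):

EXEC sp_updateCustBalanceWithDates @custAccount = N'CUST01';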

Which means: it accepts the customer account as input, and if a matching record is found, it updates that record with the current date and time.
Let us go back to our ADF and create a new pipeline: 

Let us call this pipeline: 'PipelineUpdateRecords'.
Let us add a 'Lookup' activity. I am calling it LookupBalance. Come down to the settings and choose the necessary source settings:

'Retry' means how many times the activity should retry on failure, and additionally you can specify the retry interval. Come to Settings >> you can create a new Linked service, or choose an existing one, to use here as a source:



Here, AzureSQLTable2 is my current Azure SQL table linked service. A click on 'Preview data' gives:



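By the way, the Lookup doesn't have to read the whole table: you could point it at a query instead, for example to pick up only rows that haven't been processed yet. A hypothetical source query:

SELECT CustomerCode FROM CustBalance WHERE Updated = 0;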
Now, coming back to our pipeline, we are gonna add a 'ForEach' activity and wire it up to the output of the 'Lookup' node. I am naming this new ForEach node 'ForEachLoop':



And then in the settings: 

I am setting the batch count to 4; make sure 'Sequential' stays unchecked, otherwise the iterations run one at a time and the batch count is ignored. You can obviously increase the number of parallel threads many-fold, depending on your need.
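Under the hood, these settings land in the pipeline's JSON definition. A rough sketch of what our ForEach activity looks like there (with 'items' still to be filled in from the Expression Builder, next):

{
    "name": "ForEachLoop",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": false,
        "batchCount": 4,
        "items": { "value": "<set via the Expression Builder>", "type": "Expression" }
    }
}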
Also, click on 'Items' and open the Expression Builder:

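What we want in 'Items' is the Lookup's output array (so make sure the Lookup's 'First row only' option is unchecked). With our Lookup activity named LookupBalance, the expression is:

@activity('LookupBalance').output.value

This evaluates to the array of rows the Lookup returned, which the ForEach then fans out across its parallel iterations.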
You don't even have to type it by hand: select the highlighted option, 'Output from Lookup' as an array, and it will fill in the expression builder automatically. Next, let us jump into adding an activity inside our ForEach loop and select 'Stored procedure':



Once again, I need to configure the stored procedure inputs. Come under Settings:



Select the necessary Linked service and pick the stored procedure from there. Don't forget to click the 'Import' button >> it will immediately populate the input parameters for the stored procedure.
Next is very simple: click on the value text box >> it will open the expression builder:

Reference the field CustomerCode, which is the primary key for our record.
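Inside a ForEach iteration, the current element of the array is available as @item(), so the value expression (assuming the column name from our table) is:

@item().CustomerCode

Each parallel iteration thereby calls the stored proc with the CustomerCode of its own row from the Lookup output.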
Done. Validate all and publish your pipeline. And then trigger it:

Wait a few moments for the execution to get over. And guess what? It took only 35 seconds to complete:



And the data, opened afterwards, looks like:


Yes: it's a terrifically fast mode of execution when you need to update a big volume of data in a hurry.
So much for now; I will be coming back soon with more such cool hacks on data manipulation using the various tentacles of Azure. Much love and Namaste....

