In this post, we will use the Azure Data Factory (ADF) platform to move data from an Azure SQL DB to Azure File Storage. We will work on the same cloud integration example we built with Logic Apps, but this time with the specialized data integration service.
Read: Logic App for Azure SQL DB to Azure File Storage Workflow
Overview of Azure Data Factory
Azure Data Factory is a cloud-based ETL and data integration service that allows you to create data-driven workflows (pipelines) for orchestrating data movement and transforming data at scale.
No Code
ADF provides a visual environment to easily construct ETL (extract-transform-load) and ELT (extract-load-transform) workflows, or pipelines, to move data using out-of-the-box (OOB) data connectors.
Ingest, move, prepare, transform, and process your data in a few clicks, and complete your data modeling within the accessible visual environment. This serverless service does all the heavy lifting for you. You, as the data integrator, just need to focus on the data.
Don’t worry: if needed, you can also write custom logic or call Azure Functions, Logic Apps, APIs, and so on.
Concepts & Components
ADF is composed of the following key components. You can use these components to build data-driven workflows with steps to transform and move data.
Data factory location
You need to specify the location for the data factory at the time of its creation. The Data Factory location is where the metadata of the data factory is stored and where the triggering of the pipeline is initiated from.
Pipelines
You create a data factory workflow using pipelines that define the movement or transformation of data from one location to another. A pipeline is a logical grouping of activities that performs a unit of work. Together, the activities in a pipeline perform a task.
An activity defines the processing step in the pipeline or the action to be performed. Activities can be of three types: data movement activities, data transformation activities, and control activities.
When you trigger the execution of a pipeline, it creates a pipeline instance represented by a pipeline run.
Data flows
Data flows are used to build data transformation logic that can transform data of any size.
Linked services
A connection to the data source or a compute resource. Linked services are your connection strings, which define the connection information that’s needed for Data Factory to connect to external resources.
Datasets
Datasets represent the data structures within a data source. You use datasets in your activities as inputs or outputs for the data you want to work on.
Integration runtime
The Azure integration runtime provides fully managed, serverless compute to move data between cloud data stores in a secure, reliable, and high-performance manner. It acts as a bridge between activities and linked services: it is referenced by the linked service or activity, and provides the compute environment where the activity either runs or is dispatched from.
The SSIS integration runtime can be used to move on-premises SSIS packages to the cloud and run them there using familiar SSIS tools.
The IR Location defines the location of its back-end compute, and essentially the location where the data movement, activity dispatching, and SSIS package execution are performed. The IR location can be different from the location of the data factory it belongs to.
Create Azure Data Factory pipeline to move data from Azure SQL DB to Azure File Share
Now that we have some basic knowledge of Data Factory, we can get started with our data integration project.
In this solution, we will create a Data Factory with a pipeline that will orchestrate the movement of data from an Azure SQL DB to an Azure File Share.
You can also refer to the Logic Apps project for the same data integration scenario here: Logic App for Azure SQL DB to Azure File Storage Workflow
Create Data Factory
Go to the Azure portal and search for “Data factory” to create a new data factory instance. Select the resource group, a unique name, and the factory location where the metadata will be stored.
Once it is deployed, go to the ADF portal from the “Author & Monitor” link to create the pipeline and related components.
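If you prefer to script this step rather than click through the portal, here is a minimal sketch using the azure-mgmt-datafactory Python SDK (a recent version is assumed); the subscription, resource group, factory name, and region below are placeholders, and the resource group is assumed to already exist.

```python
# Minimal sketch: create a data factory with the Python SDK.
# Requires the azure-identity and azure-mgmt-datafactory packages.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<subscription-id>"
rg_name = "<resource-group>"            # existing resource group (assumed)
df_name = "<unique-data-factory-name>"  # must be globally unique

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# The factory location is where the factory metadata is stored.
df = adf_client.factories.create_or_update(rg_name, df_name, Factory(location="westeurope"))
print(df.provisioning_state)
```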
Create Linked Services
We will create two linked services to hold connection info for our source Azure SQL DB and target Azure File Share.
Click on Manage > New to add linked services.
Select the data stores (Azure SQL / Azure File Storage) and provide appropriate connection details.
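Continuing the SDK sketch from the previous step (reusing adf_client, rg_name, and df_name), the same two linked services could be created in code. The linked service names, connection string, file share host, and credentials are placeholders for illustration.

```python
from azure.mgmt.datafactory.models import (
    AzureFileStorageLinkedService,
    AzureSqlDatabaseLinkedService,
    LinkedServiceResource,
    SecureString,
)

# Source: Azure SQL DB (connection string is a placeholder).
sql_ls = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(
        connection_string="Server=tcp:<server>.database.windows.net,1433;"
                          "Database=<db>;User ID=<user>;Password=<password>;"
    )
)
adf_client.linked_services.create_or_update(rg_name, df_name, "AzureSqlDbLS", sql_ls)

# Target: Azure File Storage (host, user, and key are placeholders).
file_ls = LinkedServiceResource(
    properties=AzureFileStorageLinkedService(
        host="\\\\<storageaccount>.file.core.windows.net\\<share>",
        user_id="AZURE\\<storageaccount>",
        password=SecureString(value="<storage-account-key>"),
    )
)
adf_client.linked_services.create_or_update(rg_name, df_name, "AzureFileStorageLS", file_ls)
```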
Create Datasets
Now create two datasets, one for each of our linked services. These datasets will represent the data structure of our source and target.
Click on the Author (pencil) icon and then Datasets.
For the file dataset, also specify the file format and location.
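The equivalent SDK sketch for the two datasets might look like the following; the table name, folder path, and file name are assumptions, not values from the original walkthrough.

```python
from azure.mgmt.datafactory.models import (
    AzureFileStorageLocation,
    AzureSqlTableDataset,
    DatasetResource,
    DelimitedTextDataset,
    LinkedServiceReference,
)

# Source dataset: a table in the Azure SQL DB.
sql_ds = DatasetResource(
    properties=AzureSqlTableDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="AzureSqlDbLS"
        ),
        table_name="dbo.Orders",  # placeholder table
    )
)
adf_client.datasets.create_or_update(rg_name, df_name, "SqlOrdersDataset", sql_ds)

# Target dataset: a delimited text file on the Azure File Share.
file_ds = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="AzureFileStorageLS"
        ),
        location=AzureFileStorageLocation(folder_path="exports", file_name="orders.csv"),
        column_delimiter=",",
        first_row_as_header=True,
    )
)
adf_client.datasets.create_or_update(rg_name, df_name, "FileShareOrdersDataset", file_ds)
```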
Create Pipeline
We can now create the workflow with required activities to complete the data movement or transformation task.
Add Activity
Add a Copy data activity to move the data from the source to the sink (target). On the Source tab, select the SQL dataset we created and call the table/view/stored procedure to fetch the data.
On the Sink tab, set the target (sink) to the file dataset.
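In SDK terms, continuing the sketch above, the pipeline with its Copy activity could be defined roughly like this; the activity name, pipeline name, and query are assumptions for illustration.

```python
from azure.mgmt.datafactory.models import (
    AzureSqlSource,
    CopyActivity,
    DatasetReference,
    DelimitedTextSink,
    PipelineResource,
)

# Copy activity: read from the SQL dataset, write to the file share dataset.
copy_activity = CopyActivity(
    name="CopySqlToFileShare",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SqlOrdersDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="FileShareOrdersDataset")],
    source=AzureSqlSource(sql_reader_query="SELECT * FROM dbo.Orders"),  # placeholder query
    sink=DelimitedTextSink(),
)

pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(rg_name, df_name, "SqlToFileSharePipeline", pipeline)
```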
Trigger the Pipeline
You can run your pipeline immediately or set a recurring trigger.
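From code, an on-demand run can be started as shown below (continuing the same sketch); for a recurring schedule you would instead define a schedule trigger that references the pipeline.

```python
# Start an on-demand run of the pipeline created above.
run = adf_client.pipelines.create_run(
    rg_name, df_name, "SqlToFileSharePipeline", parameters={}
)
print("Pipeline run id:", run.run_id)
```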
Monitor the Pipeline
You can monitor the pipeline from the Monitor tab.
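The same run can also be monitored from code by polling its status until it reaches a terminal state, for example:

```python
import time

# Poll the pipeline run started in the previous sketch.
while True:
    pipeline_run = adf_client.pipeline_runs.get(rg_name, df_name, run.run_id)
    print("Pipeline run status:", pipeline_run.status)
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)
```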
Publish and Deploy Pipeline
At this point, we are done with our ADF solution and it is ready to be saved and deployed.
In order to save the changes to ADF, select Publish all.
For instructions on how to deploy to other environments (UAT, test, prod), check the article Azure Data Factory – Manually Deploy to other Environments using ARM Templates.
That should be all for today!
If you liked the story and want to see more content like this, please follow us on Twitter @CursorRun or our FB page @CursorRun