Copy data to and from Azure SQL Database using Azure Data Factory

Note

This article applies to version 1 of Data Factory. If you are using the current version of the Data Factory service, see Azure SQL Database connector in V2.

This article explains how to use the Copy Activity in Azure Data Factory to move data to and from Azure SQL Database. It builds on the Data Movement Activities article, which presents a general overview of data movement with the copy activity.

Supported scenarios

You can copy data from Azure SQL Database to the following data stores:

| Category | Data stores |
| --- | --- |
| Azure | Azure Blob storage, Azure Data Lake Storage Gen1, Azure Cosmos DB for NoSQL, Azure SQL Database, Azure Synapse Analytics, Azure Cognitive Search Index, Azure Table storage |
| Databases | SQL Server, Oracle |
| File | File system |

You can copy data from the following data stores to Azure SQL Database:

| Category | Data stores |
| --- | --- |
| Azure | Azure Blob storage, Azure Cosmos DB for NoSQL, Azure Data Lake Storage Gen1, Azure SQL Database, Azure Synapse Analytics, Azure Table storage |
| Databases | Amazon Redshift, DB2, MySQL, Oracle, PostgreSQL, SAP Business Warehouse, SAP HANA, SQL Server, Sybase, Teradata |
| NoSQL | Cassandra, MongoDB |
| File | Amazon S3, File system, FTP, HDFS, SFTP |
| Others | Generic HTTP, Generic OData, Generic ODBC, Salesforce, Web table (table from HTML) |

Supported authentication type

The Azure SQL Database connector supports basic authentication.

Getting started

You can create a pipeline with a copy activity that moves data to/from an Azure SQL Database by using different tools/APIs.

The easiest way to create a pipeline is to use the Copy Wizard. See Tutorial: Create a pipeline using Copy Wizard for a quick walkthrough on creating a pipeline using the Copy data wizard.

You can also use the following tools to create a pipeline: Visual Studio, Azure PowerShell, Azure Resource Manager template, .NET API, and REST API. See Copy activity tutorial for step-by-step instructions to create a pipeline with a copy activity.

Whether you use the tools or APIs, you perform the following steps to create a pipeline that moves data from a source data store to a sink data store:

  1. Create a data factory. A data factory may contain one or more pipelines.
  2. Create linked services to link input and output data stores to your data factory. For example, if you are copying data from an Azure blob storage to Azure SQL Database, you create two linked services to link your Azure storage account and Azure SQL Database to your data factory. For linked service properties that are specific to Azure SQL Database, see linked service properties section.
  3. Create datasets to represent input and output data for the copy operation. In the example mentioned in the last step, you create a dataset to specify the blob container and folder that contains the input data. And, you create another dataset to specify the SQL table in Azure SQL Database that holds the data copied from the blob storage. For dataset properties that are specific to Azure SQL Database, see the dataset properties section.
  4. Create a pipeline with a copy activity that takes a dataset as an input and a dataset as an output. In the example mentioned earlier, you use BlobSource as a source and SqlSink as a sink for the copy activity. Similarly, if you are copying from Azure SQL Database to Azure Blob Storage, you use SqlSource and BlobSink in the copy activity. For copy activity properties that are specific to Azure SQL Database, see copy activity properties section. For details on how to use a data store as a source or a sink, click the link in the previous section for your data store.

When you use the wizard, JSON definitions for these Data Factory entities (linked services, datasets, and the pipeline) are automatically created for you. When you use tools/APIs (except .NET API), you define these Data Factory entities by using the JSON format. For samples with JSON definitions for Data Factory entities that are used to copy data to/from an Azure SQL Database, see JSON examples section of this article.

The following sections provide details about JSON properties that are used to define Data Factory entities specific to Azure SQL Database:

Linked service properties

An Azure SQL linked service links Azure SQL Database to your data factory. The following table provides descriptions of the JSON elements that are specific to the Azure SQL linked service.

| Property | Description | Required |
| --- | --- | --- |
| type | The type property must be set to: AzureSqlDatabase | Yes |
| connectionString | Specify the information needed to connect to the Azure SQL Database instance for the connectionString property. Only basic authentication is supported. | Yes |
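
For reference, a minimal linked service definition has the following shape; replace the placeholders with your own server, database, and credentials (the JSON examples section later in this article shows the same definition in context):

{
  "name": "AzureSqlLinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;User ID=<username>@<servername>;Password=<password>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
    }
  }
}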

Important

Configure the Azure SQL Database firewall on the database server to allow Azure services to access the server. Additionally, if you are copying data to Azure SQL Database from outside Azure, including from on-premises data sources with the Data Factory gateway, configure an appropriate IP address range for the machine that sends data to Azure SQL Database.
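
For example, you can allow Azure services and a specific client IP range through the server-level firewall by running the sp_set_firewall_rule stored procedure in the master database of your server; the rule names and the client IP range below are illustrative, and you can also manage the same rules from the Azure portal:

-- Run in the master database. The 0.0.0.0 start and end addresses create the
-- special rule that allows Azure services to access the server.
EXECUTE sp_set_firewall_rule @name = N'AllowAllAzureServices',
    @start_ip_address = '0.0.0.0', @end_ip_address = '0.0.0.0';

-- Illustrative rule for an on-premises machine that sends data through the gateway;
-- replace the range with the public IP addresses of your own machine.
EXECUTE sp_set_firewall_rule @name = N'DataFactoryGatewayMachine',
    @start_ip_address = '203.0.113.10', @end_ip_address = '203.0.113.20';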

Dataset properties

To specify a dataset to represent input or output data in Azure SQL Database, you set the type property of the dataset to: AzureSqlTable. Set the linkedServiceName property of the dataset to the name of the Azure SQL linked service.

For a full list of sections & properties available for defining datasets, see the Creating datasets article. Sections such as structure, availability, and policy of a dataset JSON are similar for all dataset types (Azure SQL, Azure blob, Azure table, etc.).

The typeProperties section is different for each type of dataset and provides information about the location of the data in the data store. The typeProperties section for the dataset of type AzureSqlTable has the following properties:

| Property | Description | Required |
| --- | --- | --- |
| tableName | Name of the table or view in the Azure SQL Database instance that the linked service refers to. | Yes |
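
For example, a minimal dataset definition has the following shape; the dataset and table names are illustrative, and the JSON examples later in this article show complete definitions with external, availability, and policy settings:

{
  "name": "AzureSqlDataset",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": "AzureSqlLinkedService",
    "typeProperties": {
      "tableName": "MyTable"
    },
    "availability": {
      "frequency": "Hour",
      "interval": 1
    }
  }
}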

Copy activity properties

For a full list of sections & properties available for defining activities, see the Creating Pipelines article. Properties such as name, description, input and output tables, and policy are available for all types of activities.

Note

The Copy Activity takes only one input and produces only one output.

However, properties available in the typeProperties section of the activity vary with each activity type. For the Copy activity, they vary depending on the types of sources and sinks.

If you are moving data from Azure SQL Database, you set the source type in the copy activity to SqlSource. Similarly, if you are moving data to Azure SQL Database, you set the sink type in the copy activity to SqlSink. This section provides a list of properties supported by SqlSource and SqlSink.

SqlSource

In a copy activity, when the source is of type SqlSource, the following properties are available in the typeProperties section:

| Property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| sqlReaderQuery | Use the custom query to read data. | SQL query string. Example: select * from MyTable. | No |
| sqlReaderStoredProcedureName | Name of the stored procedure that reads data from the source table. | Name of the stored procedure. The last SQL statement must be a SELECT statement in the stored procedure. | No |
| storedProcedureParameters | Parameters for the stored procedure. | Name/value pairs. Names and casing of parameters must match the names and casing of the stored procedure parameters. | No |

If the sqlReaderQuery is specified for the SqlSource, the Copy Activity runs this query against the Azure SQL Database source to get the data. Alternatively, you can specify a stored procedure by specifying the sqlReaderStoredProcedureName and storedProcedureParameters (if the stored procedure takes parameters).

If you do not specify either sqlReaderQuery or sqlReaderStoredProcedureName, the columns defined in the structure section of the dataset JSON are used to build a query (select column1, column2 from mytable) to run against the Azure SQL Database. If the dataset definition does not have the structure section, all columns are selected from the table.
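
For example, a structure section like the following sketch (the column names are illustrative) causes the copy activity to run select CustomerName, OrderAmount from MyTable against the source:

"structure": [
    { "name": "CustomerName" },
    { "name": "OrderAmount" }
]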

Note

When you use sqlReaderStoredProcedureName, you still need to specify a value for the tableName property in the dataset JSON. However, no validation is performed against this table.

SqlSource example

"source": {
    "type": "SqlSource",
    "sqlReaderStoredProcedureName": "CopyTestSrcStoredProcedureWithParameters",
    "storedProcedureParameters": {
        "stringData": { "value": "str3" },
        "identifier": { "value": "$$Text.Format('{0:yyyy}', SliceStart)", "type": "Int"}
    }
}

The stored procedure definition:

CREATE PROCEDURE CopyTestSrcStoredProcedureWithParameters
(
    @stringData varchar(20),
    @identifier int
)
AS
SET NOCOUNT ON;
BEGIN
    select *
    from dbo.UnitTestSrcTable
    where dbo.UnitTestSrcTable.stringData != @stringData
    and dbo.UnitTestSrcTable.identifier != @identifier
END
GO

SqlSink

SqlSink supports the following properties:

| Property | Description | Allowed values | Required |
| --- | --- | --- | --- |
| writeBatchTimeout | Wait time for the batch insert operation to complete before it times out. | timespan. Example: "00:30:00" (30 minutes). | No |
| writeBatchSize | Inserts data into the SQL table when the buffer size reaches writeBatchSize. | Integer (number of rows) | No (default: 10000) |
| sqlWriterCleanupScript | Specify a query for Copy Activity to execute so that the data of a specific slice is cleaned up. For more information, see repeatable copy. | A query statement. | No |
| sliceIdentifierColumnName | Specify a column name for Copy Activity to fill with an auto-generated slice identifier, which is used to clean up data of a specific slice when it is rerun. For more information, see repeatable copy. | Column name of a column with data type of binary(32). | No |
| sqlWriterStoredProcedureName | Name of the stored procedure that defines how to apply source data to the target table, for example, to do upserts or to transform data by using your own business logic. Note that this stored procedure is invoked per batch. For an operation that runs only once and has nothing to do with the source data, for example, delete or truncate, use the sqlWriterCleanupScript property instead. | Name of the stored procedure. | No |
| storedProcedureParameters | Parameters for the stored procedure. | Name/value pairs. Names and casing of parameters must match the names and casing of the stored procedure parameters. | No |
| sqlWriterTableType | Specify a table type name to be used in the stored procedure. Copy activity makes the data being moved available in a temp table with this table type. Stored procedure code can then merge the data being copied with existing data. | A table type name. | No |

SqlSink example

"sink": {
    "type": "SqlSink",
    "writeBatchSize": 1000000,
    "writeBatchTimeout": "00:05:00",
    "sqlWriterStoredProcedureName": "CopyTestStoredProcedureWithParameters",
    "sqlWriterTableType": "CopyTestTableType",
    "storedProcedureParameters": {
        "identifier": { "value": "1", "type": "Int" },
        "stringData": { "value": "str1" },
        "decimalData": { "value": "1", "type": "Decimal" }
    }
}
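
The table type and stored procedure referenced in this example are not defined in this article. The following sketch shows one way they might look, assuming a target table dbo.CopyTestTargetTable with identifier, stringData, and decimalData columns; the table-valued parameter name, column types, and MERGE logic are assumptions for illustration, and your procedure can apply whatever logic fits your scenario:

CREATE TYPE dbo.CopyTestTableType AS TABLE
(
    identifier int NULL,
    stringData varchar(50) NULL,
    decimalData decimal(18, 4) NULL
)
GO

CREATE PROCEDURE dbo.CopyTestStoredProcedureWithParameters
(
    -- The copy activity passes each batch of copied rows through a parameter
    -- of the table type named in sqlWriterTableType. The parameter name and
    -- the scalar parameters below mirror the sink example above.
    @CopyTestTargetTable dbo.CopyTestTableType READONLY,
    @identifier int,
    @stringData varchar(50),
    @decimalData decimal(18, 4)
)
AS
BEGIN
    SET NOCOUNT ON;
    -- Upsert the incoming batch into the assumed target table.
    MERGE dbo.CopyTestTargetTable AS tgt
    USING @CopyTestTargetTable AS src
    ON tgt.identifier = src.identifier
    WHEN MATCHED THEN
        UPDATE SET tgt.stringData = src.stringData,
                   tgt.decimalData = src.decimalData
    WHEN NOT MATCHED THEN
        INSERT (identifier, stringData, decimalData)
        VALUES (src.identifier, src.stringData, src.decimalData);
END
GO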

JSON examples for copying data to and from SQL Database

The following examples provide sample JSON definitions that you can use to create a pipeline by using Visual Studio or Azure PowerShell. They show how to copy data to and from Azure SQL Database and Azure Blob Storage. However, data can be copied directly from any of the supported sources to any of the supported sinks by using the Copy Activity in Azure Data Factory.

Example: Copy data from Azure SQL Database to Azure Blob

The sample defines the following Data Factory entities:

  1. A linked service of type AzureSqlDatabase.
  2. A linked service of type AzureStorage.
  3. An input dataset of type AzureSqlTable.
  4. An output dataset of type AzureBlob.
  5. A pipeline with a Copy activity that uses SqlSource and BlobSink.

The sample copies time-series data (hourly, daily, etc.) from a table in Azure SQL Database to a blob every hour. The JSON properties used in these samples are described in sections following the samples.

Azure SQL Database linked service:

{
  "name": "AzureSqlLinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;User ID=<username>@<servername>;Password=<password>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
    }
  }
}

See the Azure SQL Linked Service section for the list of properties supported by this linked service.

Azure Blob storage linked service:

{
  "name": "StorageLinkedService",
  "properties": {
    "type": "AzureStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
    }
  }
}

See the Azure Blob article for the list of properties supported by this linked service.

Azure SQL input dataset:

The sample assumes you have created a table "MyTable" in Azure SQL and it contains a column called "timestampcolumn" for time series data.

Setting "external": "true" informs the Azure Data Factory service that the dataset is external to the data factory and is not produced by an activity in the data factory.

{
  "name": "AzureSqlInput",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": "AzureSqlLinkedService",
    "typeProperties": {
      "tableName": "MyTable"
    },
    "external": true,
    "availability": {
      "frequency": "Hour",
      "interval": 1
    },
    "policy": {
      "externalData": {
        "retryInterval": "00:01:00",
        "retryTimeout": "00:10:00",
        "maximumRetry": 3
      }
    }
  }
}

See the Azure SQL dataset type properties section for the list of properties supported by this dataset type.

Azure Blob output dataset:

Data is written to a new blob every hour (frequency: hour, interval: 1). The folder path for the blob is dynamically evaluated based on the start time of the slice that is being processed. The folder path uses year, month, day, and hours parts of the start time.

{
  "name": "AzureBlobOutput",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": "StorageLinkedService",
    "typeProperties": {
      "folderPath": "mycontainer/myfolder/yearno={Year}/monthno={Month}/dayno={Day}/hourno={Hour}/",
      "partitionedBy": [
        {
          "name": "Year",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "yyyy"
          }
        },
        {
          "name": "Month",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "MM"
          }
        },
        {
          "name": "Day",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "dd"
          }
        },
        {
          "name": "Hour",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "HH"
          }
        }
      ],
      "format": {
        "type": "TextFormat",
        "columnDelimiter": "\t",
        "rowDelimiter": "\n"
      }
    },
    "availability": {
      "frequency": "Hour",
      "interval": 1
    }
  }
}

See the Azure Blob dataset type properties section for the list of properties supported by this dataset type.

A copy activity in a pipeline with SQL source and Blob sink:

The pipeline contains a Copy Activity that is configured to use the input and output datasets and is scheduled to run every hour. In the pipeline JSON definition, the source type is set to SqlSource and sink type is set to BlobSink. The SQL query specified for the SqlReaderQuery property selects the data in the past hour to copy.

{
  "name":"SamplePipeline",
  "properties":{
    "start":"2014-06-01T18:00:00",
    "end":"2014-06-01T19:00:00",
    "description":"pipeline for copy activity",
    "activities":[
      {
        "name": "AzureSQLtoBlob",
        "description": "copy activity",
        "type": "Copy",
        "inputs": [
          {
            "name": "AzureSQLInput"
          }
        ],
        "outputs": [
          {
            "name": "AzureBlobOutput"
          }
        ],
        "typeProperties": {
          "source": {
            "type": "SqlSource",
            "SqlReaderQuery": "$$Text.Format('select * from MyTable where timestampcolumn >= \\'{0:yyyy-MM-dd HH:mm}\\' AND timestampcolumn < \\'{1:yyyy-MM-dd HH:mm}\\'', WindowStart, WindowEnd)"
          },
          "sink": {
            "type": "BlobSink"
          }
        },
        "scheduler": {
          "frequency": "Hour",
          "interval": 1
        },
        "policy": {
          "concurrency": 1,
          "executionPriorityOrder": "OldestFirst",
          "retry": 0,
          "timeout": "01:00:00"
        }
      }
    ]
  }
}

In the example, sqlReaderQuery is specified for the SqlSource. The Copy Activity runs this query against the Azure SQL Database source to get the data. Alternatively, you can specify a stored procedure by specifying the sqlReaderStoredProcedureName and storedProcedureParameters (if the stored procedure takes parameters).

If you do not specify either sqlReaderQuery or sqlReaderStoredProcedureName, the columns defined in the structure section of the dataset JSON are used to build a query to run against the Azure SQL Database. For example: select column1, column2 from mytable. If the dataset definition does not have the structure section, all columns are selected from the table.

See the SqlSource section and the BlobSink article for the list of properties supported by SqlSource and BlobSink.

Example: Copy data from Azure Blob to Azure SQL Database

The sample defines the following Data Factory entities:

  1. A linked service of type AzureSqlDatabase.
  2. A linked service of type AzureStorage.
  3. An input dataset of type AzureBlob.
  4. An output dataset of type AzureSqlTable.
  5. A pipeline with Copy activity that uses BlobSource and SqlSink.

The sample copies time-series data (hourly, daily, etc.) from Azure blob to a table in Azure SQL Database every hour. The JSON properties used in these samples are described in sections following the samples.

Azure SQL linked service:

{
  "name": "AzureSqlLinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<servername>.database.windows.net,1433;Database=<databasename>;User ID=<username>@<servername>;Password=<password>;Trusted_Connection=False;Encrypt=True;Connection Timeout=30"
    }
  }
}

See the Azure SQL Linked Service section for the list of properties supported by this linked service.

Azure Blob storage linked service:

{
  "name": "StorageLinkedService",
  "properties": {
    "type": "AzureStorage",
    "typeProperties": {
      "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
    }
  }
}

See the Azure Blob article for the list of properties supported by this linked service.

Azure Blob input dataset:

Data is picked up from a new blob every hour (frequency: hour, interval: 1). The folder path and file name for the blob are dynamically evaluated based on the start time of the slice that is being processed. The folder path uses the year, month, and day parts of the start time, and the file name uses the hour part of the start time. Setting "external": "true" informs the Data Factory service that this table is external to the data factory and is not produced by an activity in the data factory.

{
  "name": "AzureBlobInput",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": "StorageLinkedService",
    "typeProperties": {
      "folderPath": "mycontainer/myfolder/yearno={Year}/monthno={Month}/dayno={Day}/",
      "fileName": "{Hour}.csv",
      "partitionedBy": [
        {
          "name": "Year",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "yyyy"
          }
        },
        {
          "name": "Month",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "MM"
          }
        },
        {
          "name": "Day",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "dd"
          }
        },
        {
          "name": "Hour",
          "value": {
            "type": "DateTime",
            "date": "SliceStart",
            "format": "HH"
          }
        }
      ],
      "format": {
        "type": "TextFormat",
        "columnDelimiter": ",",
        "rowDelimiter": "\n"
      }
    },
    "external": true,
    "availability": {
      "frequency": "Hour",
      "interval": 1
    },
    "policy": {
      "externalData": {
        "retryInterval": "00:01:00",
        "retryTimeout": "00:10:00",
        "maximumRetry": 3
      }
    }
  }
}

See the Azure Blob dataset type properties section for the list of properties supported by this dataset type.

Azure SQL Database output dataset:

The sample copies data to a table named "MyOutputTable" in Azure SQL Database. Create the table in Azure SQL Database with the same number of columns as you expect the Blob CSV file to contain. New rows are added to the table every hour.

{
  "name": "AzureSqlOutput",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": "AzureSqlLinkedService",
    "typeProperties": {
      "tableName": "MyOutputTable"
    },
    "availability": {
      "frequency": "Hour",
      "interval": 1
    }
  }
}

See the Azure SQL dataset type properties section for the list of properties supported by this dataset type.

A copy activity in a pipeline with Blob source and SQL sink:

The pipeline contains a Copy Activity that is configured to use the input and output datasets and is scheduled to run every hour. In the pipeline JSON definition, the source type is set to BlobSource and sink type is set to SqlSink.

{
  "name":"SamplePipeline",
  "properties":{
    "start":"2014-06-01T18:00:00",
    "end":"2014-06-01T19:00:00",
    "description":"pipeline with copy activity",
    "activities":[
      {
        "name": "AzureBlobtoSQL",
        "description": "Copy Activity",
        "type": "Copy",
        "inputs": [
          {
            "name": "AzureBlobInput"
          }
        ],
        "outputs": [
          {
            "name": "AzureSqlOutput"
          }
        ],
        "typeProperties": {
          "source": {
            "type": "BlobSource",
            "blobColumnSeparators": ","
          },
          "sink": {
            "type": "SqlSink"
          }
        },
        "scheduler": {
          "frequency": "Hour",
          "interval": 1
        },
        "policy": {
          "concurrency": 1,
          "executionPriorityOrder": "OldestFirst",
          "retry": 0,
          "timeout": "01:00:00"
        }
      }
    ]
  }
}

See the SqlSink section and the BlobSource article for the list of properties supported by SqlSink and BlobSource.

Identity columns in the target database

This section provides an example for copying data from a source table without an identity column to a destination table with an identity column.

Source table:

create table dbo.SourceTbl
(
    name varchar(100),
    age int
)

Destination table:

create table dbo.TargetTbl
(
    identifier int identity(1,1),
    name varchar(100),
    age int
)

Notice that the target table has an identity column.

Source dataset JSON definition

{
    "name": "SampleSource",
    "properties": {
        "type": " SqlServerTable",
        "linkedServiceName": "TestIdentitySQL",
        "typeProperties": {
            "tableName": "SourceTbl"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "external": true,
        "policy": {}
    }
}

Destination dataset JSON definition

{
    "name": "SampleTarget",
    "properties": {
        "structure": [
            { "name": "name" },
            { "name": "age" }
        ],
        "type": "AzureSqlTable",
        "linkedServiceName": "TestIdentitySQLSource",
        "typeProperties": {
            "tableName": "TargetTbl"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "external": false,
        "policy": {}
    }
}

Notice that your source and target tables have different schemas: the target has an additional identity column. In this scenario, you need to specify the structure property in the target dataset definition, which doesn't include the identity column.

Invoke stored procedure from SQL sink

For an example of invoking a stored procedure from SQL sink in a copy activity of a pipeline, see Invoke stored procedure for SQL sink in copy activity article.

Type mapping for Azure SQL Database

As mentioned in the data movement activities article, the Copy activity performs automatic type conversions from source types to sink types with the following two-step approach:

  1. Convert from native source types to .NET type
  2. Convert from .NET type to native sink type

When moving data to and from Azure SQL Database, the following mappings are used from SQL type to .NET type and vice versa. The mapping is the same as the SQL Server Data Type Mapping for ADO.NET.

| SQL Server Database Engine type | .NET Framework type |
| --- | --- |
| bigint | Int64 |
| binary | Byte[] |
| bit | Boolean |
| char | String, Char[] |
| date | DateTime |
| datetime | DateTime |
| datetime2 | DateTime |
| datetimeoffset | DateTimeOffset |
| decimal | Decimal |
| FILESTREAM attribute (varbinary(max)) | Byte[] |
| float | Double |
| image | Byte[] |
| int | Int32 |
| money | Decimal |
| nchar | String, Char[] |
| ntext | String, Char[] |
| numeric | Decimal |
| nvarchar | String, Char[] |
| real | Single |
| rowversion | Byte[] |
| smalldatetime | DateTime |
| smallint | Int16 |
| smallmoney | Decimal |
| sql_variant | Object |
| text | String, Char[] |
| time | TimeSpan |
| timestamp | Byte[] |
| tinyint | Byte |
| uniqueidentifier | Guid |
| varbinary | Byte[] |
| varchar | String, Char[] |
| xml | Xml |

Map source to sink columns

To learn about mapping columns in source dataset to columns in sink dataset, see Mapping dataset columns in Azure Data Factory.
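
In the V1 copy activity, a column mapping is expressed as a translator in the typeProperties section of the activity, alongside the source and sink. The following sketch uses illustrative column names:

"translator": {
    "type": "TabularTranslator",
    "columnMappings": "UserId: MyUserId, Group: MyGroup, Name: MyName"
}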

Repeatable copy

When copying data to Azure SQL Database, the copy activity appends data to the sink table by default. To perform an UPSERT instead, see the Repeatable write to SqlSink article.

When copying data from relational data stores, keep repeatability in mind to avoid unintended outcomes. In Azure Data Factory, you can rerun a slice manually. You can also configure retry policy for a dataset so that a slice is rerun when a failure occurs. When a slice is rerun in either way, you need to make sure that the same data is read no matter how many times a slice is run. See Repeatable read from relational sources.
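
For example, a SQL sink that is configured for repeatable copy can delete the slice it is about to write before inserting, so that a rerun does not produce duplicate rows. The following sketch assumes the target table has a timestampcolumn like the earlier examples in this article:

"sink": {
    "type": "SqlSink",
    "sqlWriterCleanupScript": "$$Text.Format('delete from MyTable where timestampcolumn >= \\'{0:yyyy-MM-dd HH:mm}\\' AND timestampcolumn < \\'{1:yyyy-MM-dd HH:mm}\\'', WindowStart, WindowEnd)"
}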

Performance and Tuning

See Copy Activity Performance & Tuning Guide to learn about key factors that impact performance of data movement (Copy Activity) in Azure Data Factory and various ways to optimize it.