Move data from Salesforce by using Azure Data Factory

Note

This article applies to version 1 of Data Factory. If you are using the current version of the Data Factory service, see Salesforce connector in V2.

This article outlines how you can use Copy Activity in an Azure data factory to copy data from Salesforce to any data store that is listed under the Sink column in the supported sources and sinks table. This article builds on the data movement activities article, which presents a general overview of data movement with Copy Activity and supported data store combinations.

Azure Data Factory currently supports only moving data from Salesforce to supported sink data stores, but does not support moving data from other data stores to Salesforce.

Supported versions

This connector supports the following editions of Salesforce: Developer Edition, Professional Edition, Enterprise Edition, and Unlimited Edition. It also supports copying data from Salesforce production, sandbox, and custom domain instances.

Prerequisites

  • API permission must be enabled.
  • To copy data from Salesforce to on-premises data stores, you must have at least Data Management Gateway 2.0 installed in your on-premises environment.

Salesforce request limits

Salesforce has limits for both total API requests and concurrent API requests. Note the following points:

  • If the number of concurrent requests exceeds the limit, throttling occurs and you will see random failures.
  • If the total number of requests exceeds the limit, the Salesforce account will be blocked for 24 hours.

You might also receive the "REQUEST_LIMIT_EXCEEDED" error in both scenarios. See the "API Request Limits" section in the Salesforce Developer Limits article for details.
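If the random failures are caused by exceeding the concurrent request limit, configuring the copy activity to retry can help; retries don't help once the total request limit is exhausted and the account is blocked. As a sketch, assuming a retry count of 3 is acceptable for your scenario, the activity policy in the pipeline definition (shown in full later in this article) could look like this:

"policy": {
    "concurrency": 1,
    "executionPriorityOrder": "OldestFirst",
    "retry": 3,
    "timeout": "01:00:00"
}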

Getting started

You can create a pipeline with a copy activity that moves data from Salesforce by using different tools/APIs.

The easiest way to create a pipeline is to use the Copy Wizard. See Tutorial: Create a pipeline using Copy Wizard for a quick walkthrough on creating a pipeline using the Copy data wizard.

You can also use the following tools to create a pipeline: Visual Studio, Azure PowerShell, Azure Resource Manager template, .NET API, and REST API. See Copy activity tutorial for step-by-step instructions to create a pipeline with a copy activity.

Whether you use the tools or APIs, you perform the following steps to create a pipeline that moves data from a source data store to a sink data store:

  1. Create linked services to link input and output data stores to your data factory.
  2. Create datasets to represent input and output data for the copy operation.
  3. Create a pipeline with a copy activity that takes a dataset as an input and a dataset as an output.

When you use the wizard, JSON definitions for these Data Factory entities (linked services, datasets, and the pipeline) are automatically created for you. When you use tools/APIs (except .NET API), you define these Data Factory entities by using the JSON format. For a sample with JSON definitions for Data Factory entities that are used to copy data from Salesforce, see JSON example: Copy data from Salesforce to Azure Blob section of this article.

The following sections provide details about JSON properties that are used to define Data Factory entities specific to Salesforce:

Linked service properties

The following table provides descriptions for JSON elements that are specific to the Salesforce linked service.

Property Description Required
type The type property must be set to: Salesforce. Yes
environmentUrl Specify the URL of the Salesforce instance. Default is "https://login.salesforce.com". To copy data from a sandbox, specify "https://test.salesforce.com". To copy data from a custom domain, specify, for example, "https://[domain].my.salesforce.com". No
username Specify a user name for the user account. Yes
password Specify a password for the user account. Yes
securityToken Specify a security token for the user account. See Get security token for instructions on how to reset/get a security token. To learn about security tokens in general, see Security and the API. Yes
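
For example, to copy data from a sandbox, a linked service that sets environmentUrl could look like the following sketch (the linked service name and placeholder values are illustrative; see also the full JSON example later in this article):

{
    "name": "SalesforceSandboxLinkedService",
    "properties":
    {
        "type": "Salesforce",
        "typeProperties":
        {
            "environmentUrl": "https://test.salesforce.com",
            "username": "<user name>",
            "password": "<password>",
            "securityToken": "<security token>"
        }
    }
}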

Dataset properties

For a full list of sections and properties that are available for defining datasets, see the Creating datasets article. Sections such as structure, availability, and policy of a dataset JSON are similar for all dataset types (Azure SQL, Azure blob, Azure table, and so on).

The typeProperties section is different for each type of dataset and provides information about the location of the data in the data store. The typeProperties section for a dataset of the type RelationalTable has the following properties:

Property Description Required
tableName Name of the table in Salesforce. No (if a query of RelationalSource is specified)

Important

The "__c" part of the API Name is needed for any custom object.

(Screenshot: the Custom Object Definition Detail page, showing the API names of the custom objects.)

Copy activity properties

For a full list of sections and properties that are available for defining activities, see the Creating pipelines article. Properties like name, description, input and output tables, and various policies are available for all types of activities.

The properties that are available in the typeProperties section of the activity, on the other hand, vary with each activity type. For Copy Activity, they vary depending on the types of sources and sinks.

In a copy activity, when the source is of type RelationalSource (which includes Salesforce), the following properties are available in the typeProperties section:

Property Description Allowed values Required
query Use the custom query to read data. A SQL-92 query or Salesforce Object Query Language (SOQL) query. For example: select * from MyTable__c. No (if the tableName of the dataset is specified)

Important

The "__c" part of the API Name is needed for any custom object.

(Screenshot: the Custom Fields & Relationships list, showing the API names of the custom objects.)

Query tips

Retrieving data using where clause on DateTime column

When you specify the SOQL or SQL query, pay attention to the difference in DateTime format. For example:

  • SOQL sample: $$Text.Format('SELECT Id, Name, BillingCity FROM Account WHERE LastModifiedDate >= {0:yyyy-MM-ddTHH:mm:ssZ} AND LastModifiedDate < {1:yyyy-MM-ddTHH:mm:ssZ}', WindowStart, WindowEnd)
  • SQL sample:
    • Using copy wizard to specify the query: $$Text.Format('SELECT * FROM Account WHERE LastModifiedDate >= {{ts\'{0:yyyy-MM-dd HH:mm:ss}\'}} AND LastModifiedDate < {{ts\'{1:yyyy-MM-dd HH:mm:ss}\'}}', WindowStart, WindowEnd)
    • Using JSON editing to specify the query (escape char properly): $$Text.Format('SELECT * FROM Account WHERE LastModifiedDate >= {{ts\\'{0:yyyy-MM-dd HH:mm:ss}\\'}} AND LastModifiedDate < {{ts\\'{1:yyyy-MM-dd HH:mm:ss}\\'}}', WindowStart, WindowEnd)
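
For reference, here is how the JSON-edited SQL sample above could sit inside a copy activity source; this sketch only restates the escaped query within its surrounding "source" element:

"source": {
    "type": "RelationalSource",
    "query": "$$Text.Format('SELECT * FROM Account WHERE LastModifiedDate >= {{ts\\'{0:yyyy-MM-dd HH:mm:ss}\\'}} AND LastModifiedDate < {{ts\\'{1:yyyy-MM-dd HH:mm:ss}\\'}}', WindowStart, WindowEnd)"
},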

Retrieving data from Salesforce Report

You can retrieve data from Salesforce reports by specifying the query as {call "<report name>"}. For example: "query": "{call \"TestReport\"}".

Retrieving deleted records from Salesforce Recycle Bin

To query the soft-deleted records from the Salesforce Recycle Bin, you can specify "IsDeleted = 1" in your query. For example:

  • To query only the deleted records, specify "select * from MyTable__c where IsDeleted = 1". The snippet after this list shows this query in context.
  • To query all the records, including the existing and the deleted ones, specify "select * from MyTable__c where IsDeleted = 0 or IsDeleted = 1"
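
Shown in context, a copy activity source that reads only the deleted records could look like the following sketch (MyTable__c is the same placeholder object name used above):

"source": {
    "type": "RelationalSource",
    "query": "select * from MyTable__c where IsDeleted = 1"
},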

JSON example: Copy data from Salesforce to Azure Blob

The following example provides sample JSON definitions that you can use to create a pipeline by using Visual Studio or Azure PowerShell. They show how to copy data from Salesforce to Azure Blob storage. However, data can be copied to any of the sinks stated here by using the Copy Activity in Azure Data Factory.

Here are the Data Factory artifacts that you need to create to implement the scenario: a Salesforce linked service, an Azure Storage linked service, a Salesforce input dataset, an Azure Blob output dataset, and a pipeline with a copy activity. The sections that follow provide details about each of them.

Salesforce linked service

This example uses the Salesforce linked service. See the Salesforce linked service section for the properties that are supported by this linked service. See Get security token for instructions on how to reset/get the security token.

{
    "name": "SalesforceLinkedService",
    "properties":
    {
        "type": "Salesforce",
        "typeProperties":
        {
            "username": "<user name>",
            "password": "<password>",
            "securityToken": "<security token>"
        }
    }
}

Azure Storage linked service

{
    "name": "AzureStorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=<accountname>;AccountKey=<accountkey>"
        }
    }
}

Salesforce input dataset

{
    "name": "SalesforceInput",
    "properties": {
        "linkedServiceName": "SalesforceLinkedService",
        "type": "RelationalTable",
        "typeProperties": {
            "tableName": "AllDataType__c"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        },
        "external": true,
        "policy": {
            "externalData": {
                "retryInterval": "00:01:00",
                "retryTimeout": "00:10:00",
                "maximumRetry": 3
            }
        }
    }
}

Setting external to true informs the Data Factory service that the dataset is external to the data factory and is not produced by an activity in the data factory.

Important

The "__c" part of the API Name is needed for any custom object.

(Screenshot: the Custom Object Definition Detail page, showing Singular Label, Plural Label, Object Name, and API Name.)

Azure blob output dataset

Data is written to a new blob every hour (frequency: hour, interval: 1).

{
    "name": "AzureBlobOutput",
    "properties":
    {
        "type": "AzureBlob",
        "linkedServiceName": "AzureStorageLinkedService",
        "typeProperties":
        {
            "folderPath": "adfgetstarted/alltypes_c"
        },
        "availability":
        {
            "frequency": "Hour",
            "interval": 1
        }
    }
}

Pipeline with Copy Activity

The pipeline contains Copy Activity, which is configured to use the input and output datasets, and is scheduled to run every hour. In the pipeline JSON definition, the source type is set to RelationalSource, and the sink type is set to BlobSink.

See RelationalSource type properties for the list of properties that are supported by the RelationalSource.

{
    "name":"SamplePipeline",
    "properties":{
        "start":"2016-06-01T18:00:00",
        "end":"2016-06-01T19:00:00",
        "description":"pipeline with copy activity",
        "activities":[
        {
            "name": "SalesforceToAzureBlob",
            "description": "Copy from Salesforce to an Azure blob",
            "type": "Copy",
            "inputs": [
            {
                "name": "SalesforceInput"
            }
            ],
            "outputs": [
            {
                "name": "AzureBlobOutput"
            }
            ],
            "typeProperties": {
                "source": {
                    "type": "RelationalSource",
                    "query": "SELECT Id, Col_AutoNumber__c, Col_Checkbox__c, Col_Currency__c, Col_Date__c, Col_DateTime__c, Col_Email__c, Col_Number__c, Col_Percent__c, Col_Phone__c, Col_Picklist__c, Col_Picklist_MultiSelect__c, Col_Text__c, Col_Text_Area__c, Col_Text_AreaLong__c, Col_Text_AreaRich__c, Col_URL__c, Col_Text_Encrypt__c, Col_Lookup__c FROM AllDataType__c"
                },
                "sink": {
                    "type": "BlobSink"
                }
            },
            "scheduler": {
                "frequency": "Hour",
                "interval": 1
            },
            "policy": {
                "concurrency": 1,
                "executionPriorityOrder": "OldestFirst",
                "retry": 0,
                "timeout": "01:00:00"
            }
        }
        ]
    }
}

Important

The "__c" part of the API Name is needed for any custom object.

(Screenshot: the Custom Fields & Relationships list with the API names called out.)

Type mapping for Salesforce

Salesforce type .NET-based type
Auto Number String
Checkbox Boolean
Currency Decimal
Date DateTime
Date/Time DateTime
Email String
Id String
Lookup Relationship String
Multi-Select Picklist String
Number Decimal
Percent Decimal
Phone String
Picklist String
Text String
Text Area String
Text Area (Long) String
Text Area (Rich) String
Text (Encrypted) String
URL String

Note

To map columns from source dataset to columns from sink dataset, see Mapping dataset columns in Azure Data Factory.

Specifying structure definition for rectangular datasets

The structure section in the dataset JSON is an optional section for rectangular tables (with rows and columns) and contains a collection of columns for the table. You use the structure section either to provide type information for type conversions or to do column mappings. The following sections describe these features in detail.

Each column contains the following properties:

Property Description Required
name Name of the column. Yes
type Data type of the column. See the type conversions section below for details about when you should specify type information. No
culture .NET-based culture to be used when type is specified and is the .NET type Datetime or Datetimeoffset. Default is "en-us". No
format Format string to be used when type is specified and is the .NET type Datetime or Datetimeoffset. No

The following sample shows the structure section JSON for a table that has three columns userid, name, and lastlogindate.

"structure": 
[
    { "name": "userid"},
    { "name": "name"},
    { "name": "lastlogindate"}
],

Use the following guidelines for when to include "structure" information and what to include in the structure section.

  • For structured data sources that store data schema and type information along with the data itself (sources like SQL Server, Oracle, Azure table, and so on), specify the "structure" section only if you want to map specific source columns to specific sink columns whose names are not the same (see the Mapping dataset columns in Azure Data Factory article for details).

    As mentioned above, the type information is optional in the "structure" section. For structured sources, type information is already available as part of the dataset definition in the data store, so you should not include type information when you do include the "structure" section.

  • For schema-on-read data sources (specifically Azure blob), you can choose to store data without any schema or type information. For these types of data sources, include "structure" in the following two cases:

    • You want to do column mapping.
    • When the dataset is a source in a copy activity, you can provide type information in "structure", and Data Factory uses this type information for conversion to native types for the sink. See the Move data to and from Azure Blob article for more information.

Supported .NET-based types

Data Factory supports the following CLS-compliant .NET-based type values for providing type information in "structure" for schema-on-read data sources like Azure blob.

  • Int16
  • Int32
  • Int64
  • Single
  • Double
  • Decimal
  • Byte[]
  • Bool
  • String
  • Guid
  • Datetime
  • Datetimeoffset
  • Timespan

For Datetime and Datetimeoffset, you can also optionally specify a "culture" and "format" string to facilitate parsing of your custom Datetime string, as shown in the sample below.
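
For illustration, a structure section that supplies type information, including a culture and format for a Datetime column, could look like the following sketch; the column names reuse the earlier sample, and the chosen types, culture, and format string are assumptions:

"structure":
[
    { "name": "userid", "type": "Guid" },
    { "name": "name", "type": "String" },
    { "name": "lastlogindate", "type": "Datetime", "culture": "fr-fr", "format": "dd-MM-yyyy" }
],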

Performance and tuning

See the Copy Activity performance and tuning guide to learn about key factors that impact performance of data movement (Copy Activity) in Azure Data Factory and various ways to optimize it.