Azure Data Factory supports the following transformation activities that can be added to pipelines either individually or chained with another activity.
| Data transformation activity | Compute environment |
| --- | --- |
| Hive | HDInsight [Hadoop] |
| Pig | HDInsight [Hadoop] |
| MapReduce | HDInsight [Hadoop] |
| Hadoop Streaming | HDInsight [Hadoop] |
| Spark | HDInsight [Hadoop] |
| Machine Learning activities: Batch Execution and Update Resource | Azure VM |
| Stored Procedure | Azure SQL, Azure SQL Data Warehouse, or SQL Server |
| Data Lake Analytics U-SQL | Azure Data Lake Analytics |
| DotNet | HDInsight [Hadoop] or Azure Batch |
You can use the MapReduce activity to run Spark programs on your HDInsight Spark cluster. See Invoke Spark programs from Azure Data Factory for details. You can also create a custom activity to run R scripts on an HDInsight cluster that has R installed. See Run R Script using Azure Data Factory.
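
To make "chained with another activity" concrete, the following Python sketch models two activities from the table and checks that they are chained. It is an illustration only, not the Data Factory pipeline JSON schema: the field names (`kind`, `compute`, `inputs`, `outputs`), the helper functions, and the dataset and activity names are all hypothetical, and it assumes that chaining means an output dataset of one activity is an input dataset of the next.

```python
import json

# Compute environment for each transformation activity, copied from the table above.
COMPUTE_ENVIRONMENTS = {
    "Hive": "HDInsight [Hadoop]",
    "Pig": "HDInsight [Hadoop]",
    "MapReduce": "HDInsight [Hadoop]",
    "Hadoop Streaming": "HDInsight [Hadoop]",
    "Spark": "HDInsight [Hadoop]",
    "Machine Learning activities: Batch Execution and Update Resource": "Azure VM",
    "Stored Procedure": "Azure SQL, Azure SQL Data Warehouse, or SQL Server",
    "Data Lake Analytics U-SQL": "Azure Data Lake Analytics",
    "DotNet": "HDInsight [Hadoop] or Azure Batch",
}


def make_activity(name, kind, inputs, outputs):
    """Build a simplified activity description (hypothetical shape, not the real schema)."""
    if kind not in COMPUTE_ENVIRONMENTS:
        raise ValueError(f"unknown transformation activity: {kind}")
    return {
        "name": name,
        "kind": kind,
        "compute": COMPUTE_ENVIRONMENTS[kind],
        "inputs": list(inputs),
        "outputs": list(outputs),
    }


def is_chained(first, second):
    """Assumption: two activities are chained when an output of the first feeds the second."""
    return bool(set(first["outputs"]) & set(second["inputs"]))


# Hypothetical dataset names: the Hive step writes "CleanedLogs", which the
# Stored Procedure step then reads.
hive = make_activity("CleanLogs", "Hive", ["RawLogs"], ["CleanedLogs"])
sproc = make_activity("LoadSummary", "Stored Procedure", ["CleanedLogs"], ["SummaryTable"])

pipeline = {"name": "ExamplePipeline", "activities": [hive, sproc]}
print(json.dumps(pipeline, indent=2))
```

Here `is_chained(hive, sproc)` is true because both activities share the `CleanedLogs` dataset, while either activity could also be placed in a pipeline on its own.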