How to run containers in Azure Data Factory!
In this article, we will walk through the steps of building a Data Factory pipeline and running its activities in a container.
👉 Learning goals of this article:
- Identify the possible approaches to running containers via Azure Data Factory
- Implement a cost-effective solution to orchestrate containers with Data Factory in a truly serverless fashion
Let’s get started!
Before diving into the details, let’s introduce Azure Data Factory and list the advantages of a container-based solution.
Azure Data Factory is a cloud-based data integration service that allows you to create data-driven workflows for orchestrating and automating data movement and data transformation. It’s an orchestration tool, similar to Airflow or Kubeflow.
A Data Factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. The activities in a pipeline define actions to perform on your data. For example, you might use a copy activity to copy data from SQL Server to Azure Blob storage. Then, you might use a Hive activity that runs a Hive script on an Azure HDInsight cluster to process data from Blob storage to produce output data.
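To make the pipeline/activity model concrete, here is a minimal sketch using the azure-mgmt-datafactory Python SDK that defines a pipeline with a single copy activity, mirroring the SQL Server to Blob storage example above. The subscription, resource group, factory, and dataset names are placeholders for illustration, and the datasets are assumed to already exist in the factory.

```python
# Minimal sketch: define an ADF pipeline containing one copy activity.
# All names below (subscription, resource group, factory, datasets)
# are placeholders, not values from this article.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink,
    CopyActivity,
    DatasetReference,
    PipelineResource,
    SqlSource,
)

subscription_id = "<subscription-id>"  # placeholder
rg_name = "my-resource-group"          # placeholder
df_name = "my-data-factory"            # placeholder

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Copy from a SQL Server dataset to a Blob storage dataset.
# Both datasets are assumed to already be defined in the factory.
copy_activity = CopyActivity(
    name="CopySqlToBlob",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SqlServerDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="BlobOutputDataset")],
    source=SqlSource(),
    sink=BlobSink(),
)

# A pipeline is just a logical grouping of such activities.
pipeline = PipelineResource(activities=[copy_activity])
adf_client.pipelines.create_or_update(rg_name, df_name, "ExamplePipeline", pipeline)
```

The same pipeline could equally be authored in the Data Factory portal UI or as a JSON definition; the SDK version is shown here only to make the pipeline/activity structure explicit.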