How to run containers in Azure Data Factory!

Mohamed Dhaoui
8 min read · Nov 22, 2020

In this article, we will cover all the steps of building a Data Factory pipeline and running its activities in a container.

👉 Learning goals of this article:

  • Identify the possible approaches to run containers via Azure Data Factory
  • Implement a cost-effective solution to orchestrate containers with Data Factory in a truly serverless fashion

Let’s get started!

Before diving into the details, let’s introduce Azure Data Factory and list the advantages of a container-based solution.

Azure Data Factory is a cloud-based data integration service that allows you to create data-driven workflows in the cloud for orchestrating and automating data movement and data transformation. It’s an orchestration tool, comparable to Airflow or Kubeflow.

A Data Factory can have one or more pipelines. A pipeline is a logical grouping of activities that together perform a task. The activities in a pipeline define actions to perform on your data. For example, you might use a copy activity to copy data from SQL Server to Azure Blob storage. Then, you might use a Hive activity that runs a Hive script on an Azure HDInsight cluster to process data from Blob storage to produce output data.
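To make the pipeline/activity model concrete, here is a minimal sketch of the copy scenario above using the azure-mgmt-datafactory Python SDK. This is an illustration, not the article’s own setup: the subscription ID, resource group, factory name, and dataset names are placeholders, and the two datasets are assumed to already exist in the factory.

```python
# Minimal sketch: deploy a pipeline with one copy activity
# (SQL Server -> Azure Blob storage) via the Azure Python SDK.
# All resource names below are placeholders for illustration.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink,
    CopyActivity,
    DatasetReference,
    PipelineResource,
    SqlServerSource,
)

client = DataFactoryManagementClient(
    DefaultAzureCredential(), "<subscription-id>"
)

# One copy activity: read from a SQL Server dataset,
# write to a Blob storage dataset.
copy = CopyActivity(
    name="CopySqlToBlob",
    inputs=[DatasetReference(reference_name="SqlServerTableDataset")],
    outputs=[DatasetReference(reference_name="BlobOutputDataset")],
    source=SqlServerSource(),
    sink=BlobSink(),
)

# A pipeline is just a logical grouping of such activities.
client.pipelines.create_or_update(
    "rg-demo", "adf-demo", "CopyPipeline",
    PipelineResource(activities=[copy]),
)
```

A follow-on activity (such as the Hive script on HDInsight mentioned above) would simply be appended to the same `activities` list, with a dependency on the copy activity.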
