Data Factory
Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.
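For readers unfamiliar with how such a pipeline is expressed, here is a minimal sketch of a Data Factory (v1-style) pipeline definition with a single Hive activity. The names (MyHivePipeline, RawLogs, CleanLogs, MyHDInsightCluster, MyBlobStore) and the script path are illustrative only, not part of any real deployment:

    {
      "name": "MyHivePipeline",
      "properties": {
        "description": "Transform raw blob data into a curated dataset with a Hive script",
        "activities": [
          {
            "name": "RunHiveScript",
            "type": "HDInsightHive",
            "linkedServiceName": "MyHDInsightCluster",
            "inputs": [ { "name": "RawLogs" } ],
            "outputs": [ { "name": "CleanLogs" } ],
            "typeProperties": {
              "scriptPath": "scripts/transform.hql",
              "scriptLinkedService": "MyBlobStore"
            }
          }
        ],
        "start": "2016-01-01T00:00:00Z",
        "end": "2016-12-31T00:00:00Z"
      }
    }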
Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.
-
Schedule pipelines as jobs / run pipelines on demand
Rather than the time slice idea, allow us to schedule pipelines as jobs, the same way I would schedule an agent job to run SSIS packages. Setting availability for datasets is a very awkward way to go about this. A scheduler would be 10 times easier and more intuitive.
Also allow users to "run" a pipeline on demand; this would make testing a lot easier.
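For context, a rough sketch of the time-slice model this request refers to, as I understand it: in Data Factory v1 the schedule is implied by the output dataset's availability section plus the pipeline's active period, rather than by an explicit job schedule. The values below are illustrative only.

On the output dataset (one slice is produced per hour):

    "availability": { "frequency": "Hour", "interval": 1 }

On the pipeline (slices are only produced inside this active window):

    "start": "2016-03-01T00:00:00Z",
    "end": "2016-03-02T00:00:00Z"

Running a pipeline "now" today essentially means manipulating these windows, which is what the post argues a plain scheduler and a run-on-demand option would replace.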
305 votes
Thank you for the feedback. We will look into this.
-
WYSIWYG UI
The JSON editor is OK but is still a barrier to entry. A WYSIWYG UI based on SSIS/Machine Learning Studio would really make this easier to use.
250 votes
Thanks for your feedback. We are working towards an authoring experience that will be very easy and intuitive for you. We will be sharing more details in the coming months.
-
Add support for Power Query / Power BI Data Catalog as a Data Store / Linked Service
Power Query is awesome! It would be a great feature to be able to output its result into either a SQL database or Azure (Storage or SQL).
239 votes
Thank you for the feedback. We will look into this.
-
Move Activity
Activity that copies and then deletes.
139 votes
Thanks for the feedback. We are looking at this.
-
Azure Data Factory - Add Ability to Update Dataset Availability
Idea from @Jeff_J_Jordan via Twitter:
Currently in Azure Data Factory, once a dataset is deployed you cannot change its availability. If you attempt to change the availability, you get the following error message:
“Updating the availability section of a Dataset is not supported. Existing availability configuration: Frequency=Day, Interval=1, AnchorDateTime=01/01/0001 00:00:00, Offset=06:00:00, Style=StartOfInterval, new availability configuration: Frequency=Day, Interval=1, AnchorDateTime=01/01/0001 00:00:00, Offset=07:00:00, Style=StartOfInterval.”
Take Daylight Saving Time, for example: you want to run your datasets at midnight CST (UTC-6:00), but during DST it becomes 1:00 am CST. This is because UTC is not affected by DST; in effect…
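To make the scenario concrete, here is a rough sketch of the dataset availability section involved; the values simply mirror the error message above. Changing only the offset after deployment, for example from "06:00:00" to "07:00:00" to track a DST shift, is what triggers that error:

    "availability": {
        "frequency": "Day",
        "interval": 1,
        "style": "StartOfInterval",
        "offset": "06:00:00"
    }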
122 votes
-
Integrate with Functions
It'd make it much easier to adopt Data Factory if it were possible to add Azure Functions activities to a pipeline.
You can already store a blob and have an Azure Function trigger based on that, but having the functions directly in the pipeline source would make Data Factory management easier. Not to mention the clarity it'd give about the Data Factory's functionality.
87 votes
-
Use partitions in fileFilter and fileName
At the moment you can only use * and ? in the file filter. It would be very helpful if you could also use the partitionedBy section, which is already available for folderPath, in the fileFilter or the fileName.
This would allow scenarios where you need files like myName-2015-07-01.txt, where the slice date and time are part of the filename.
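For reference, this is roughly what partitionedBy looks like on folderPath today; the folder layout and variable names are illustrative. The request is to allow the same {Year}/{Month}/{Day}-style variables in fileFilter and fileName:

    "typeProperties": {
        "folderPath": "logs/{Year}/{Month}/{Day}",
        "partitionedBy": [
            { "name": "Year",  "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyy" } },
            { "name": "Month", "value": { "type": "DateTime", "date": "SliceStart", "format": "MM" } },
            { "name": "Day",   "value": { "type": "DateTime", "date": "SliceStart", "format": "dd" } }
        ]
    }

With that in place, a fileName such as myName-{Year}-{Month}-{Day}.txt would resolve to myName-2015-07-01.txt for the 2015-07-01 slice.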
86 votes
-
Provide a method to programmatically execute an ADF pipeline on demand
Now that we have Azure Functions available to help make batch data processing more real-time, it would be great to be able to programmatically invoke the workflow component of ADF for immediate execution of the pipeline.
78 votes
-
Event Hub
Source and sink.
77 votes
Thank you for the feedback. We will look into this.
-
Azure Data Factory - Add Ability to Update Dataset Availability
Currently in Azure Data Factory, once a dataset is deployed you cannot change its availability. If you attempt to change the availability, you get the following error message:
“Updating the availability section of a Dataset is not supported. Existing availability configuration: Frequency=Day, Interval=1, AnchorDateTime=01/01/0001 00:00:00, Offset=06:00:00, Style=StartOfInterval, new availability configuration: Frequency=Day, Interval=1, AnchorDateTime=01/01/0001 00:00:00, Offset=07:00:00, Style=StartOfInterval.”
Take Daylight Saving Time, for example: you want to run your datasets at midnight CST (UTC-6:00), but during DST it becomes 1:00 am CST. This is because UTC is not affected by DST; in effect CST becomes UTC-5:00. Other…
74 votes
-
Provide a folder option to manage multiple datasets and pipelines. ADF diagram also based on folder/area.
We have around 100 datasets and 10 different pipelines. This will grow to 2000 datasets and 150 pipelines in the future based on business functionality, data categorization, and dependencies. We already see a problem in managing them, as we are not able to arrange them in collapsible folders, and the diagram becomes difficult to explain. Functionality to manage all datasets/pipelines related to one area in a specific folder, together with a diagram view scoped to that folder, would simplify this a lot. See the Before and After attachments for more clarity.
72 votes
-
Apache Spark
Let's enable the hottest Big Data technology of the day as a data hub. This would add an in-memory ELT capability to the ADF family.
71 votes
We are working on supporting a first-class Spark activity instead of triggering it indirectly via the MapReduce activity. Please stay tuned.
-
Support the Azure App Service API
Can it consume or push data to an Azure App Service API? Support for Swagger APIs.
69 votes
Thank you for the feedback. We will look into this.
-
Add configurable REST and SOAP web service sources, so it can ingest data from other cloud services.
There are many cloud applications that expose data via a SOAP or REST API. Customers should be able to configure generic REST and SOAP data sources for use in Azure Data Factory. Other ELT and ETL tools such as Dell Boomi, Informatica, SSIS, and Talend have this functionality.
53 votes
-
Talk to the O365 OneRM team and get a copy of their O365DataTransfer program -- it does a lot of things DF needs to do.
The O365 OneRM team is solving the same problems that you are, and has a mature platform that does many of the things that Azure DF will need to do. Talk to Zach, Naveen, and Karthik in Building 2. Also talk to Pramod. It'll accelerate you in terms of battle-hardened user needs and things pipeline automation has needed to do in the field. You'll want to get a copy of the bits/code.
50 votes
-
Web and OData connectors need to support OAuth
The Web and OData connectors need to add support for OAuth ASAP. Most other Microsoft services (Office 365, PWA, CRM, etc.) along with many other industry APIs require the use of OAuth. Not having this closes the door to lots of integration scenarios.
47 votes
The OData connector now supports AAD-based OAuth; you can try it out via the Copy Wizard. More details on the OData connector can be found at https://azure.microsoft.com/en-us/documentation/articles/data-factory-odata-connector/. Keep this feedback live: if you need OAuth for the Web connector, please leave a comment.
-
Provide status of pipeline activities for long-running jobs
As of now, the portal shows the status as In Progress or Running, but doesn't provide in-depth detail or logs for a pipeline activity that is in progress. For example, I started a copy activity from Azure Table storage to Azure Blob storage. It has been running for the last 15 hours, but it gives no indication of how many rows or how much data volume has been copied already. For any activity, the status could be updated every few minutes.
45 votes
-
Clear errors and "unused" data slices
There should be an option to clear old errors.
When there is no pipeline that produces or consumes a data slice, and that slice has errors, the counter still shows them as "current" errors, which is not the case. I would like to remove these unused slices and their errors.
43 votes
-
Elasticsearch
Source and sink.
42 votes
Related to this, we are now working on enabling Azure Search as a copy destination in ADF. If you are interested in this, please leave a comment here and we will contact you.
-
GUI interface similar to Azure ML UX experience
GUI interface similar to Azure ML UX experience
36 votes
