Data Factory
Azure Data Factory allows you to manage the production of trusted information by offering an easy way to create, orchestrate, and monitor data pipelines over the Hadoop ecosystem using structured, semi-structured, and unstructured data sources. You can connect to your on-premises SQL Server, Azure databases, tables, or blobs and create data pipelines that process the data with Hive and Pig scripting, or custom C# processing. The service offers a holistic monitoring and management experience over these pipelines, including a view of their data production and data lineage down to the source systems. The outcome of Data Factory is the transformation of raw data assets into trusted information that can be shared broadly with BI and analytics tools.
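For readers unfamiliar with how such a pipeline is expressed, here is a minimal sketch of a Data Factory (v1-style) pipeline definition with a single Hive activity. The names (MyHivePipeline, RawLogs, CleanLogs, MyHDInsightCluster, MyBlobStore) and the script path are illustrative only, not part of any real deployment:

    {
      "name": "MyHivePipeline",
      "properties": {
        "description": "Transform raw blob data into a curated dataset with a Hive script",
        "activities": [
          {
            "name": "RunHiveScript",
            "type": "HDInsightHive",
            "linkedServiceName": "MyHDInsightCluster",
            "inputs": [ { "name": "RawLogs" } ],
            "outputs": [ { "name": "CleanLogs" } ],
            "typeProperties": {
              "scriptPath": "scripts/transform.hql",
              "scriptLinkedService": "MyBlobStore"
            }
          }
        ],
        "start": "2016-01-01T00:00:00Z",
        "end": "2016-12-31T00:00:00Z"
      }
    }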
Do you have an idea, suggestion or feedback based on your experience with Azure Data Factory? We’d love to hear your thoughts.
-
Schedule pipelines as jobs / run pipelines on demand
Rather than the time slice idea, allow us to schedule pipelines as jobs, the same way I would schedule an agent job to run SSIS packages. Setting availability for datasets is a very awkward way to go about this. A scheduler would be 10 times easier and more intuitive.
Also allow users to "run" a pipeline on demand; this would make testing a lot easier.
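For context, a rough sketch of the time-slice model this request refers to, as I understand it: in Data Factory v1 the schedule is implied by the output dataset's availability section plus the pipeline's active period, rather than by an explicit job schedule. The values below are illustrative only.

On the output dataset (one slice is produced per hour):

    "availability": { "frequency": "Hour", "interval": 1 }

On the pipeline (slices are only produced inside this active window):

    "start": "2016-03-01T00:00:00Z",
    "end": "2016-03-02T00:00:00Z"

Running a pipeline "now" today essentially means manipulating these windows, which is what the post argues a plain scheduler and a run-on-demand option would replace.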
305 votes
Thank you for the feedback. We will look into this.
-
WYSIWYG UI
The JSON editor is OK but is still a barrier to entry. A WYSIWYG UI based on SSIS/Machine Learning Studio would really make this easier to use.
250 votes
Thanks for your feedback. We are working towards an authoring experience that will be very easy and intuitive for you. We will be sharing more details in the coming months.
-
Add support for Power Query / Power BI Data Catalog as a Data Store / Linked Service
Power Query is awesome! It would be a great feature to be able to output its result into either a SQL database or Azure (Storage or SQL).
239 votes
Thank you for the feedback. We will look into this.
-
Move Activity
Activity that copies and then deletes.
139 votes
Thanks for the feedback. We are looking at this.
-
Azure Data Factory - Add Ability to Update Dataset Availability
Idea from @Jeff_J_Jordan via Twitter:
Currently in Azure Data Factory, once a dataset is deployed you cannot change its availability. If you attempt to change the availability, you get the following error message:
“Updating the availability section of a Dataset is not supported. Existing availability configuration: Frequency=Day, Interval=1, AnchorDateTime=01/01/0001 00:00:00, Offset=06:00:00, Style=StartOfInterval, new availability configuration: Frequency=Day, Interval=1, AnchorDateTime=01/01/0001 00:00:00, Offset=07:00:00, Style=StartOfInterval.”
Take Daylight Saving Time, for example: you want to run your datasets at midnight CST (UTC-6:00), but during DST it becomes 1:00 am CST. This is because UTC is not affected by DST; in effect…
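To make the scenario concrete, here is a rough sketch of the dataset availability section involved; the values simply mirror the error message above. Changing only the offset after deployment, for example from "06:00:00" to "07:00:00" to track a DST shift, is what triggers that error:

    "availability": {
        "frequency": "Day",
        "interval": 1,
        "style": "StartOfInterval",
        "offset": "06:00:00"
    }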
122 votes
-
Integrate with Functions
It'd make it much easier to adopt Data Factory if it were possible to add Azure Functions activities to a pipeline.
You can already store a blob and have an Azure Function trigger based on that, but having the functions directly in the pipeline source would make Data Factory management easier. Not to mention the clarity it'd give about the Data Factory's functionality.
87 votes
-
Use partitions in fileFilter and fileName
At the moment you can only use * and ? in the file filter. It would be very helpful if you could also use the partitionedBy section, which is already available for folderPath, in the fileFilter or the fileName.
This would allow scenarios where you need files like myName-2015-07-01.txt, where the slice date and time are part of the filename.
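For reference, this is roughly what partitionedBy looks like on folderPath today; the folder layout and variable names are illustrative. The request is to allow the same {Year}/{Month}/{Day}-style variables in fileFilter and fileName:

    "typeProperties": {
        "folderPath": "logs/{Year}/{Month}/{Day}",
        "partitionedBy": [
            { "name": "Year",  "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyy" } },
            { "name": "Month", "value": { "type": "DateTime", "date": "SliceStart", "format": "MM" } },
            { "name": "Day",   "value": { "type": "DateTime", "date": "SliceStart", "format": "dd" } }
        ]
    }

With that in place, a fileName such as myName-{Year}-{Month}-{Day}.txt would resolve to myName-2015-07-01.txt for the 2015-07-01 slice.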
86 votes
-
Provide a method to programmatically execute an ADF pipeline on demand
Now that we have Azure Functions available to help make batch data processing more real-time, it would be great to be able to programmatically invoke the workflow component of ADF for immediate execution of the pipeline.
78 votes
-
Event Hub
Source and sink.
77 votes
Thank you for the feedback. We will look into this.
-
Azure Data Factory - Add Ability to Update Dataset Availability
Currently in Azure Data Factory, once a dataset is deployed you cannot change its availability. If you attempt to change the availability, you get the following error message:
“Updating the availability section of a Dataset is not supported. Existing availability configuration: Frequency=Day, Interval=1, AnchorDateTime=01/01/0001 00:00:00, Offset=06:00:00, Style=StartOfInterval, new availability configuration: Frequency=Day, Interval=1, AnchorDateTime=01/01/0001 00:00:00, Offset=07:00:00, Style=StartOfInterval.”
Take Daylight Saving Time, for example: you want to run your datasets at midnight CST (UTC-6:00), but during DST it becomes 1:00 am CST. This is because UTC is not affected by DST; in effect CST becomes UTC-5:00. Other…
74 votes
-
Provide a folder option to manage multiple datasets and pipelines. ADF diagram also based on folder/area.
We have around 100 datasets and 10 different pipelines. This will grow to 2000 datasets and 150 pipelines in the future based on business functionality, data categorization, and dependencies. We already see a problem in managing them, as we are not able to arrange them in collapsible folders, and the diagram becomes difficult to explain. Functionality to manage all datasets/pipelines related to one area in a specific folder, together with a diagram view scoped to that folder, would simplify this a lot. See the Before and After attachments for more clarity.
72 votes
-
Apache Spark
Let's enable the hottest Big Data technology of the day as a data hub. This would add an in-memory ELT capability to the ADF family.
71 votes
We are working on supporting a first-class Spark activity instead of triggering it indirectly via the MapReduce activity. Please stay tuned.
-
Support the Azure App Service API
Can it consume or push data to an Azure App Service API? Support for Swagger APIs.
69 votes
Thank you for the feedback. We will look into this.
-
Add configurable REST and SOAP web service sources, so it can ingest data from other cloud services.
There are many cloud applications that expose data via a SOAP or REST API. Customers should be able to configure generic REST and SOAP data sources for use in Azure Data Factory. Other ELT and ETL tools such as Dell Boomi, Informatica, SSIS, and Talend have this functionality.
53 votes
-
Talk to the O365 OneRM team and get a copy of their O365DataTransfer program -- it does a lot of things DF needs to do.
The O365 OneRM team is solving the same problems that you are, and has a mature platform that does many of the things that Azure DF will need to do. Talk to Zach, Naveen, and Karthik in Building 2. Also talk to Pramod. It'll accelerate you in terms of battle-hardened user needs and things pipeline automation has needed to do in the field. You'll want to get a copy of the bits/code.
50 votes
-
Web and OData connectors need to support OAuth
The Web and OData connectors need to add support for OAuth ASAP. Most other Microsoft services (Office 365, PWA, CRM, etc.) along with many other industry APIs require the use of OAuth. Not having this closes the door to lots of integration scenarios.
47 votes
The OData connector now supports AAD-based OAuth; you can try it out via the Copy Wizard. More details on the OData connector can be found at https://azure.microsoft.com/en-us/documentation/articles/data-factory-odata-connector/. Keep this feedback live: if you need OAuth for the Web connector, please leave a comment.
-
Provide status of pipeline activities for long-running jobs
As of now, the portal shows the status as In Progress or Running, but doesn't provide in-depth detail or logs for a pipeline activity that is in progress. For example, I started a copy activity from Azure Table storage to Azure Blob storage. It has been running for the last 15 hours, but it gives no indication of how many rows or how much data volume has been copied already. For any activity, the status could be updated every few minutes.
45 votes
-
Clear errors and "unused" data slices
There should be an option to clear old errors.
When there is no pipeline that produces or consumes a data slice, and that slice has errors, the counter still shows them as "current" errors, which is not the case. I would like to remove these unused slices and their errors.
43 votes
-
Elasticsearch
Source and sink.
42 votes
Related to this, we are now working on enabling Azure Search as a copy destination in ADF. If you are interested in this, please leave a comment here and we will contact you.
-
GUI interface similar to Azure ML UX experience
GUI interface similar to Azure ML UX experience
36 votes
