During the Business Applications Event in July 2018, Microsoft announced a new service called Dataflows (formerly known as Datapools and Common Data Service for Analytics (CDS-A)).
There are already a couple of great articles (here and here) explaining what Dataflows are and the technologies supporting it. This blog focuses on how it could affect the Business Intelligence (BI) industry and in particular, the traditional, structured data warehouse.
One of the biggest BI challenges in todays world is the disparity between self-serve data analysts and corporate IT. Power BI is no exception, unfortunately. Originally designed for self-serve, the product has evolved to become a very good enterprise solution. Power BI works great at both ends of the spectrum – but what about a fully democratized data reporting platform?
Right on cue, Microsoft announced Dataflows. The bridge between self-service analytics and a fully governed data hub. In a nutshell, they provide a data preparation layer that sits in between persisted data storage and a Power BI Dataset. Dataflows contain a standard schema, called the Common Data Model (CDM), which includes business entities across marketing, sales, service, finance workloads; along with a variety of other connectors like Excel and Salesforce.
Azure Data Lake Store Gen 2 is the data storage layer that sits beneath IT and all forms of self-service. Fig 1.0 illustrates how IT approved entities can be easily migrated into an enterprise Data Lake in Azure. As a result, common and key entities can be served to/from data scientists (unstructured data) and data analysts (structured data). This means an organization can fully govern the data sitting across multiple areas of the business, thus providing better data quality and one version of the truth.
As a BI architect, one question I am always asked is ‘Should we use a data lake or traditional data warehouse to support our BI capabilities?’. It is a completely valid question, but the short answer (to a potentially long debate) is it depends on the workloads required. What I mean by this is if your organization only works with structured, transactional data, then a traditional data warehouse is perfectly fine. On the other hand, if your business wants to store videos, audio and undertake data science related activities, then a data lake would be far more suitable.
Technologies such as Spark (Databricks), Data Lake Analytics and Azure Data Warehouse are still relatively new. Many Microsoft BI Developers are just beginning to learn how to use them, meaning traditional data warehouses will often be used in favor of the end-to-end architecture above. I do expect this trend to gradually shift over the next few years, as advanced analytical processes (Machine Learning, AI, etc.) and handling big data are tailor made for Azure Data Lake Storage.
There are a few limitations with Dataflows at the moment. The product is very much V0.1, so this was to be expected really. I am hopeful all of the limitations will be addressed and resolved over time.
The rapid change in tools and technologies within data analytics and BI mean we cannot just ‘assume’ Dataflows will suddenly become the glue that holds everything together. The concept and initial demos are extremely promising, and I can see a massive advantage to customers seeking simpler, more enriched insights on top of D365/Dynamics. Giving users the ability to create governed and re-usable entities ensure both the IT department and self-serve analysts can carry out their roles without blocking each other.
Another exciting feature within Dataflows is the Power Query and M transformation capabilities, which were previously only possible in Power BI Desktop. Potentially complex data warehouse development and denormalization of data can now be achieved within composite models, across a variety of data sources. There will certainly be a few bumps in the road whilst Microsoft continue to improve Dataflows, but the intent to bridge self-service with corporate IT departments is a real step in the right direction.
Feel free to Tweet me if you have any questions or comments.
The post Dataflows in Power BI: A Data Analytics Gamechanger? appeared first on eBECS.