During the Microsoft Business Applications Summit in July 2018, Microsoft announced a new service called Dataflows (formerly known as Datapools and Common Data Service for Analytics (CDS-A)).

There are already a couple of great articles (here and here) explaining what Dataflows are and the technologies supporting them.  This blog focuses on how Dataflows could affect the Business Intelligence (BI) industry and, in particular, the traditional, structured data warehouse.

Impact on Business Intelligence

One of the biggest BI challenges in today's world is the disparity between self-serve data analysts and corporate IT, and Power BI is no exception.  Originally designed for self-service, the product has evolved into a very good enterprise solution.  Power BI works well at both ends of the spectrum – but what about a fully democratized data reporting platform?

Right on cue, Microsoft announced Dataflows: a bridge between self-service analytics and a fully governed data hub. In a nutshell, Dataflows provide a data preparation layer that sits between persisted data storage and a Power BI dataset. Dataflows support a standard schema, called the Common Data Model (CDM), which includes business entities across marketing, sales, service, and finance workloads, along with a variety of connectors for sources such as Excel and Salesforce.
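Under the hood, a dataflow writes its entity metadata to a model.json file in the CDM folder format, alongside the CSV data files. As a rough, hypothetical sketch (the entity and attribute names here are invented for illustration, and the real schema carries more fields):

```python
import json

# Hypothetical, minimal sketch of the CDM folder metadata (model.json)
# that a dataflow stores alongside its CSV data files.
model = {
    "name": "SalesDataflow",      # dataflow name (assumed)
    "version": "1.0",
    "entities": [
        {
            "$type": "LocalEntity",
            "name": "Customer",   # a CDM-style business entity
            "attributes": [
                {"name": "CustomerId", "dataType": "int64"},
                {"name": "Name", "dataType": "string"},
                {"name": "CreatedOn", "dataType": "dateTime"},
            ],
        }
    ],
}

print(json.dumps(model, indent=2))
```

Because the metadata travels with the data in the lake, any CDM-aware tool (not just Power BI) can discover and read the entities.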

Azure Data Lake Storage Gen2 is the data storage layer that sits beneath IT and all forms of self-service.  Fig 1.0 illustrates how IT-approved entities can be easily migrated into an enterprise data lake in Azure.  As a result, common, key entities can be served to both data scientists (unstructured data) and data analysts (structured data).  This means an organization can fully govern data sitting across multiple areas of the business, providing better data quality and one version of the truth.

As a BI architect, one question I am always asked is ‘Should we use a data lake or a traditional data warehouse to support our BI capabilities?’. It is a completely valid question, but the short answer (to a potentially long debate) is that it depends on the workloads required.  If your organization only works with structured, transactional data, then a traditional data warehouse is perfectly fine.  On the other hand, if your business wants to store video and audio, or undertake data science activities, then a data lake is far more suitable.

Technologies such as Spark (Databricks), Data Lake Analytics and Azure SQL Data Warehouse are still relatively new.  Many Microsoft BI developers are just beginning to learn them, meaning traditional data warehouses will often be chosen over the end-to-end architecture above.  I do expect this trend to shift gradually over the next few years, as advanced analytical processes (Machine Learning, AI, etc.) and big data workloads are tailor-made for Azure Data Lake Storage.

Current Limitations

There are a few limitations with Dataflows at the moment.  The product is very much a first release, so this is to be expected.  I am hopeful that all of these limitations will be addressed and resolved over time.

  • No ability to create DAX measures.  Dataflows are designed for transformations and data mashup, not for data modelling and measure creation.  This may change in the future, though.
  • Incremental refresh frequency depends on the license – the same restrictions apply (8 refreshes per day) with a Power BI Pro license.
  • Automatic refreshes of dependent entities are currently only possible within a single workspace.
  • Basic monitoring for refresh failures.  There are some out-of-the-box analytics for refresh history, but they are currently limited.
  • Not all out-of-the-box Power BI Desktop transformations are available in the Service at the time of writing.
  • Data lineage is not available right now but will be coming to Power BI and Dataflows soon, making data models easier to govern.
  • Dataflows permissions are assigned to a Power BI workspace, not to a specific user or Azure AD group.
  • There is no clear migration path or technique for moving Power Query transformations (made in Power BI Desktop) into Dataflows.  I would be shocked if Microsoft do not offer this feature very soon.
  • There are no ‘in-between’ licensing options – you are either a Pro user or a Premium user.  Smaller businesses requiring more than 8 refreshes per day could be forced to pay Premium prices.
  • Deploying Dataflows across multiple Azure/Power BI tenancies is not yet programmatically possible.
  • Dataflow and Power BI dataset refreshes are not directly linked at the moment, meaning both have to be scheduled separately in order for new data to appear to the user.
  • Row-level security (RLS) still remains at the dataset level (i.e. the Power BI model).  This isn’t necessarily a limitation, more of a design choice by Microsoft.
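One practical workaround for the unlinked refresh schedules is to chain the two refreshes yourself via the Power BI REST API. The sketch below only builds the requests rather than sending them; the group/dataflow/dataset IDs and token are placeholders, and the exact payload options are an assumption to verify against the API documentation:

```python
import json
import urllib.request

API = "https://api.powerbi.com/v1.0/myorg"

def refresh_url(group_id: str, kind: str, object_id: str) -> str:
    """Build the refresh endpoint; kind is "dataflows" or "datasets"."""
    return f"{API}/groups/{group_id}/{kind}/{object_id}/refreshes"

def refresh_request(token: str, url: str) -> urllib.request.Request:
    """Prepare (but do not send) the POST that triggers a refresh."""
    return urllib.request.Request(
        url,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        # notifyOption payload is an assumption; check the API docs
        data=json.dumps({"notifyOption": "NoNotification"}).encode(),
    )

# Placeholders only: trigger the dataflow refresh first, poll its refresh
# history until it completes, then trigger the dependent dataset refresh.
dataflow_req = refresh_request("<token>", refresh_url("<group>", "dataflows", "<dataflow>"))
dataset_req = refresh_request("<token>", refresh_url("<group>", "datasets", "<dataset>"))
```

In practice you would send these with `urllib.request.urlopen` (or a scheduler such as Azure Automation) so the dataset refresh only fires after the dataflow refresh succeeds.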

The Future

The rapid change in tools and technologies within data analytics and BI means we cannot simply ‘assume’ Dataflows will become the glue that holds everything together.  The concept and initial demos are extremely promising, and I can see a massive advantage for customers seeking simpler, more enriched insights on top of D365/Dynamics.  Giving users the ability to create governed, re-usable entities ensures both the IT department and self-serve analysts can carry out their roles without blocking each other.

Another exciting feature of Dataflows is the Power Query and M transformation capability, which was previously only available in Power BI Desktop.  Potentially complex data warehouse development and denormalization of data can now be achieved within computed entities, across a variety of data sources.  There will certainly be a few bumps in the road whilst Microsoft continue to improve Dataflows, but the intent to bridge self-service with corporate IT departments is a real step in the right direction.
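As an analogy for the kind of denormalization those Power Query steps perform (this is illustrative Python, not M, and the table and column names are invented), flattening a fact table against a dimension produces a single wide entity:

```python
# Illustrative only: denormalizing a fact table against a dimension table,
# the kind of step a dataflow's Power Query/M logic performs ("merge" in M terms).
customers = {  # dimension: CustomerId -> descriptive attributes
    1: {"Name": "Contoso", "Country": "US"},
    2: {"Name": "Fabrikam", "Country": "UK"},
}
sales = [  # fact rows referencing the dimension by key
    {"SaleId": 10, "CustomerId": 1, "Amount": 250.0},
    {"SaleId": 11, "CustomerId": 2, "Amount": 400.0},
]

# Left-join the dimension attributes onto each fact row
denormalized = [
    {**row, **customers.get(row["CustomerId"], {})} for row in sales
]

print(denormalized[0])
# {'SaleId': 10, 'CustomerId': 1, 'Amount': 250.0, 'Name': 'Contoso', 'Country': 'US'}
```

The resulting wide entity is what a downstream Power BI dataset would then model and report over.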

Contact Me

Feel free to Tweet me if you have any questions or comments.

Twitter:  @DataVizWhizz

The post Dataflows in Power BI: A Data Analytics Gamechanger? appeared first on eBECS.