Hi Everyone,
To build Power BI reports for F&O, we export the F&O data to a data lake and have scheduled Power BI refreshes that pull the data from the data lake into Power BI. The challenge is that the refreshes sometimes fail because the files in the data lake are being updated at the same time.
I would like to understand how other projects handle this. As far as we can see, there is no way to control when the data is written to the data lake. If we could set the data lake refresh to run at 11pm, we could schedule the Power BI refresh for 3am and avoid this problem.
Appreciate your thoughts.
Regards,
Kumar
There are a few possible solutions to your challenge of failed refreshes due to simultaneous updates to the data lake.
1. Implement a staggered refresh schedule: Instead of letting the data lake update and the Power BI refresh overlap, run them at different times. For example, the data lake refresh could run at 11pm and the Power BI refresh at 3am, as you suggested. The gap between the two reduces the chance of a conflict. This only works if you can control when the export writes to the data lake, which you say is not currently possible in your setup.
2. Use version control: Another approach is to keep versioned copies of your data in the data lake. Each update is written as a new version rather than overwriting the existing files, so the Power BI reports always read one consistent version of the data regardless of when the refreshes happen. A tool such as Azure Data Factory can be used to copy each update into a new versioned folder.
3. Implement a locking mechanism: A lock can prevent a file from being read while it is being written. The file or folder being updated is locked for the duration of the update, and other processes wait until the lock is released. Azure Data Lake Storage and Azure Blob Storage offer features for this; Azure Blob Storage, for example, supports leases on blobs.
4. Optimize your refreshes: The shorter a refresh runs, the smaller the window in which it can collide with an update to the data lake. Incremental refresh helps here, because it only reloads the data that has changed since the last refresh instead of the entire dataset. Compression and partitioning can further reduce the amount of data that has to be read.
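As a rough illustration of option 2, here is a minimal Python sketch of the versioned-snapshot idea. It uses a local folder as a stand-in for the data lake, and the names publish_snapshot, current_snapshot and the CURRENT pointer file are made up for this example; they are not part of F&O, Azure Data Factory or Power BI. In a real setup the copy would be done by whatever tool you use against the storage account.

```python
import shutil
import tempfile
from pathlib import Path


def publish_snapshot(source_dir: Path, snapshots_root: Path, version: str) -> Path:
    """Copy source_dir into snapshots_root/<version>, then point CURRENT at it.

    Hypothetical helper: `version` must be a new, unused folder name.
    """
    snapshots_root.mkdir(parents=True, exist_ok=True)
    target = snapshots_root / version
    staging = snapshots_root / f".{version}.tmp"

    # Copy into a hidden staging folder so readers never see a half-written version.
    shutil.copytree(source_dir, staging)
    # Make the finished copy visible under its final name in one rename.
    staging.rename(target)

    # Write the pointer to a temp file first, then swap it in with a single replace.
    pointer_tmp = snapshots_root / "CURRENT.tmp"
    pointer_tmp.write_text(version)
    pointer_tmp.replace(snapshots_root / "CURRENT")
    return target


def current_snapshot(snapshots_root: Path) -> Path:
    """Return the folder a reader (for example the report refresh) should load."""
    version = (snapshots_root / "CURRENT").read_text().strip()
    return snapshots_root / version


if __name__ == "__main__":
    # Small demo on a temporary local folder (assumed layout, for illustration only).
    root = Path(tempfile.mkdtemp())
    export = root / "export"
    export.mkdir()
    (export / "table.csv").write_text("id\n1\n")
    publish_snapshot(export, root / "snapshots", "2024-01-01T2300")
    print(current_snapshot(root / "snapshots"))
```

The point of the design is that the reader only ever follows the pointer, so an update that is still being copied cannot be picked up by a refresh. Old versions still need to be cleaned up separately.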
Overall, it's important to have a robust data governance strategy in place to ensure the reliability and consistency of your data lake and Power BI reports. This may involve a combination of the above solutions, as well as regular monitoring and maintenance of your data infrastructure.
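If you cannot control when the data lake is written, one way to combine monitoring with the ideas above is to start the Power BI refresh only once the files have stopped changing for a while. The sketch below is a minimal Python example of that check. list_last_modified and trigger_refresh are hypothetical callables that you would have to implement yourself (for instance by reading file timestamps from the storage account and by calling the Power BI REST API), and the 15-minute quiet period is an assumed value, not a recommendation.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable


def is_quiet(last_modified: Iterable[datetime], now: datetime, quiet_for: timedelta) -> bool:
    """True when no file was modified within the last `quiet_for`.

    An empty listing counts as quiet.
    """
    newest = max(last_modified, default=None)
    return newest is None or now - newest >= quiet_for


def refresh_when_quiet(
    list_last_modified: Callable[[], Iterable[datetime]],  # hypothetical: file timestamps
    trigger_refresh: Callable[[], None],                    # hypothetical: starts the refresh
    quiet_for: timedelta = timedelta(minutes=15),           # assumed quiet period
    max_attempts: int = 6,
    wait_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> bool:
    """Trigger the refresh once the files are quiet; give up after max_attempts."""
    for attempt in range(max_attempts):
        if is_quiet(list_last_modified(), now(), quiet_for):
            trigger_refresh()
            return True
        if attempt < max_attempts - 1:
            # Files are still changing: wait before checking again.
            sleep(wait_seconds)
    return False
```

This does not remove the race completely, since a write can still start right after the check, so it works best together with a retry on failure or with versioned copies.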
Hi Kumar,
I recommend also posting your question in the Power BI forum, where you can get more specialized help.
https://community.powerbi.com/
You can also troubleshoot using the following links, which cover checking the refresh status and history in a workspace.
learn.microsoft.com/.../refresh-data
learn.microsoft.com/.../refresh-troubleshooting-refresh-scenarios