Introduction to Microsoft Fabric: using a Notebook to load and transform data into a Lakehouse
Subhad365
User Group Leader
What's up guys?!!!
Here is a post on Microsoft Fabric: a unified platform for data engineering, data warehousing, and business intelligence. It is essentially a one-stop, end-to-end analytics and data platform designed for enterprises that need a unified solution, covering data movement, processing, ingestion, transformation, real-time event routing, and report building. This blog should give you an idea of how to ingest and transform data from an external application (e.g. a website/API/files, or an Azure storage account) and load it into a Lakehouse.
Ok: let's begin by creating a free account for Microsoft Fabric. Start by signing up for a dummy instance of D365: https://learn.microsoft.com/en-us/dynamics365/project-operations/environment/admin-trials
This will create a D365 CE/F&O instance along with a number of utilities, such as:
a. Azure portal login: although it will be an entirely zero-credit instance, you can still create Entra IDs from here. This can help you connect to the F&O instance (if need be).
b. You can connect to Power Platform utilities: creating Power Apps, Flows and Power BI.
c. Free access to Microsoft Fabric, by visiting https://app.fabric.microsoft.com/ using the ID created above. This will create the necessary capacities for Microsoft Fabric, with enough provisions to let you do plenty of R&D for 60 days ✌✌✌
Once you log in, your landing page should look like:
Look at the URL in the address bar. Note that I have purposely kept experience=data-engineering. You can change this to Power BI if you wish to explore the Power BI capabilities of MS Fabric.
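For reference, the URL typically looks something like the following (treat it as an illustrative example; the exact path may vary in your tenant):
https://app.fabric.microsoft.com/home?experience=data-engineering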
Once done, you can create a new Workspace as follows:
Give a suitable name:
Press Create to continue.
Next, let us create a notebook for your Lakehouse. Click on New and select Notebook.
As usual, give a proper name to your Notebook >> Click Create. This will automatically create a Notebook under the Lakehouse created in the above step.
All set: let us now give you some background on what we will be doing. We need a dataset to demo how to bring the data in and load it as a file into the Lakehouse. For our demo, let us take it from Kaggle: a gigantic storehouse of data for data engineering. You can create an account there, and yes: it's free.
For our testing purposes, I will be using the Kaggle gym members dataset, which can be found here:
https://www.kaggle.com/datasets/valakhorasani/gym-members-exercise-dataset
Click on your name icon >>
Under API, click on Create New Token:
This will create a new token for your user name, which will get downloaded as a file to your local computer.
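For reference, the downloaded token is a small JSON file (usually named kaggle.json) holding your username and key. Here is a minimal local sketch for peeking at those two values (assuming the default file name), which we will plug into the notebook shortly:
import json

# Read the Kaggle API token you just downloaded (assumes the default name kaggle.json)
with open('kaggle.json') as f:
    token = json.load(f)

print(token['username'])  # goes into KAGGLE_USERNAME
print(token['key'])       # goes into KAGGLE_KEY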
Let us come back to our workspace Notebook. Don't forget to ensure that you have chosen Python, as highlighted above:
Let us start by installing the Kaggle module:
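A minimal sketch of that install cell (the in-line %pip magic installs the package into the current notebook session):
%pip install kaggle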
Click on the highlighted Run button to start your session and run the command you have given.
Hover your mouse below the result, and the following will appear, letting you add a new cell for more code:
import os

# Switch to the Lakehouse Files folder so the downloaded file lands there
os.chdir('/lakehouse/default/Files')

# Credentials copied from the kaggle.json token file you downloaded earlier
os.environ['KAGGLE_USERNAME'] = '<you will get this from the token file downloaded>'
os.environ['KAGGLE_KEY'] = '<you will get this from the token file downloaded>'

# Import after the credentials are set, then authenticate and pull the CSV
from kaggle.api.kaggle_api_extended import KaggleApi
api = KaggleApi()
api.authenticate()
api.dataset_download_file('valakhorasani/gym-members-exercise-dataset', 'gym_members_exercise_tracking.csv')
By running this:
a. You are changing the working directory to the /lakehouse/default/Files folder.
b. You are setting your Kaggle credentials, authenticating, and downloading the dataset file into that folder.
Using the values from your token file, fill up the above code as necessary. Run the code, and the following will appear:
And just under Files (once you do a refresh), you will see that the gym-members-exercise file has been uploaded too:
You can now inspect the content of the file by loading it into a DataFrame:
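One small caveat: depending on the dataset, the Kaggle API may deliver the file zipped (for example as gym_members_exercise_tracking.csv.zip). If that is what shows up under Files, a short sketch like this extracts it in place (the zip file name here is an assumption):
import os
import zipfile

# Hypothetical zip name; adjust it to whatever actually landed under Files
zip_path = '/lakehouse/default/Files/gym_members_exercise_tracking.csv.zip'
if os.path.exists(zip_path):
    with zipfile.ZipFile(zip_path) as z:
        z.extractall('/lakehouse/default/Files')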
import pandas as pd

# Load the downloaded CSV from the Lakehouse Files folder into a DataFrame
df = pd.read_csv('/lakehouse/default/Files/gym_members_exercise_tracking.csv')
df.head()
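If you also want the data to land in the Lakehouse as a table rather than just a file, here is a minimal sketch of one way to do it from the same notebook, using the Spark session that Fabric notebooks expose as spark (the table name gym_members is my own illustrative choice):
# Convert the pandas DataFrame to Spark and save it as a Lakehouse Delta table
# ('gym_members' is an illustrative table name)
sdf = spark.createDataFrame(df)
sdf.write.mode('overwrite').format('delta').saveAsTable('gym_members')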
So that's just the beginning. I will come back soon to show how to manipulate this data, and how to feed in data from D365 F&O and Dataverse.
Keep learning, keep Azure-ing. Love and Namaste, as always 💓💓💓