Last week, Microsoft announced Fabric, their new data and analytics platform. It is pitched as a unified, end-to-end service that brings multiple features and capabilities together so that businesses can manage and analyse their data more easily. But what is Fabric, and is it any different to the many data analysis services Microsoft already provides?
What is Fabric?
Microsoft currently provides multiple Azure services to manage and analyse data. Traditionally, organisations may use Azure Data Factory to extract data from source systems, transform it and load it into Azure Synapse dedicated SQL pools or other databases. Alternatively, they may use Azure Synapse Spark pools to manipulate and query data, or Power BI to produce user-friendly dashboards and reports. Although these services can already integrate with each other, they remain separate products. This can mean separately defined security boundaries, implementation by different teams and data sprawl. Microsoft Fabric unifies these services into a single product, allowing organisations and users to manage their data and visualisations more easily.
To provide this functionality and more, Microsoft has included seven core workloads within the platform:
- Data Factory – Providing connectors to cloud and on-premises data sources, plus pipelines for moving and transforming data (Preview).
- Synapse Data Engineering – Providing Spark authoring experiences, such as notebooks, for data engineering (Preview).
- Synapse Data Science – Providing an end-to-end workflow to build, train, deploy and manage machine learning models (Preview).
- Synapse Data Warehousing – Providing lakehouse and data warehouse capabilities (Preview).
- Synapse Real-Time Analytics – Allowing organisations to work with streaming data from IoT devices, logs and telemetry (Preview).
- Power BI – Providing visualisations and AI-driven analytics to gain insights into business data.
- Data Activator – A no-code service providing real-time detection and monitoring of data to trigger alerts and notifications (Coming soon).
Image Source – https://azure.microsoft.com/en-us/blog/introducing-microsoft-fabric-data-analytics-for-the-era-of-ai/
Data Integration with OneLake
The services provided by Microsoft Fabric allow an organisation to manipulate, transform and analyse data for almost any scenario, but the heart of any analytics service is the data. Organisations use many data sources of different types, versions, locations, capacities and capabilities. Without careful architecture, this results in data sprawl. When data is duplicated across multiple new data sources, managed by different owners, with unique security requirements and of inconsistent quality, finding a single version of the truth becomes difficult. In recent years, data lakes have grown in popularity as a single location to store data from multiple sources ready for analysis. However, these can still be complicated to manage and query consistently.
To solve these issues, Microsoft Fabric comes with a SaaS, multi-cloud data lake called OneLake. It is built into the platform and automatically available to every Fabric tenant. Microsoft pitches it as the OneDrive for analytics, with data organised into a data hub and automatically indexed for discovery, sharing, governance and compliance. The idea behind OneLake is to provide a single, unified storage system built on Azure Data Lake Storage Gen2. A key capability is shortcuts, which use data virtualisation so that data can be shared without being duplicated. Shortcuts can even point to data held in Amazon S3, with Google Cloud Storage support promised soon, providing multi-cloud data sharing.
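To make the shortcut idea more concrete, here is a minimal Python sketch of how a virtualised namespace can resolve a logical path either to natively stored data or to an external location, without copying anything. It is a hypothetical model, not the Fabric API. The `Lakehouse` class, its methods, the workspace and item names and the S3 bucket are all invented for illustration. The only real convention it borrows is the ABFS-style address that OneLake exposes through its ADLS Gen2-compatible endpoint (`abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>/<path>`), which is worth checking against Microsoft's current documentation.

```python
from dataclasses import dataclass, field

# OneLake's ADLS Gen2-compatible DFS endpoint (assumed current; check Microsoft's docs).
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, item: str, path: str) -> str:
    """Build an ABFS-style URI for a path inside a OneLake item such as 'Sales.Lakehouse'."""
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}/{path.strip('/')}"


@dataclass
class Lakehouse:
    """Hypothetical model of a lakehouse namespace with shortcuts; not the Fabric API."""

    workspace: str
    item: str
    # Maps a shortcut folder (e.g. 'Files/marketing') to the external location it points at.
    shortcuts: dict[str, str] = field(default_factory=dict)

    def add_shortcut(self, folder: str, target: str) -> None:
        # Creating a shortcut only records a pointer; no data is copied.
        self.shortcuts[folder.strip("/")] = target.rstrip("/")

    def resolve(self, path: str) -> str:
        """Return the physical location that a logical lakehouse path reads from."""
        path = path.strip("/")
        # Check longer folders first so a nested shortcut wins over its parent.
        for folder in sorted(self.shortcuts, key=len, reverse=True):
            if path == folder or path.startswith(folder + "/"):
                rest = path[len(folder):].lstrip("/")
                target = self.shortcuts[folder]
                return f"{target}/{rest}" if rest else target
        # Everything else is stored natively in OneLake.
        return onelake_uri(self.workspace, self.item, path)


if __name__ == "__main__":
    lake = Lakehouse("Analytics", "Sales.Lakehouse")
    lake.add_shortcut("Files/marketing", "s3://example-bucket/campaigns")
    print(lake.resolve("Files/marketing/2023/q2.csv"))
    # s3://example-bucket/campaigns/2023/q2.csv
    print(lake.resolve("Tables/orders"))
    # abfss://Analytics@onelake.dfs.fabric.microsoft.com/Sales.Lakehouse/Tables/orders
```

The design point is that a shortcut is only an entry in a lookup table. Readers address one namespace, while the bytes stay wherever they already live.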
Open data formats are key to OneLake, with the Delta Lake format (Parquet data files plus a transaction log) as the native table format. This, together with support for ordinary files, allows OneLake to hold both structured and unstructured data, so organisations don’t have to move data around to meet analysis or visualisation requirements. Instead of relying on separate databases, data lake silos, data warehouses and business intelligence services for data storage, all the Fabric services can work quickly, intuitively and securely from the single location of OneLake.
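The Delta format keeps table data in ordinary Parquet files and adds a `_delta_log` folder of JSON commits. These commits record which files make up each version of the table. The minimal Python sketch below uses only the standard library to replay that log and find the current set of Parquet files. It follows the open Delta Lake protocol in simplified form: it ignores checkpoints and deletion vectors. The helper names `write_commit` and `active_files` are hypothetical. In practice the engine reading the table, or a library such as delta-rs, does this work.

```python
import json
import tempfile
from pathlib import Path


def write_commit(table_dir: str, version: int, actions: list[dict]) -> None:
    """Write one commit: newline-delimited JSON actions in _delta_log/<20-digit version>.json."""
    log_dir = Path(table_dir) / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    lines = "\n".join(json.dumps(a) for a in actions) + "\n"
    (log_dir / f"{version:020d}.json").write_text(lines)


def active_files(table_dir: str) -> set[str]:
    """Replay the JSON commit log and return the Parquet files in the table's latest version.

    Simplified sketch: ignores checkpoint files and assumes every commit JSON is present.
    Other actions (protocol, metaData, commitInfo) are skipped because they do not
    change which data files belong to the table.
    """
    files: set[str] = set()
    # Zero-padded version numbers mean lexical order is version order.
    for commit in sorted((Path(table_dir) / "_delta_log").glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as table:
        # Version 0 creates the table with two Parquet data files.
        write_commit(table, 0, [
            {"add": {"path": "part-0000.parquet", "dataChange": True}},
            {"add": {"path": "part-0001.parquet", "dataChange": True}},
        ])
        # Version 1 rewrites part-0000 as part-0002 (e.g. after an UPDATE).
        write_commit(table, 1, [
            {"remove": {"path": "part-0000.parquet", "dataChange": True}},
            {"add": {"path": "part-0002.parquet", "dataChange": True}},
        ])
        print(sorted(active_files(table)))  # ['part-0001.parquet', 'part-0002.parquet']
```

Because the log, not the folder listing, defines the table, any engine that understands the protocol sees the same consistent version. This is what allows several Fabric workloads to share one copy of the data.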
AI Built In
It seems that every new service these days must use AI somewhere, and Fabric is no exception. Microsoft have used the Azure OpenAI Service to help users find insights in their data. Using Copilot, users can describe what they want in conversational language to create dataflows, generate code and visualise results. Organisations can even build their own AI experiences that combine these models with their data, and publish them.
Is it worth it?
With the service still in preview, the full capabilities and features of the Fabric platform are not yet available. However, many of the services under the Fabric umbrella have been around for some time and are used extensively by data teams. For this reason, unifying these services and simplifying their management seems like a positive step.
One of the key takeaways from this announcement is the OneLake service. If it lives up to Microsoft’s pitch as the OneDrive for analytics, it could greatly simplify the storage, manipulation, access and administration of an organisation’s data and significantly reduce complex data engineering and transformation processes. It is this simplified approach to data management that could provide a real benefit to data teams and users.
The only concern is that Microsoft have repeatedly changed their analytics platform over the years. Many organisations invested heavily in on-premises data warehouses, then moved to large Parallel Data Warehouse systems, and may since have migrated to Azure Analysis Services and then to Azure Synapse, using dedicated SQL pools or data lake storage. Fabric could be yet another system that organisations must consider implementing. Microsoft’s analytics offering has not been as stable as its more mature data platforms, such as SQL Server, so some organisations may hold off adopting these new services for some time.