Breakout sessions
Tuesday, June 3rd
You can experience the following sessions, delivered by international experts from both Microsoft and the field.
-
A quick journey through optimization techniques. Told differently than usual.
In this session, we will walk through the main optimization techniques – starting with classic B-Tree indexes for relational databases, moving on to Z-Order and Liquid Clustering for Lakehouses, and ending with the V-Order mechanism recently introduced by Microsoft.
We will delve into the mathematical foundations behind these mechanisms to fill in some gaps and mention concepts that are often overlooked when presenting these techniques. But don’t be scared: we’ll introduce this theoretical knowledge in a very accessible way!
We’ll cover sorting, partitioning, the origin of the Z-Order curve, and many other topics. We’ll also break the Parquet file down into its components to fully understand how the different pushdown mechanisms work.
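As a small taste of what this looks like in practice, here is a minimal sketch of invoking Delta’s Z-ordering from a Spark notebook. The table and column names are made up for illustration, and it assumes a runtime where Delta’s OPTIMIZE command is available.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 'sales_fact' is a hypothetical Delta table used purely for illustration.
# Z-ordering co-locates rows with similar values in the chosen columns,
# which lets the engine skip more files/row groups for selective filters.
spark.sql("OPTIMIZE sales_fact ZORDER BY (customer_id, order_date)")

# A selective read can now benefit from data skipping and Parquet
# predicate pushdown; the physical plan shows the pushed filters.
spark.sql("SELECT * FROM sales_fact WHERE customer_id = 42").explain()
```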
We will talk about optimization techniques that you already know, but we’ll do it differently than usual ;). -
An Apache Spark query’s journey through the layers of Microsoft Fabric
Join us for an exciting deep dive into the heart of Apache Spark! We’ll take you on a journey to see exactly how your Spark queries get executed, both within Apache Spark itself and through the different layers of Microsoft Fabric. Here’s what we’ll explore together:
* Spark SQL and Catalyst: A breakdown of how Spark SQL works hand in hand with the Catalyst optimizer to make your queries smarter and faster.
* A Note on Tungsten: Discover how Tungsten boosts Spark’s performance with better memory management and lightning-fast execution.
* A Note on Fabric’s Native Execution Engine: Bringing the power of C++ for even faster query execution.
* Delta Lake: See how Delta Lake makes your data lakes more reliable and scalable, ensuring your data is always in top shape.
* Parquet Files: Learn why Parquet’s columnar storage is a game-changer for efficient data storage and quick retrieval.
We’ll look into the official Apache Spark source code on GitHub, giving you a real, hands-on look at what’s happening under the hood.
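To give a flavour of what we’ll be looking at, here is a minimal sketch (with made-up data) of asking Spark to print the plans Catalyst produces for a simple query.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# A small illustrative DataFrame; names are made up for the example.
orders = spark.createDataFrame(
    [(1, "EU", 100.0), (2, "US", 250.0), (3, "EU", 75.0)],
    ["order_id", "region", "amount"],
)

agg = (
    orders
    .filter(F.col("region") == "EU")
    .groupBy("region")
    .agg(F.sum("amount").alias("total"))
)

# 'extended' prints the parsed, analyzed and optimized logical plans that
# Catalyst produces, followed by the physical plan that is executed.
agg.explain(mode="extended")
```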
By the end of this session, you’ll have a clearer understanding of how your queries run and some tools and tips to help you solve problems and optimize your Spark jobs for both speed and cost. -
Another Brick in the Firewall: How to Secure your Azure Data Platform
These days, most people want their Azure Data Platforms to be deployed in a secure network topology. As Data Engineers, we are often the ones that have to make this happen. The cloud has made it easy for us to deploy a virtual network here and a private endpoint there, but what does a good, networked data platform actually look like, and how does it work? Simple things become complex: how will my ADF Integration Runtime talk to my data sources? How do I securely access my resources to do development?
In this session we will look at some of the core network components which can be used to secure your data platform; what they are, and how to use them effectively. We will also look at some of the decisions that need to be made when moving your data platform inside a private network, which weren’t a consideration previously. Some basic knowledge of virtual networks is required.
By the end of this session you should feel more confident working with network components in Azure and using them to secure your Data Platform. -
Another query language — do we really need KQL?
As data professionals, we often ask ourselves: why yet another coding language, when we already have Python, PySpark, Scala and T-SQL? With the release of Fabric, KQL (Kusto Query Language) was included as part of delivering a complete set of query capabilities. What problems does KQL solve, and when is it the right tool for the job vs. T-SQL?
In the world of data, choosing the right tool for the job can be the key to success. Both languages are powerful, but they are designed for different purposes and platforms. Understanding their strengths, differences, and ideal use cases can make or break your project when working with diverse data ecosystems.
Through a comparative discussion, we’ll help you understand the strengths of each. We’ll cover:
– The foundational differences between T-SQL and KQL: syntax, execution, and purpose.
– Ideal scenarios for using T-SQL versus KQL.
– Key features like joins, aggregations, and data transformations, and how they are implemented in each language (see the short side-by-side sketch after this list).
– Practical use cases, including transitioning between the two when working in hybrid systems.
– Tips and tricks on how you can use your T-SQL skills in the KQL world.
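As a first taste of the syntax comparison, the sketch below shows the same aggregation written in both languages. It uses the well-known StormEvents sample table, and the queries are illustrative rather than tuned.

```python
# The same question, "event count per event type and day", expressed in both
# languages. Table and column names follow the StormEvents sample and are
# illustrative only.

t_sql = """
SELECT   EventType,
         CAST(StartTime AS date) AS EventDate,
         COUNT(*)                AS Events
FROM     dbo.StormEvents
GROUP BY EventType, CAST(StartTime AS date)
ORDER BY Events DESC;
"""

kql = """
StormEvents
| summarize Events = count() by EventType, EventDate = startofday(StartTime)
| order by Events desc
"""

# T-SQL declares the result shape up front (SELECT ... GROUP BY ...), while
# KQL pipes the table through operators roughly in the order they are applied,
# which is why many find it reads more like a data flow.
```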
Whether you’re a T-SQL professional curious about the capabilities of KQL, or a KQL enthusiast looking to expand your database coding skills, this session will provide valuable insights to bridge the gap between these two powerful languages.
Let’s explore the best of both worlds and equip you with the knowledge to choose the right tool for the job in your projects. -
Apache Spark for SQL Data Warehouse Developers
Are you wondering how to get started with Apache Spark? Are you currently working with SQL Server or Azure SQL (MI)? Or, are you a Data Warehouse developer?
Apache Spark has gained popularity in the past few years and become the new cool kid on the block. But what is Apache Spark and how can we leverage its capabilities? Which skills do we need? During this session, I will show you how we can use Apache Spark starting from our existing SQL skillset.
In this session, we will start with a brief introduction to Apache Spark and we will learn how we can use our SQL knowledge within Apache Spark.
After a short introduction, we will focus on practical examples, comparing how we would solve challenges in SQL with the alternatives in Apache Spark. Throughout the session, we will make the examples more elaborate and complex as we go.
You will learn how you can apply your SQL knowledge in Apache Spark.
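For example, here is a minimal sketch (with made-up data) of the kind of side-by-side comparison we’ll work through: the same aggregation written as plain SQL and with the DataFrame API.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative data; in practice this would be a Lakehouse or catalog table.
customers = spark.createDataFrame(
    [(1, "Contoso", "NL"), (2, "Fabrikam", "BE"), (3, "Adatum", "NL")],
    ["customer_id", "name", "country"],
)
customers.createOrReplaceTempView("customers")

# The T-SQL-flavoured way: write plain SQL and let Spark SQL execute it.
via_sql = spark.sql(
    "SELECT country, COUNT(*) AS customer_count FROM customers GROUP BY country"
)

# The same logic with the DataFrame API; both compile to the same optimized plan.
via_api = customers.groupBy("country").agg(F.count("*").alias("customer_count"))

via_sql.show()
via_api.show()
```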
By the end of this session, you will have a solid understanding of how you can use your SQL skills to solve challenges with Apache Spark. -
Delta Merge, the data engineer’s best friend
The ‘UPSERT pattern’, where a set of data changes is combined with existing data, is a pattern commonly used in data engineering. The UPSERT pattern allows you, the data engineer, to merge INSERTS, UPDATES and DELETES. Often it is only possible to perform these steps as separate operations which can be both time-consuming and error prone.
Delta Merge was added to Delta Lake to simplify the UPSERT process for data engineers, streamlining the process into a single command that handles the inserts, updates and deletes as a single operation.
During this session you will learn how you can use PySpark or Spark SQL to merge change data sets efficiently, implementing common data modeling techniques such as Type 1 and Type 2 dimensions or soft deletes, all of which are commonly used in data warehousing scenarios.
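As a preview, a minimal sketch of such a merge via Spark SQL is shown below; dim_customer and customer_changes are hypothetical Delta tables standing in for your dimension and its incoming change feed.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 'dim_customer' and 'customer_changes' are hypothetical Delta tables:
# the target dimension and a batch of incoming change records.
spark.sql("""
    MERGE INTO dim_customer AS target
    USING customer_changes AS source
        ON target.customer_id = source.customer_id
    WHEN MATCHED AND source.is_deleted = true THEN
        DELETE
    WHEN MATCHED THEN
        UPDATE SET target.name = source.name,
                   target.city = source.city
    WHEN NOT MATCHED THEN
        INSERT (customer_id, name, city)
        VALUES (source.customer_id, source.name, source.city)
""")
```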
Join us to find out how the Delta Merge statement can really become the data engineer’s best friend, saving you time to focus on what matters. -
Enhancing the Developer Experience in Microsoft Fabric Warehouse through Functions
In the fast-evolving landscape of data analytics, enhancing the developer experience is crucial for achieving seamless and efficient workflows within data warehouses. This presentation will explore the transformative potential of using functions in Microsoft Fabric Warehouse to elevate developer productivity and satisfaction.
We’ll delve into the power of functions, including native SQL and various other types, to make code reusable, independently testable, and easily deployable. By encapsulating business logic within these functions, developers can streamline collaboration and foster innovation by sharing generalized solutions across teams.
Moreover, our session will illustrate how extending the boundaries of existing data warehouse solutions through advanced function usage can lead to more flexible and robust systems. We will demonstrate practical strategies for deploying these functions to enable self-service analytics, helping organizations to harness their full analytical potential.
Join us as we uncover the world of functions and their impact on performance, discussing both their benefits and trade-offs. We’ll guide you through techniques to optimize performance while ensuring that developer experience is significantly enhanced, ultimately propelling your data initiatives to new heights. -
Fabric Security – Everything you need to know
Security is a top priority for Microsoft Fabric. As a Fabric customer, you need to safeguard your assets from threats and follow your organization’s security policies. The Microsoft Fabric security session serves as an end-to-end security overview for Fabric. It covers details on how Microsoft secures your data by default as a software as a service (SaaS) offering, and how you can secure, manage, and govern your data when using Fabric. This will be an engaging session and you will get an opportunity to provide direct feedback to the product team. If you are a data analytics professional implementing Fabric, or if you are an IT/InfoSec admin who wants to make sure that proper controls are in place for your organizational tools, then don’t miss this session.
-
Fabric Spark at scale – tips, tricks and best practices
In this demo-centric session we will run through the tips, tricks and best practices when using Spark at scale.
In this session we will cover:
– Different ways to configure and manage your Spark Environments, including cluster sizing, libraries and configuration properties
– Tips for performance profiling and optimisation including when Delta optimisations in Fabric might be causing performance issues
– Different options for complex orchestration patterns that minimize cluster start-up time, including using Airflow in Fabric
– Using notebookutils to the fullest to orchestrate end-to-end scenarios (see the short sketch after this list)
– Ways to use an Eventhouse to monitor your Spark jobs and find performance regressions.
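To give a flavour of the configuration and orchestration topics above, here is a minimal sketch as it might look in a Fabric notebook. The notebook name and settings are illustrative, and the notebook.run call is assumed to follow the documented mssparkutils/notebookutils pattern.

```python
# In a Fabric notebook both 'spark' and 'notebookutils' are pre-initialized,
# so no imports are needed here.

# Session-level Spark configuration; the value is illustrative and should be
# sized to your data volumes and node count.
spark.conf.set("spark.sql.shuffle.partitions", "64")

# Run a child notebook from an orchestrating parent notebook. 'load_sales' is a
# hypothetical notebook name; the (name, timeout_in_seconds, parameters) pattern
# follows the documented mssparkutils/notebookutils notebook.run call.
notebookutils.notebook.run("load_sales", 600, {"run_date": "2025-06-03"})
```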
This session is targeted at people who are either:
1 – looking to start a Fabric project in the near term with extensive use of Spark and Lakehouses; or
2 – those currently using Fabric Spark who want to take their skills to the next level. -
From Chaos to Clarity: Enabling Data teams with Observability
Data platform teams play a critical role in enabling other teams to drive business value. However, understanding how internal users interact with data systems often feels like solving a mystery. While traditional observability focuses on ensuring data integrity and system reliability, user observability opens up a new dimension: understanding who is using your data, how they’re using it, and where friction exists.
In this talk, we’ll explore how user observability impacts our approach to platform engineering at Yelp. We’ll discuss how real-time insights into user behavior helped us uncover hidden dependencies, diagnose incidents faster, and prioritize improvements that truly matter. You’ll hear stories of reducing friction in data access, enabling self-service, and using observability to inform the next generation of platform development.
This session is for platform engineers who want to empower their data teams with actionable insights, improve cross-team collaboration, and design platforms that deliver measurable value. You’ll leave with a fresh perspective on observability and a deeper understanding of how to align platform capabilities with the needs of the teams you enable. -
Grit and Growth: Stories from the Trenches
In this session, I’ll share the invaluable lessons I’ve learned from navigating some of the most challenging and uncomfortable work and business-related situations. From mastering the art of client interviews to spotting red flags early on, I’ll provide practical insights and strategies that can help you avoid common pitfalls. We’ll delve into the critical items to specify in contracts to protect your interests and ensure clarity in business dealings. Understanding your worth is another key theme we’ll explore, discussing how to confidently communicate your value and negotiate effectively. Join me for an honest and insightful look at the hard-earned wisdom that can help you thrive in your professional journey.
-
Help, my Azure Databricks is too expensive! Some tips and tricks.
Azure Databricks is expensive. Running a cluster can cost thousands, if not tens of thousands, of euros per month. Therefore, Databricks is only suitable for the biggest of datasets. Seems to be common knowledge… right?
Let’s be honest, it doesn’t need to be this way. With some tips and tricks, Azure Databricks can be suitable for processing any kind of dataset. Without breaking the bank.
Not convinced? During this talk, I will show you how. Together, we follow several scenarios that unnecessarily increase your monthly spend. By understanding what these scenarios are, and how to solve them, we will get a grip on that Azure bill.
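To set the scene, here is a rough, back-of-the-envelope sketch of how a cluster’s monthly bill is composed of a DBU charge plus the underlying VM charge. Every rate in it is an illustrative placeholder, not an actual Azure price.

```python
def monthly_cluster_cost(nodes: int, hours_per_month: float,
                         dbu_per_node_hour: float, price_per_dbu: float,
                         vm_price_per_hour: float) -> float:
    """Rough Azure Databricks cost estimate: DBU charge plus VM charge."""
    dbu_cost = nodes * hours_per_month * dbu_per_node_hour * price_per_dbu
    vm_cost = nodes * hours_per_month * vm_price_per_hour
    return dbu_cost + vm_cost

# Illustrative numbers only -- substitute real rates from the Azure pricing
# pages. Example: a 4-node cluster running ~120 hours per month.
print(monthly_cluster_cost(nodes=4, hours_per_month=120,
                           dbu_per_node_hour=0.75, price_per_dbu=0.40,
                           vm_price_per_hour=0.55))
```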
By the end of this talk, you will be able to:
– Understand the concept of “DBU”
– Create an Azure Databricks cluster in a cost-effective manner
– Stream data and run batch jobs in a way that doesn’t break the bank
– Use the Azure cost-calculator effectively
– Set budgets and alerts on your Azure subscriptions -
Secure data end-to-end with Microsoft Fabric and OneLake
Elevate your knowledge of data security in Microsoft Fabric with this in-depth look at the security and governance features available. In this session, we’ll start by providing an overview of the security capabilities within Microsoft Fabric. We’ll look at workspace, item, and fine-grained security features and how they layer together. Next, the session will explore the different engines within Microsoft Fabric and how they each bring their own security features and characteristics. This session will also answer questions around data mirroring, how to secure OneLake shortcuts, and many other important pieces of security info. Make sure to attend this look at data security in Microsoft Fabric!
-
Showcasing Fabric Studio
In this session I will give you an introduction to the Visual Studio Code extension “Fabric Studio”, which allows you to manage your Fabric environment directly from within VSCode. Leveraging the Fabric REST APIs, you can easily browse through Fabric items and run various tasks. For less common API calls there is also the ability to use VSCode notebooks, offering IntelliSense and auto-complete for all existing API calls. It further allows you to modify existing items like semantic models, notebooks, pipelines, etc. in the VSCode IDE and publish your changes back to the Fabric service. It also features a OneLake browser to inspect the output of your operations.
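As a hint of what the extension does under the hood, here is a minimal sketch of calling the Fabric REST API directly. Token acquisition is assumed to happen elsewhere (for example via the Azure CLI or MSAL), and the response handling assumes the commonly documented list-workspaces shape.

```python
import requests

# Acquiring 'token' (an Azure AD access token for the Fabric API) is assumed
# to happen elsewhere; the placeholder below will not work as-is.
token = "<access-token>"

response = requests.get(
    "https://api.fabric.microsoft.com/v1/workspaces",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
response.raise_for_status()

# Print workspace display names and ids from the returned collection.
for workspace in response.json().get("value", []):
    print(workspace.get("displayName"), workspace.get("id"))
```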
-
Simplifying Code Distribution with Databricks Asset Bundles
Sharing code effectively is key to building scalable and maintainable data solutions. Whether you’re deploying Python libraries or moving workflows across environments, efficient code distribution ensures consistency, reduces errors, and streamlines collaboration.
In this session, we’ll explore two powerful ways to package and distribute code: Python Wheels and Databricks Asset Bundles (DABs). You’ll learn how Python Wheels enable faster, more reliable sharing of Python code and how Databricks Asset Bundles allow you to package entire projects, including scripts, workflows, and Delta Live Tables. We’ll also cover the key differences between these approaches and when to use each.
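As a preview of the Python Wheel side, here is a minimal, hypothetical setup.py for a shared library; modern projects often declare the same metadata in pyproject.toml instead.

```python
# setup.py -- a minimal wheel definition for a hypothetical shared library.
# Build it with 'python -m build' (or 'pip wheel .') and attach the resulting
# .whl to your Databricks cluster or reference it from an asset bundle.
from setuptools import setup, find_packages

setup(
    name="my_shared_transformations",   # hypothetical package name
    version="0.1.0",
    packages=find_packages(),           # picks up e.g. my_shared_transformations/
    install_requires=["pyspark"],       # declare runtime dependencies here
)
```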
By the end, you’ll have a clear understanding of how to distribute code effectively in Databricks. You’ll gain practical knowledge of Python Wheels for efficient package distribution and Databricks Asset Bundles for managing full-scale projects, helping you simplify development and deployment. -
The Crucial Role of Data Quality in Your Data Estate
In today’s data-driven landscape, the quality of your data directly impacts the accuracy of AI-driven insights and decision-making. Here’s why data quality matters:
– Trustworthy Insights: Reliable data ensures that AI models generate accurate predictions and recommendations. Without trustworthy data, there’s a risk of eroding trust in AI systems.
– Business Processes and Decision-Making: Poor data quality or incompatible data structures can hinder business processes and decision-making capabilities. Clean, well-structured data is essential for informed choices.
A powerful data platform plays a crucial role in maintaining high data quality. With a robust data platform, you can ensure that your data is consistent, accurate, and readily available for various applications and processes. Leveraging such a platform is foundational for implementing effective data quality measures.
During the session, we will guide you through Microsoft Purview Data Quality:
This comprehensive solution empowers business domain and data owners to assess and oversee data quality. It offers no-code/low-code rules, including out-of-the-box (OOB) and AI-generated rules.
Purview Data Quality incorporates AI-powered data profiling. It recommends columns for profiling, allowing human intervention to refine these recommendations. This iterative process enhances accuracy and improves underlying AI models.
By integrating Microsoft Purview with your data platform, you can apply and monitor data quality processes more effectively. This integration ensures seamless data management and governance across your data estate.
We will also walk you through the Data Quality Life Cycle:
– Assign data quality steward permissions in your data catalog.
– Register and scan data sources in Microsoft Purview Data Map.
– Set up data source connections for quality assessment.
– Configure and run data profiling.
– Define and apply data quality rules. -
Understanding Fabric Capacities
You’ve heard about Microsoft Fabric, and you’re ready to take it for a spin? Excellent, let’s get started in those few advertised minutes! But hold on… you need a capacity to actually use anything, and you might not be completely clear on what that actually entails. You’re not alone with these questions, and it is perfectly fine to stop and think about it for a while. In fact, it’s a good thing you want to understand the single most core concept of Fabric, as that will hopefully allow you to make better decisions down the road.
The introduction of Fabric Capacities sparked a lot of questions with Data Architects, Engineers, and Analysts coming from an IaaS or PaaS (Infrastructure or Platform as a Service) way of working. Microsoft Fabric is presented as an all-in-one Analytics SaaS (Software as a Service) solution, with a unified measure for Compute and Storage, promising to make cost and performance predictability a lot simpler. Great! But what exactly does that mean, and what will it actually cost the company?
To understand Fabric Capacities, we need to briefly look at the architecture and what exactly those unified measures look like, including how they are similar, yet different from the existing Power BI Premium Capacities. Understanding the different types and sizes of capacities will help us make the right decisions for our Data Platform solutions in the organization.
But then, how do you manage those capacities and assess if they are in a healthy state? What are some of the options to follow the demands and needs of your business users and allocate the right resources to them? Most importantly, what options do you have to automate the majority of these tasks?
Walking out of the session, you should understand the key concept of Fabric Capacities and how they are at the core of everything you’ll do in Microsoft Fabric, be able to choose the one that is right for you, periodically assess if the choice was right, and act where needed. -
Work smarter, not harder! 10 cool things Semantic Link can do for you in Microsoft Fabric
How many times have you heard: work smarter, not harder! But it’s easier said than done, right? What if I tell you that there is a powerful feature in Microsoft Fabric that can make this mantra about “working smart” a dream come true?
Semantic Link is a brand-new feature introduced with Fabric. In this demo-packed session, we’ll explain what Semantic Link is and how it works behind the scenes. Then, fasten your seatbelt, because I’ll show you 10+ cool things you can do with Semantic Link in real life! For example, how to optimize your Power BI semantic model with a single line of code. Or how to resolve model translation challenges in an easy and convenient way. Need to migrate existing import and DirectQuery models to Direct Lake? Piece of cake with Semantic Link!
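As a small preview, here is a minimal sketch of the kind of one-liners we’ll demo, run from a Fabric notebook. The model name and DAX query are made up, and the calls shown assume the commonly documented functions of the SemPy fabric module; exact availability may differ by version.

```python
# Runs inside a Fabric notebook where the semantic-link (SemPy) package is available.
import sempy.fabric as fabric

# List the semantic models (datasets) in the current workspace -- assumed
# to return a DataFrame of model metadata.
datasets = fabric.list_datasets()
print(datasets)

# Evaluate a DAX query against a hypothetical model and get a DataFrame back.
result = fabric.evaluate_dax(
    "Sales Model",  # hypothetical semantic model name
    "EVALUATE SUMMARIZECOLUMNS('Date'[Year], \"Total Sales\", [Sales Amount])",
)
print(result.head())
```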
After we’re done, you’ll have a better understanding of Semantic Link and how this feature can enable you to work smarter and not harder!