Breakout sessions
Tuesday, June 3rd
You can experience the following sessions, delivered by international experts from both Microsoft and the field.
-
A quick journey through optimization techniques. Told differently than usual.
In this session, we will walk through the main optimization techniques – starting with classic B-Tree indexes for relational databases, moving on to Z-Order and Liquid Clustering for Lakehouses, and ending with the V-Order mechanism recently introduced by Microsoft.
We will delve into the mathematical foundations behind these mechanisms to fill in some gaps and mention concepts that are often overlooked when presenting these techniques. But don’t be scared: we’ll introduce this theoretical knowledge in a very accessible way!
We’ll cover sorting, partitioning, the origin of the Z-Order curve, and many other topics. We’ll also break the Parquet file down into its components to fully understand how the different pushdown mechanisms work.
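As a small taste of what this looks like in practice, here is a minimal sketch of invoking Delta’s Z-ordering from a Spark notebook. The table and column names are made up for illustration, and it assumes a runtime where Delta’s OPTIMIZE command is available.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 'sales_fact' is a hypothetical Delta table used purely for illustration.
# Z-ordering co-locates rows with similar values in the chosen columns,
# which lets the engine skip more files/row groups for selective filters.
spark.sql("OPTIMIZE sales_fact ZORDER BY (customer_id, order_date)")

# A selective read can now benefit from data skipping and Parquet
# predicate pushdown; the physical plan shows the pushed filters.
spark.sql("SELECT * FROM sales_fact WHERE customer_id = 42").explain()
```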
We will talk about optimization techniques that you already know, but we’ll do it differently than usual ;). -
An Apache Spark query’s journey through the layers of Microsoft Fabric
Join us for an exciting deep dive into the heart of Apache Spark! We’ll take you on a journey to see exactly how your Spark queries get executed, both within Apache Spark itself and through the different layers of Microsoft Fabric. Here’s what we’ll explore together:
* Spark SQL and Catalyst: A breakdown of how Spark SQL works hand in hand with the Catalyst optimizer to make your queries smarter and faster.
* A Note on Tungsten: Discover how Tungsten boosts Spark’s performance with better memory management and lightning-fast execution.
* A Note on Fabric’s Native Execution Engine: Bringing the power of C++ for even faster query execution.
* Delta Lake: See how Delta Lake makes your data lakes more reliable and scalable, ensuring your data is always in top shape.
* Parquet Files: Learn why Parquet’s columnar storage is a game-changer for efficient data storage and quick retrieval.
We’ll look into the official Apache Spark source code on GitHub, giving you a real, hands-on look at what’s happening under the hood.
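To give a flavour of what we’ll be looking at, here is a minimal sketch (with made-up data) of asking Spark to print the plans Catalyst produces for a simple query.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# A small illustrative DataFrame; names are made up for the example.
orders = spark.createDataFrame(
    [(1, "EU", 100.0), (2, "US", 250.0), (3, "EU", 75.0)],
    ["order_id", "region", "amount"],
)

agg = (
    orders
    .filter(F.col("region") == "EU")
    .groupBy("region")
    .agg(F.sum("amount").alias("total"))
)

# 'extended' prints the parsed, analyzed and optimized logical plans that
# Catalyst produces, followed by the physical plan that is executed.
agg.explain(mode="extended")
```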
By the end of this session, you’ll have a clearer understanding of how your queries run and some tools and tips to help you solve problems and optimize your Spark jobs for both speed and cost. -
Another Brick in the Firewall: How to Secure your Azure Data Platform
These days, most people want their Azure Data Platforms to be deployed in a secure network topology. As Data Engineers, we are often the ones that have to make this happen. The cloud has made it easy for us to deploy a virtual network here and a private endpoint there, but what does a good, networked data platform actually look like, and how does it work? Simple things become complex: how will my ADF Integration Runtime talk to my data sources? How do I securely access my resources to do development?
In this session we will look at some of the core network components which can be used to secure your data platform; what they are, and how to use them effectively. We will also look at some of the decisions that need to be made when moving your data platform inside a private network, which weren’t a consideration previously. Some basic knowledge of virtual networks is required.
By the end of this session you should feel more confident working with network components in Azure and using them to secure your Data Platform. -
Another query language — do we really need KQL?
As data professionals, we often ask ourselves: why yet another coding language, when we already have Python, PySpark, Scala and T-SQL? With the release of Fabric, KQL (Kusto Query Language) was included as part of delivering a complete set of query capabilities. What problems does KQL solve, and when is it the right tool for the job vs. T-SQL?
In the world of data, choosing the right tool for the job can be the key to success. Both languages are powerful, but they are designed for different purposes and platforms. Understanding their strengths, differences, and ideal use cases can make or break your project when working with diverse data ecosystems.
Through a comparative discussion, we’ll help you understand the strengths of each. We’ll cover:
– The foundational differences between T-SQL and KQL: syntax, execution, and purpose.
– Ideal scenarios for using T-SQL versus KQL.
– Key features like joins, aggregations, and data transformations, and how they are implemented in each language (see the short side-by-side sketch after this list).
– Practical use cases, including transitioning between the two when working in hybrid systems.
– Tips and tricks on how you can use your T-SQL skills in the KQL world.
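As a first taste of the syntax comparison, the sketch below shows the same aggregation written in both languages. It uses the well-known StormEvents sample table, and the queries are illustrative rather than tuned.

```python
# The same question, "event count per event type and day", expressed in both
# languages. Table and column names follow the StormEvents sample and are
# illustrative only.

t_sql = """
SELECT   EventType,
         CAST(StartTime AS date) AS EventDate,
         COUNT(*)                AS Events
FROM     dbo.StormEvents
GROUP BY EventType, CAST(StartTime AS date)
ORDER BY Events DESC;
"""

kql = """
StormEvents
| summarize Events = count() by EventType, EventDate = startofday(StartTime)
| order by Events desc
"""

# T-SQL declares the result shape up front (SELECT ... GROUP BY ...), while
# KQL pipes the table through operators roughly in the order they are applied,
# which is why many find it reads more like a data flow.
```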
Whether you’re a T-SQL professional curious about the capabilities of KQL, or a KQL enthusiast looking to expand your database coding skills, this session will provide valuable insights to bridge the gap between these two powerful languages.
Let’s explore the best of both worlds and equip you with the knowledge to choose the right tool for the job in your projects. -
Apache Spark for SQL Data Warehouse Developers
Are you wondering how to get started with Apache Spark? Are you currently working with SQL Server or Azure SQL (MI)? Or, are you a Data Warehouse developer?
Apache Spark has gained popularity in the past few years and become the new cool kid on the block. But what is Apache Spark and how can we leverage its capabilities? Which skills do we need? During this session, I will show you how we can use Apache Spark starting from our existing SQL skillset.
In this session, we will start with a brief introduction to Apache Spark and we will learn how we can use our SQL knowledge within Apache Spark.
After a short introduction, we will focus on practical examples, comparing how we would solve challenges in SQL with the alternatives in Apache Spark. Throughout the session, we will make the examples more elaborate and complex as we go.
You will learn how you can apply your SQL knowledge in Apache Spark.
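For example, here is a minimal sketch (with made-up data) of the kind of side-by-side comparison we’ll work through: the same aggregation written as plain SQL and with the DataFrame API.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative data; in practice this would be a Lakehouse or catalog table.
customers = spark.createDataFrame(
    [(1, "Contoso", "NL"), (2, "Fabrikam", "BE"), (3, "Adatum", "NL")],
    ["customer_id", "name", "country"],
)
customers.createOrReplaceTempView("customers")

# The T-SQL-flavoured way: write plain SQL and let Spark SQL execute it.
via_sql = spark.sql(
    "SELECT country, COUNT(*) AS customer_count FROM customers GROUP BY country"
)

# The same logic with the DataFrame API; both compile to the same optimized plan.
via_api = customers.groupBy("country").agg(F.count("*").alias("customer_count"))

via_sql.show()
via_api.show()
```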
By the end of this session, you will have a solid understanding of how you can use your SQL skills to solve challenges with Apache Spark. -
Delta Merge, the data engineer’s best friend
The ‘UPSERT pattern’, where a set of data changes is combined with existing data, is a pattern commonly used in data engineering. The UPSERT pattern allows you, the data engineer, to merge INSERTS, UPDATES and DELETES. Often it is only possible to perform these steps as separate operations which can be both time-consuming and error prone.
Delta Merge was added to Delta Lake to simplify the UPSERT process for data engineers, streamlining the process into a single command that handles the inserts, updates and deletes as a single operation.
During this session you will learn how you can use PySpark or Spark SQL to merge change data sets efficiently, implementing common data modeling techniques such as Type 1 and Type 2 dimensions or soft deletes, all of which are commonly used in data warehousing scenarios.
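As a preview, a minimal sketch of such a merge via Spark SQL is shown below; dim_customer and customer_changes are hypothetical Delta tables standing in for your dimension and its incoming change feed.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 'dim_customer' and 'customer_changes' are hypothetical Delta tables:
# the target dimension and a batch of incoming change records.
spark.sql("""
    MERGE INTO dim_customer AS target
    USING customer_changes AS source
        ON target.customer_id = source.customer_id
    WHEN MATCHED AND source.is_deleted = true THEN
        DELETE
    WHEN MATCHED THEN
        UPDATE SET target.name = source.name,
                   target.city = source.city
    WHEN NOT MATCHED THEN
        INSERT (customer_id, name, city)
        VALUES (source.customer_id, source.name, source.city)
""")
```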
Join us to find out how the Delta Merge statement can really become the data engineer’s best friend, saving you time to focus on what matters. -
Enhancing the Developer Experience in Microsoft Fabric Warehouse through Functions
In the fast-evolving landscape of data analytics, enhancing the developer experience is crucial for achieving seamless and efficient workflows within data warehouses. This presentation will explore the transformative potential of using functions in Microsoft Fabric Warehouse to elevate developer productivity and satisfaction.
We’ll delve into the power of functions, including native SQL and various other types, to make code reusable, independently testable, and easily deployable. By encapsulating business logic within these functions, developers can streamline collaboration and foster innovation by sharing generalized solutions across teams.
Moreover, our session will illustrate how extending the boundaries of existing data warehouse solutions through advanced function usage can lead to more flexible and robust systems. We will demonstrate practical strategies for deploying these functions to enable self-service analytics, helping organizations to harness their full analytical potential.
Join us as we uncover the world of functions and their impact on performance, discussing both their benefits and trade-offs. We’ll guide you through techniques to optimize performance while ensuring that developer experience is significantly enhanced, ultimately propelling your data initiatives to new heights. -
Fabric Security – Everything you need to know
Security is a top priority for Microsoft Fabric. As a Fabric customer, you need to safeguard your assets from threats and follow your organization’s security policies. The Microsoft Fabric security session serves as an end-to-end security overview for Fabric. It covers details on how Microsoft secures your data by default as a software as a service (SaaS) offering, and how you can secure, manage, and govern your data when using Fabric. This will be an engaging session and you will get an opportunity to provide direct feedback to the product team. If you are a data analytics professional implementing Fabric, or if you are an IT/InfoSec admin who wants to make sure that proper controls are in place for your organizational tools, then don’t miss this session.
-
Fabric Spark at scale – tips, tricks and best practices
In this demo-centric session we will run through the tips, tricks and best practices when using Spark at scale.
In this session we will cover:
– Different ways to configure and manage your Spark Environments, including cluster sizing, libraries and configuration properties
– Tips for performance profiling and optimisation including when Delta optimisations in Fabric might be causing performance issues
– Different options for complex orchestration patterns that minimize cluster start-up time, including using Airflow in Fabric
– Using notebookutils to the fullest to orchestrate end-to-end scenarios (see the short sketch after this list)
– Ways to use an Eventhouse to monitor your Spark jobs and find performance regressions.
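To give a flavour of the configuration and orchestration topics above, here is a minimal sketch as it might look in a Fabric notebook. The notebook name and settings are illustrative, and the notebook.run call is assumed to follow the documented mssparkutils/notebookutils pattern.

```python
# In a Fabric notebook both 'spark' and 'notebookutils' are pre-initialized,
# so no imports are needed here.

# Session-level Spark configuration; the value is illustrative and should be
# sized to your data volumes and node count.
spark.conf.set("spark.sql.shuffle.partitions", "64")

# Run a child notebook from an orchestrating parent notebook. 'load_sales' is a
# hypothetical notebook name; the (name, timeout_in_seconds, parameters) pattern
# follows the documented mssparkutils/notebookutils notebook.run call.
notebookutils.notebook.run("load_sales", 600, {"run_date": "2025-06-03"})
```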
This session is targeted at people who are either:
1 – looking to start a Fabric project in the near term with extensive use of Spark and Lakehouses; or
2 – those currently using Fabric Spark who want to take their skills to the next level. -
From Chaos to Clarity: Enabling Data teams with Observability
Data platform teams play a critical role in enabling other teams to drive business value. However, understanding how internal users interact with data systems often feels like solving a mystery. While traditional observability focuses on ensuring data integrity and system reliability, user observability opens up a new dimension: understanding who is using your data, how they’re using it, and where friction exists.
In this talk, we’ll explore how user observability impacts our approach to platform engineering at Yelp. We’ll discuss how real-time insights into user behavior helped us uncover hidden dependencies, diagnose incidents faster, and prioritize improvements that truly matter. You’ll hear stories of reducing friction in data access, enabling self-service, and using observability to inform the next generation of platform development.
This session is for platform engineers who want to empower their data teams with actionable insights, improve cross-team collaboration, and design platforms that deliver measurable value. You’ll leave with a fresh perspective on observability and a deeper understanding of how to align platform capabilities with the needs of the teams you enable. -
Grit and Growth: Stories from the Trenches
In this session, I’ll share the invaluable lessons I’ve learned from navigating some of the most challenging and uncomfortable work and business-related situations. From mastering the art of client interviews to spotting red flags early on, I’ll provide practical insights and strategies that can help you avoid common pitfalls. We’ll delve into the critical items to specify in contracts to protect your interests and ensure clarity in business dealings. Understanding your worth is another key theme we’ll explore, discussing how to confidently communicate your value and negotiate effectively. Join me for an honest and insightful look at the hard-earned wisdom that can help you thrive in your professional journey.
-
Help, my Azure Databricks is too expensive! Some tips and tricks.
Azure Databricks is expensive. Running a cluster can cost thousands, if not tens of thousands, of euros per month. Therefore, Databricks is only suitable for the biggest of datasets. Seems to be common knowledge… right?
Let’s be honest, it doesn’t need to be this way. With some tips and tricks, Azure Databricks can be suitable for processing any kind of dataset. Without breaking the bank.
Not convinced? During this talk, I will show you how. Together, we follow several scenarios that unnecessarily increase your monthly spend. By understanding what these scenarios are, and how to solve them, we will get a grip on that Azure bill.
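To set the scene, here is a rough, back-of-the-envelope sketch of how a cluster’s monthly bill is composed of a DBU charge plus the underlying VM charge. Every rate in it is an illustrative placeholder, not an actual Azure price.

```python
def monthly_cluster_cost(nodes: int, hours_per_month: float,
                         dbu_per_node_hour: float, price_per_dbu: float,
                         vm_price_per_hour: float) -> float:
    """Rough Azure Databricks cost estimate: DBU charge plus VM charge."""
    dbu_cost = nodes * hours_per_month * dbu_per_node_hour * price_per_dbu
    vm_cost = nodes * hours_per_month * vm_price_per_hour
    return dbu_cost + vm_cost

# Illustrative numbers only -- substitute real rates from the Azure pricing
# pages. Example: a 4-node cluster running ~120 hours per month.
print(monthly_cluster_cost(nodes=4, hours_per_month=120,
                           dbu_per_node_hour=0.75, price_per_dbu=0.40,
                           vm_price_per_hour=0.55))
```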
By the end of this talk, you will be able to:
– Understand the concept of “DBU”
– Create an Azure Databricks cluster in a cost-effective manner
– Stream data and run batch jobs in a way that doesn’t break the bank
– Use the Azure cost-calculator effectively
– Set budgets and alerts on your Azure subscriptions -
Secure data end-to-end with Microsoft Fabric and OneLake
Elevate your knowledge of data security in Microsoft Fabric with this in-depth look at the security and governance features available. In this session, we’ll start by providing an overview of the security capabilities within Microsoft Fabric. We’ll look at workspace, item, and fine-grained security features and how they layer together. Next, the session will explore the different engines within Microsoft Fabric and how they each bring their own security features and characteristics. This session will also answer questions around data mirroring, how to secure OneLake shortcuts, and many other important pieces of security info. Make sure to attend this look at data security in Microsoft Fabric!
-
Showcasing Fabric Studio
In this session I will give you an introduction to the Visual Studio Code extension “Fabric Studio”, which allows you to manage your Fabric environment directly from within VSCode. Leveraging the Fabric REST APIs, you can easily browse through Fabric items and run various tasks. For less common API calls there is also the ability to use VSCode notebooks, offering IntelliSense and auto-complete for all existing API calls. It further allows you to modify existing items like semantic models, notebooks, pipelines, etc. in the VSCode IDE and publish your changes back to the Fabric service. It also features a OneLake browser to inspect the output of your operations.
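As a hint of what the extension does under the hood, here is a minimal sketch of calling the Fabric REST API directly. Token acquisition is assumed to happen elsewhere (for example via the Azure CLI or MSAL), and the response handling assumes the commonly documented list-workspaces shape.

```python
import requests

# Acquiring 'token' (an Azure AD access token for the Fabric API) is assumed
# to happen elsewhere; the placeholder below will not work as-is.
token = "<access-token>"

response = requests.get(
    "https://api.fabric.microsoft.com/v1/workspaces",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
response.raise_for_status()

# Print workspace display names and ids from the returned collection.
for workspace in response.json().get("value", []):
    print(workspace.get("displayName"), workspace.get("id"))
```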
-
Simplifying Code Distribution with Databricks Asset Bundles
Sharing code effectively is key to building scalable and maintainable data solutions. Whether you’re deploying Python libraries or moving workflows across environments, efficient code distribution ensures consistency, reduces errors, and streamlines collaboration.
In this session, we’ll explore two powerful ways to package and distribute code: Python Wheels and Databricks Asset Bundles (DABs). You’ll learn how Python Wheels enable faster, more reliable sharing of Python code and how Databricks Asset Bundles allow you to package entire projects, including scripts, workflows, and Delta Live Tables. We’ll also cover the key differences between these approaches and when to use each.
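As a preview of the Python Wheel side, here is a minimal, hypothetical setup.py for a shared library; modern projects often declare the same metadata in pyproject.toml instead.

```python
# setup.py -- a minimal wheel definition for a hypothetical shared library.
# Build it with 'python -m build' (or 'pip wheel .') and attach the resulting
# .whl to your Databricks cluster or reference it from an asset bundle.
from setuptools import setup, find_packages

setup(
    name="my_shared_transformations",   # hypothetical package name
    version="0.1.0",
    packages=find_packages(),           # picks up e.g. my_shared_transformations/
    install_requires=["pyspark"],       # declare runtime dependencies here
)
```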
By the end, you’ll have a clear understanding of how to distribute code effectively in Databricks. You’ll gain practical knowledge of Python Wheels for efficient package distribution and Databricks Asset Bundles for managing full-scale projects, helping you simplify development and deployment. -
The Crucial Role of Data Quality in Your Data Estate
In today’s data-driven landscape, the quality of your data directly impacts the accuracy of AI-driven insights and decision-making. Here’s why data quality matters:
– Trustworthy Insights: Reliable data ensures that AI models generate accurate predictions and recommendations. Without trustworthy data, there’s a risk of eroding trust in AI systems.
– Business Processes and Decision-Making: Poor data quality or incompatible data structures can hinder business processes and decision-making capabilities. Clean, well-structured data is essential for informed choices.
A powerful data platform plays a crucial role in maintaining high data quality. With a robust data platform, you can ensure that your data is consistent, accurate, and readily available for various applications and processes. Leveraging such a platform is foundational for implementing effective data quality measures.
During the session, we will guide you through Microsoft Purview Data Quality:
This comprehensive solution empowers business domain and data owners to assess and oversee data quality. It offers no-code/low-code rules, including out-of-the-box (OOB) and AI-generated rules.
Purview Data Quality incorporates AI-powered data profiling. It recommends columns for profiling, allowing human intervention to refine these recommendations. This iterative process enhances accuracy and improves underlying AI models.
By integrating Microsoft Purview with your data platform, you can apply and monitor data quality processes more effectively. This integration ensures seamless data management and governance across your data estate.
We will also walk you through the Data Quality Life Cycle:
– Assign data quality steward permissions in your data catalog.
– Register and scan data sources in Microsoft Purview Data Map.
– Set up data source connections for quality assessment.
– Configure and run data profiling.
– Define and apply data quality rules. -
Understanding Fabric Capacities
You’ve heard about Microsoft Fabric, and you’re ready to take it for a spin? Excellent, let’s get started in those few advertised minutes! But hold on… you need a capacity to actually use anything, and you might not be completely clear on what that actually entails. You’re not alone with these questions, and it is perfectly fine to stop and think about it for a while. In fact, it’s a good thing you want to understand the single most core concept of Fabric, as that will hopefully allow you to make better decisions down the road.
The introduction of Fabric Capacities sparked a lot of questions with Data Architects, Engineers, and Analysts coming from an IaaS or PaaS (Infrastructure or Platform as a Service) way of working. Microsoft Fabric is presented as an all-in-one Analytics SaaS (Software as a Service) solution, with a unified measure for Compute and Storage, promising to make cost and performance predictability a lot simpler. Great! But what exactly does that mean, and what will it actually cost the company?
To understand Fabric Capacities, we need to briefly look at the architecture and what exactly those unified measures look like, including how they are similar, yet different from the existing Power BI Premium Capacities. Understanding the different types and sizes of capacities will help us make the right decisions for our Data Platform solutions in the organization.
But then, how do you manage those capacities and assess if they are in a healthy state? What are some of the options to follow the demands and needs of your business users and allocate the right resources to them? Most importantly, what options do you have to automate the majority of these tasks?
Walking out of the session, you should understand the key concept of Fabric Capacities and how they are at the core of everything you’ll do in Microsoft Fabric, be able to choose the one that is right for you, periodically assess if the choice was right, and act where needed. -
Work smarter, not harder! 10 cool things Semantic Link can do for you in Microsoft Fabric
How many times have you heard: work smarter, not harder! But it’s easier said than done, right? What if I tell you that there is a powerful feature in Microsoft Fabric that can make this mantra about “working smart” a dream come true?
Semantic Link is a brand-new feature introduced with Fabric. In this demo-packed session, we’ll explain what Semantic Link is and how it works behind the scenes. Then, fasten your seatbelt, because I’ll show you 10+ cool things you can do with Semantic Link in real life! For example, how to optimize your Power BI semantic model with a single line of code. Or how to resolve model translation challenges in an easy and convenient way. Need to migrate existing import and DirectQuery models to Direct Lake? Piece of cake with Semantic Link!
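As a small preview, here is a minimal sketch of the kind of one-liners we’ll demo, run from a Fabric notebook. The model name and DAX query are made up, and the calls shown assume the commonly documented functions of the SemPy fabric module; exact availability may differ by version.

```python
# Runs inside a Fabric notebook where the semantic-link (SemPy) package is available.
import sempy.fabric as fabric

# List the semantic models (datasets) in the current workspace -- assumed
# to return a DataFrame of model metadata.
datasets = fabric.list_datasets()
print(datasets)

# Evaluate a DAX query against a hypothetical model and get a DataFrame back.
result = fabric.evaluate_dax(
    "Sales Model",  # hypothetical semantic model name
    "EVALUATE SUMMARIZECOLUMNS('Date'[Year], \"Total Sales\", [Sales Amount])",
)
print(result.head())
```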
After we’re done, you’ll have a better understanding of Semantic Link and how this feature can enable you to work smarter and not harder!