Breakout sessions
Friday, June 9th
You can experience the following sessions, delivered by international experts from both Microsoft and the field.
-
Azure DevOps for Data Engineers: the Git way
If you want to set up Azure DevOps in the way that works best with Git, what would that look like?
The usual advice is “one repository per independently deployable product”. But what counts as one product? As Data Platform people, many of us are used to thinking of our (former?) Data Warehouse as a black box, but that’s not what’s meant here!
We’ll take inspiration from our software development peers and look at how to break up our data platform projects into multiple smaller repositories and accompanying pipelines. How does that work with schema changes? How do we orchestrate everything in a neat way? And are there other best practices to keep in mind?
-
Azure Synapse Analytics: Networking for Production
Azure Synapse Analytics is a wonderful collection of capabilities to bring your data to life, but making sense of how to secure the networking for Synapse Analytics can be challenging. We need to understand concepts like managed VNets and private endpoints, as well as Azure DNS, network architecture, and possibly even connectivity methods from on-premises, to make this work in more complex scenarios.
Join this session to understand networking for Azure Synapse Analytics and how you can find the correct level of network security for your scenario. We will start with an overview of Azure Synapse Analytics and the different endpoints that make up its capabilities. Then we will go from the most open networking configuration to locking everything down to traffic from your own networks only. We will finish with a walkthrough of a common scenario: secure access to your Synapse Analytics workspace from an on-premises site.
-
Building a Data Sharing Lakehouse with Unity Catalog
You’ve written some PySpark, loaded data into a lake and built some lovely data models… now what? How do you open that data up to your analytics community? How do you build a secure but easy-to-use platform?
With Unity Catalog, Databricks now gives us a governance platform for securing, documenting & presenting data to many different use cases. In this session we’ll dive into how this changes our Lakehouse patterns, show how to get started with Unity Catalog, and look at some of the new features!
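For a flavour of what that governance layer looks like in practice, here is a minimal sketch of publishing a curated table and opening it up to an analytics group through Unity Catalog’s three-level namespace. It assumes a Databricks cluster attached to a Unity Catalog metastore (where the spark session already exists); the catalog, schema, table, and group names are hypothetical.

# Minimal sketch: publish a curated table and grant read access via Unity Catalog.
# Assumes a Databricks cluster attached to a Unity Catalog metastore; the names
# used here (main, gold, orders, analysts) are hypothetical.
spark.sql("CREATE CATALOG IF NOT EXISTS main")
spark.sql("CREATE SCHEMA IF NOT EXISTS main.gold")

# Register curated data as a managed table in the three-level namespace.
spark.table("hive_metastore.default.orders_curated") \
     .write.saveAsTable("main.gold.orders")

# Document the asset and grant read access to a workspace group.
spark.sql("COMMENT ON TABLE main.gold.orders IS 'Curated orders for the analytics community'")
spark.sql("GRANT USE CATALOG ON CATALOG main TO `analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.gold TO `analysts`")
spark.sql("GRANT SELECT ON TABLE main.gold.orders TO `analysts`")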
Some familiarity with lakes & Databricks will help!
-
dbt with Azure Synapse
dbt is the new data transformation tool taking the world by storm. It lowers the barrier of entry into the world of data analytics for everyone who has ever written a line of SQL. Did you know it integrates quite well with Azure Synapse? Join this session to follow in the footsteps of thousands of analytics engineers and fall in love with dbt. Learn how dbt works with Azure SQL and Azure Synapse from one of the maintainers of the official dbt adapter! We’ll use Azure Synapse and VS Code to build our first Hello Azure project.
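As a taste of what that first project can look like, here is a minimal sketch of invoking dbt programmatically from Python. It assumes dbt-core 1.5+ with the dbt-synapse adapter installed and a profiles.yml that already points at your Synapse workspace; the model name hello_azure is hypothetical.

# Minimal sketch: run a dbt project against Azure Synapse from Python.
# Assumes dbt-core >= 1.5 and the dbt-synapse adapter are installed, and that
# profiles.yml already points at your Synapse dedicated SQL pool.
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

# Equivalent to `dbt run --select hello_azure` on the command line
# ("hello_azure" is a hypothetical model name).
res: dbtRunnerResult = dbt.invoke(["run", "--select", "hello_azure"])

if not res.success:
    raise SystemExit("dbt run failed")

for r in res.result:
    print(f"{r.node.name}: {r.status}")

In day-to-day work you would run the same command from the VS Code terminal; the programmatic runner is handy when dbt is embedded in an orchestration pipeline.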
-
Designing Data Architectures that InfoSec will actually approve
Building your data platform in the cloud is easy, but as soon as that dreaded word “security” becomes involved it suddenly becomes incredibly painful. How do you integrate it with your existing networking? How do you manage user security? What on earth is a private endpoint? Over the past year, a lot of these tools have evolved, and we now have a set of mature patterns we can apply to actually make a modern data platform secure.
In this session I’ll guide you through a secure reference architecture with Data Factory, Databricks, Data Lake, and Azure Synapse working together as a secure, fully productionised platform. Each has its own idiosyncrasies, but this session will teach you the options available and the pitfalls to avoid.
-
Driving alerts and actions on your data
The content for this session is not yet available.
-
Implementing Azure Data Integration Pipelines in Production
Within a typical Azure data platform solution for any enterprise-grade data analytics or data science workload, an umbrella resource is needed to trigger, monitor, and handle the control flow for transforming datasets. Those requirements are met by deploying Azure data integration pipelines, delivered using Synapse Analytics or Data Factory. In this session I’ll show you how to create rich, dynamic data pipelines and apply these orchestration resources in production, using scaled architecture design patterns, best practice, and the latest metadata-driven frameworks. We will take a deeper dive into the service, considering how to build custom activities and dynamic pipelines, and think about hierarchical design patterns for enterprise-grade deployments. Through a series of short stories based on real-world experience, I will take you through how to implement data integration pipelines in production.
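To make the metadata-driven idea concrete, here is a minimal sketch of a control loop that starts one parameterised pipeline run per source dataset using the Azure SDK for Python. The resource names, pipeline name, and metadata entries are hypothetical, and a production framework would read the metadata from a control table rather than hard-coding it.

# Minimal sketch of a metadata-driven trigger: loop over dataset metadata and
# start one parameterised Data Factory pipeline run per entry.
# Subscription, resource group, factory, pipeline, and metadata are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "rg-dataplatform"
FACTORY_NAME = "adf-dataplatform"

# In a real framework this metadata comes from a control table, not a literal.
datasets = [
    {"source_schema": "sales", "source_table": "orders", "watermark_column": "ModifiedDate"},
    {"source_schema": "sales", "source_table": "customers", "watermark_column": "ModifiedDate"},
]

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

for ds in datasets:
    run = client.pipelines.create_run(
        RESOURCE_GROUP,
        FACTORY_NAME,
        "PL_Ingest_Generic",  # hypothetical parameterised ingestion pipeline
        parameters=ds,
    )
    print(f"Started run {run.run_id} for {ds['source_schema']}.{ds['source_table']}")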
-
Long Live Star Schema! Collaboratively design your dimensional model with business users
Despite claims to the contrary, dimensional modelling and star schemas are alive and well in the modern data world. But whilst developers might have great technical skills and understand how to build a star schema, they may lack the business domain knowledge to ensure that what they deliver is fit for use by analysts and self-service users. On the flip side, these end users often know what they want and need from a data platform, but struggle to explain this in a way that makes it easy for developers to implement. How can we improve the requirements gathering process to make sure we avoid the tensions that can arise from this? Enter “SunBeam”. SunBeam is a technique developed by Advancing Analytics that looks to bridge the gap between business and IT by using an end-to-end process for working with business users to collaboratively design a star schema. This session is an introduction to this technique.
-
Modelling and indexing your data warehouse
At some point, when working with data, the star schema pops up. There is a lot of misconception about the star schema, but once you realise it is designed for our data technologies and our data technologies are optimised for it, it becomes a very powerful pattern. This session is deeply technical, about designing star schemas and indexing them correctly, and how this is rooted in our technologies. The end result is that attendees can build a data warehouse for less money and have a self-service platform like Power BI hold more data than with other patterns.
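As a flavour of what “designed for our technologies” means in practice, here is a minimal sketch of a star-schema fact table on a Synapse dedicated SQL pool, hash-distributed on a join key and stored as a clustered columnstore index; the connection details, table, and columns are hypothetical.

# Minimal sketch: create a fact table with hash distribution and a clustered
# columnstore index on a Synapse dedicated SQL pool.
# Connection string, table, and column names are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace.sql.azuresynapse.net;Database=mydwh;"
    "Authentication=ActiveDirectoryInteractive;",
    autocommit=True,
)

conn.cursor().execute("""
CREATE TABLE dbo.FactSales
(
    DateKey      int            NOT NULL,
    CustomerKey  int            NOT NULL,
    ProductKey   int            NOT NULL,
    Quantity     int            NOT NULL,
    SalesAmount  decimal(18, 2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerKey),  -- co-locate rows that join to DimCustomer
    CLUSTERED COLUMNSTORE INDEX        -- compressed, scan-friendly storage for the fact table
);
""")

The columnstore compression is a large part of the “more data for less money” argument: fact rows compress far better in columnar storage than in a traditional rowstore.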
-
Protect your data from tampering with Ledger in SQL
Establishing trust around the integrity of data stored in database systems has been a longstanding problem for all organizations that manage financial, medical, or other sensitive data. Ledger is a new feature in Azure SQL and SQL Server that incorporates blockchain crypto technologies into the RDBMS to ensure the data stored in a database is tamper-evident. The feature introduces ledger tables that make data tampering easy to detect.
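To make that concrete, here is a minimal sketch of creating an append-only ledger table and generating a database digest from Python over pyodbc; it assumes an Azure SQL Database (or SQL Server 2022) with the ledger feature available, and the connection details and table are hypothetical.

# Minimal sketch: create an append-only ledger table and generate a database
# digest that can later be used to verify that nothing was tampered with.
# Connection details and table name are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myserver.database.windows.net;Database=mydb;"
    "Authentication=ActiveDirectoryInteractive;",
    autocommit=True,
)
cur = conn.cursor()

# Append-only ledger table: inserts are allowed, updates and deletes are blocked,
# and every row is cryptographically chained into the database ledger.
cur.execute("""
CREATE TABLE dbo.KeyCardEvents
(
    EmployeeId int       NOT NULL,
    DoorId     int       NOT NULL,
    EventTime  datetime2 NOT NULL
)
WITH (LEDGER = ON (APPEND_ONLY = ON));
""")

cur.execute("INSERT INTO dbo.KeyCardEvents VALUES (42, 1, SYSUTCDATETIME());")

# Generate the current database digest; store it in trusted storage so that
# sys.sp_verify_database_ledger can later prove the data is unchanged.
digest = cur.execute("EXEC sys.sp_generate_database_ledger_digest;").fetchone()[0]
print(digest)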
In this session, we will cover
• The basic concepts of Ledger and how it works
• Ledger Tables
• Digest management and database verification
After completing this session, you will understand how SQL Ledger works and how it can help to protect sensitive data.
-
Solve your Data Governance challenges with Microsoft Purview
What data do I have? Where did the data come from? Can I trust it? How do I manage access and control?
These are questions that a Chief Data Officer wants answered when analyzing an organization’s data estate.
Data consumers, data producers, and security administrators all have their own challenges. Microsoft Purview is designed to address these challenges.
Microsoft Purview helps you understand assets across your entire data estate and provides easy access to all data, security, and risk solutions.
In this session, we’ll take a closer look at Unified Data Governance, one of Microsoft Purview’s solutions, and see whether we can answer the following questions:
· What challenges do organizations and user groups face with Data Governance?
· How can Microsoft Purview contribute to this?
· How can we easily create a holistic, up-to-date map of our data landscape?
· How can we find valuable and reliable data?
· What are the costs for Microsoft Purview?
· What are the latest/new features available in Microsoft Purview?
So if you’re a CDO, a data consumer, a data producer, or a security administrator, this session is definitely worth following.
-
Spark Execution Plans for Databricks
Databricks is a powerful data analytics tool for data science and data engineering, but understanding how code is executed on the cluster can be daunting.
Using Spark execution plans allows you to understand the execution process and flow, which is great for optimizing queries and identifying bottlenecks.
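For example, here is a minimal PySpark sketch (the DataFrame itself is hypothetical; on Databricks the spark session already exists) of asking Spark for the plans it builds before executing a simple aggregation:

# Minimal sketch: inspect the plans Spark builds for a simple aggregation.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("NL", 10), ("NL", 20), ("UK", 5)],
    ["country", "amount"],
)
agg = df.groupBy("country").agg(F.sum("amount").alias("total"))

# Parsed, analysed and optimised logical plans, plus the physical plan.
agg.explain(mode="extended")

# Physical plan broken down per operator, with codegen stage ids.
agg.explain(mode="formatted")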
This session will introduce you to Spark execution plans, the execution flows and how to interrogate the different plans.
By the end of this session, you will have everything you need to start optimizing your queries.
-
Synapse Analytics – how it works
Synapse is a powerful cloud-based analytics service offered by Microsoft Azure that can help organizations accelerate their data insights and simplify their data management. This session is designed to provide attendees with a comprehensive guide to Synapse, including its key features and capabilities.
The session will begin with an overview of Synapse and how it works. This will be followed by an in-depth look at its main components, such as Synapse Studio, Synapse Analytics, and Synapse Pipelines. We will explore how to use each of these components to design, develop, and deploy big data solutions that can scale to meet the needs of any organization.
Additionally, we will dive into various Synapse use cases, such as data warehousing, data lakes, and machine learning, and discuss how Synapse can be used to solve real-world business problems. Attendees will also learn best practices for working with Synapse, such as optimizing performance, security, and monitoring.
By the end of this session, attendees will have a solid understanding of Synapse and how it can be used to drive business value. They will leave with practical knowledge on how to implement Synapse in their own organizations, and how to leverage its full potential to achieve their data analytics goals.
-
Synapse Espresso Lungo
The Synapse Espresso team is coming to your favorite conference with a freshly brewed pot of content!
Join us in this hour-long session to learn how to unlock the power of Synapse SQL pools and boost your data warehouse and data lake performance.
We will cover everything you need to know to delight your business users by making serverless SQL pool fly! From basics to squeezing the very last bit of performance out of it.
If you are more into data warehouses, don’t worry, we have you covered as well! We will cover the best strategies for ingestion and consumption of your data so you can start working on your data warehouse quickly and with confidence.
Every tip and trick we share is proven in real world customer scenarios.
And all of this comes in easy-to-consume sips!
-
Synapse for the entire data department
Synapse is not only very user-friendly and easy to set up, it is also a great tool for working across different data roles.
Synapse improves cross-functional collaboration by breaking down barriers and creating understanding between data scientists, data engineers, business analysts, database managers, and the IT department. In part this is because they can see each other’s work while still working in their own preferred language or tool, and when Azure Synapse is set up correctly with Azure DevOps and Git, it also becomes easier to create cross-functional code repositories and good versioning.
1. Data scientists can work with R, Scala, or Python in notebooks and create data pipelines from them; in addition, they can use parts of the machine learning studio directly from Azure Synapse.
2. Data engineers have the best of SQL data warehousing, Data Factory, and data lakes in one place.
3. Business analysts can develop Power BI reports from within Synapse, run SQL queries, and get an overview of the data pipelines, seeing where data comes from and what transformations have been done.
4. Database administrators get a full overview of usage and can easily administer access and security.
5. The IT department can fully monitor spend, usage, and security, and integrate other relevant tools.
-
Synapse Serverless SQL Pools and Power BI. A match made in heaven?!
As a Data Engineer, Data Scientist or Data Visualisation Artist in an organisation, you thrive on creating solutions and architectures that are engineered well and deliver the insights people need at the right time, to the right place, and in the right format. This is our motto, and we strive to live by it.
But have you ever had that occurrence where a report (and its underlying data) had to be delivered fast, and you felt there was no room for those fundamental design decisions? A quick delivery is better than no delivery, right? Let’s assume life is good, and you’re about to settle down with a cup of tea to work on some other backlog items. Then you notice some of those quick-and-dirty decisions coming back to haunt you, taking more time to rectify as there are now more moving parts in the process. Is higher management really going to accept a report that takes minutes to load for their mission-critical decisions, and that may end up costing them even more? Most likely the answer will be no, and we have to do something about it…
This session will focus on the business scenario of a Logical Data Warehouse approach using Azure Synapse Serverless SQL Pools, and Power BI for those lightning-quick insights into our data. Starting with a brief introduction to Azure Synapse Serverless SQL Pools, we will move on to a problematic situation and work our way down a troubleshooting path to gradually improve the performance of every cog in the chain. We begin the scenario with simple CSV files, and continue by leveraging Parquet files, improved usage of data types, partitioning, and data elimination. We then finish off by using techniques on the Power BI side, like hybrid tables and automatic aggregations, to give it that extra bit of juice. With every step taken, we’ll assess the improvements and considerations, and why we think this was a good idea.
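To illustrate the kind of query being tuned, here is a minimal sketch of reading partitioned Parquet files from a serverless SQL pool over pyodbc, using filepath() so that only the requested partition folders are scanned; the workspace, storage account, folder layout, and columns are hypothetical.

# Minimal sketch: query partition-pruned Parquet from a Synapse serverless SQL pool.
# Workspace, storage account, folder layout, and column names are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=myworkspace-ondemand.sql.azuresynapse.net;Database=master;"
    "Authentication=ActiveDirectoryInteractive;",
    autocommit=True,
)

query = """
SELECT
    rows.filepath(1)  AS sales_year,
    SUM(SalesAmount)  AS total_sales
FROM OPENROWSET(
        BULK 'https://mydatalake.dfs.core.windows.net/curated/sales/year=*/*.parquet',
        FORMAT = 'PARQUET'
     ) AS rows
WHERE rows.filepath(1) = '2023'   -- partition elimination: only the 2023 folder is read
GROUP BY rows.filepath(1);
"""

for sales_year, total_sales in conn.cursor().execute(query).fetchall():
    print(sales_year, total_sales)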
If you are interested in an action-packed and unique collaboration between the Azure FastTrack and Power BI Customer Advisory Teams, you are in for a treat. They will bring their experience from the trenches, working through challenging situations with customers, to make the most out of this common scenario. Above all, you’ll walk away with practical ideas, techniques and insights to try out and improve your own scenarios.
-
Using Synapse to combine data from D365FO with other data for reporting
This session shows how to bring data from Dynamics 365 FO into Synapse Analytics: a showcase, based on real experience, of the Common Data Model (CDM) and how to set up the data flow using the CDMUtil function app and Synapse Analytics (dedicated SQL pool) to automate the extraction of data with Synapse pipelines.
D365 FO is loaded with data, but combining that data with other data, e.g. from fabric sensors, and bringing it into decisions through Power BI has been a struggle. Microsoft has introduced the Export to Data Lake feature in D365, where selected data is copied to the data lake in a CDM format. To automatically interpret the files in the data lake and load the data into a Synapse Analytics SQL pool, we use the CDMUtil function app and a Synapse pipeline, where we can join with other data from the business and utilize the power of the Synapse engine and SQL operations to add value to the final reporting.
At the end of the session the audience will have a good understanding of how to bring data from Dynamics 365 FO into Synapse Analytics for further processing.