Workshop: Kubernetes - From Bare Metal to SQL Server Big Data Clusters

A Microsoft Course from the SQL Server team

About this Workshop

Welcome to this Microsoft solutions workshop on Kubernetes - From Bare Metal to SQL Server Big Data Clusters. In this workshop, you'll learn about setting up a production-grade SQL Server 2019 big data cluster environment on Kubernetes. Topics covered include: hardware, virtualization, and Kubernetes, with a full deployment of SQL Server's Big Data Cluster on the environment that you will use in the class. You'll then walk through a set of Jupyter Notebooks in Microsoft's Azure Data Studio tool to run T-SQL, Spark, and Machine Learning workloads on the cluster. You'll also receive valuable resources to learn more and go deeper on Linux, Containers, Kubernetes and SQL Server big data clusters.

The focus of this workshop is to understand the hardware, software, and environment you need to work with SQL Server 2019's big data clusters on a Kubernetes platform.

You'll start by understanding Containers and Kubernetes, move on to a discussion of the hardware and software environment for Kubernetes, and then cover more in-depth Kubernetes concepts. You'll follow on with the SQL Server 2019 big data clusters architecture, and then learn how to use the entire system in a practical application, all with a focus on how to extrapolate what you have learned to create other solutions for your organization.

NOTE: This course is designed to be taught in-person with hardware or virtual environments provided by the instructional team. You will also get details for setting up your own hardware, virtual or Cloud environments for Kubernetes for a workshop backup or if you are not attending in-person.

This GitHub README.MD file explains how the workshop is laid out, what you will learn, and the technologies you will use in this solution. To download this Lab to your local computer, click the Clone or Download button you see at the top right side of this page. More about that process is here.

(You can view all of the source files for this workshop on this GitHub site, along with other workshops as well. Open this link in a new tab to find out more.)

Learning Objectives

In this workshop you'll learn:

  • How Containers and Kubernetes work and when and where you can use them
  • Hardware considerations for setting up a production Kubernetes Cluster on-premises
  • Considerations for Virtual and Cloud-based environments for a production Kubernetes Cluster

The concepts and skills taught in this workshop form the starting points for:

  • Solution Architects, to understand how to design an end-to-end solution.
  • System Administrators, Database Administrators, or Data Engineers, to understand how to put together an end-to-end solution.

Business Applications of this Workshop

Businesses require stable, secure environments at scale, which work in secure on-premises and in-cloud configurations. Using Kubernetes and Containers allows for manifest-driven DevOps practices, which further streamline IT processes.
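
As a rough illustration of what "manifest-driven" means, the sketch below builds a Kubernetes Deployment manifest as plain data and serializes it to JSON. It is not taken from the workshop materials: the helper name `build_deployment`, the application name `demo-web`, and the image `example.invalid/demo-web:1.0` are hypothetical placeholders chosen for this example.

```python
import json


def build_deployment(name, image, replicas=1, port=80):
    """Return a Kubernetes apps/v1 Deployment manifest as a plain dict.

    The name, image and port are supplied by the caller; nothing here
    talks to a cluster, it only describes the desired state.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the Pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Hypothetical application name and image, used only for illustration.
manifest = build_deployment("demo-web", "example.invalid/demo-web:1.0", replicas=3)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Because the desired state lives in a file rather than in a sequence of manual steps, it can be kept in source control and reviewed like code. Saving the printed JSON to a file such as `deployment.json` and running `kubectl apply -f deployment.json` would submit it to a cluster.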

Technologies used in this Workshop

The solution includes the following technologies - although you are not limited to these, they form the basis of the workshop. At the end of the workshop you will learn how to extrapolate these components into other solutions. You will cover these at an overview level, with references to much deeper training provided.

Technology                    | Description
Linux                         | The primary operating system used in and by Containers and Kubernetes
Containers                    | The atomic layer of a Kubernetes Cluster
Kubernetes                    | The primary clustering technology for manifest-driven environments
SQL Server Big Data Clusters  | Relational and non-relational data at scale with Spark, HDFS and application deployment capabilities

Before Taking this Workshop

There are a few requirements for attending the workshop, listed below:

  • You'll need a local system that you are able to install software on. The workshop demonstrations use Microsoft Windows as an operating system and all examples use Windows for the workshop. Optionally, you can use a Microsoft Azure Virtual Machine (VM) to install the software on and work with the solution.
  • You must have a Microsoft Azure account with the ability to create assets for the "backup" or self-taught path.
  • This workshop expects that you understand computer technologies, networking, the basics of SQL Server, HDFS, Spark, and general use of Hypervisors.
  • The Setup section below explains the steps you should take prior to coming to the workshop.

If you are new to any of these, here are a few references you can complete prior to class:


A full pre-requisites document is located here. These instructions should be completed before the workshop starts, since you will not have time to cover these in class. Remember to turn off any Virtual Machines from the Azure Portal when not taking the class so that you do not incur charges (shutting down the machine from inside the VM itself is not sufficient).

Workshop Details

This workshop uses Kubernetes to deploy a workload, with a focus on Microsoft SQL Server's big data clusters deployment for advanced analytics over large sets of data and Data Science workloads.

Primary Audience: Technical professionals tasked with configuring, deploying and managing large-scale clustering systems
Secondary Audience: Data professionals tasked with working with data at scale
Level: 300
Type: In-Person (self-guided possible)
Length: 8

Related Workshops

Workshop Modules

This is a modular workshop, and in each section, you'll learn concepts, technologies and processes to help you complete the solution.

01 - An introduction to Linux, Containers and Kubernetes: This module covers Container technologies and how they differ from Virtual Machines. You'll learn about the need for container orchestration using Kubernetes.
02 - Hardware and Virtualization environment for Kubernetes: This module explains how to make a production-grade environment using "bare metal" computer hardware or with a virtualized platform, and most importantly the storage hardware aspects.
03 - Kubernetes Concepts and Implementation: This module covers deploying Kubernetes, Kubernetes contexts, cluster troubleshooting and management, services (load balancing versus node ports), understanding storage from a Kubernetes perspective, and making your cluster secure.
04 - SQL Server Big Data Clusters Architecture: This module digs deep into the anatomy of a big data cluster by covering topics that include the data pool, storage pool, compute pool and cluster control plane, Active Directory integration, development versus production configurations, and the tools required for deploying and managing a big data cluster.
05 - Using the SQL Server big data cluster on Kubernetes for Data Science: Now that your big data cluster is up, it's ready for data science workloads. This Jupyter Notebook and Azure Data Studio based module covers the use of Python and PySpark, T-SQL, and the execution of Spark and Machine Learning workloads.
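
To give a feel for the "load balancing versus node ports" topic listed under module 03, here is a minimal sketch that builds the two kinds of Kubernetes Service manifest side by side. It is an illustration under stated assumptions, not the workshop's own configuration: the helper `build_service`, the service names, the `app` label values and the node port `31433` are hypothetical, while port 1433 is the standard SQL Server port and 30000-32767 is the default Kubernetes NodePort range.

```python
import json

# Default range from which Kubernetes allocates NodePort values.
NODE_PORT_RANGE = range(30000, 32768)


def build_service(name, app_label, service_type, port, target_port, node_port=None):
    """Return a Kubernetes v1 Service manifest as a plain dict."""
    if service_type not in ("NodePort", "LoadBalancer"):
        raise ValueError(f"unsupported service type: {service_type}")
    port_spec = {"port": port, "targetPort": target_port, "protocol": "TCP"}
    if node_port is not None:
        if node_port not in NODE_PORT_RANGE:
            raise ValueError(f"nodePort {node_port} is outside the default range")
        port_spec["nodePort"] = node_port
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": service_type,
            # Traffic is routed to Pods carrying this label.
            "selector": {"app": app_label},
            "ports": [port_spec],
        },
    }


# NodePort: the service is reachable on a fixed port of every cluster node.
node_port_svc = build_service(
    "sql-nodeport", "demo-sql", "NodePort", port=1433, target_port=1433, node_port=31433
)

# LoadBalancer: the platform is asked to provision an external load balancer.
load_balancer_svc = build_service(
    "sql-lb", "demo-sql", "LoadBalancer", port=1433, target_port=1433
)

print(json.dumps([node_port_svc, load_balancer_svc], indent=2))
```

The only differences between the two manifests are the `type` field and the optional `nodePort`. Which one is appropriate depends largely on whether the environment can provision a load balancer, which is often not the case on a bare metal cluster without additional software.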

Next Steps

Next, continue to Pre-Requisites

Workshop Authors and Contributors

Legal Notice

Kubernetes and the Kubernetes logo are trademarks or registered trademarks of The Linux Foundation in the United States and/or other countries. The Linux Foundation and other parties may also have trademark rights in other terms used herein. This Workshop is not certified, accredited, affiliated with, nor endorsed by Kubernetes or The Linux Foundation.


Microsoft and any contributors grant you a license to the Microsoft documentation and other content in this repository under the Creative Commons Attribution 4.0 International Public License, see the LICENSE file, and grant you a license to any code in the repository under the MIT License, see the LICENSE-CODE file.

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel or otherwise.

