Workshop: Kubernetes - From Bare Metal to SQL Server Big Data Clusters
A Microsoft Course from the SQL Server team
Welcome to this Microsoft solutions workshop on Kubernetes - From Bare Metal to SQL Server Big Data Clusters. In this workshop, you'll learn about setting up a production-grade SQL Server 2019 big data cluster environment on Kubernetes. Topics covered include: hardware, virtualization, and Kubernetes, with a full deployment of SQL Server's Big Data Cluster on the environment that you will use in the class. You'll then walk through a set of Jupyter Notebooks in Microsoft's Azure Data Studio tool to run T-SQL, Spark, and Machine Learning workloads on the cluster. You'll also receive valuable resources to learn more and go deeper on Linux, Containers, Kubernetes and SQL Server big data clusters.
The focus of this workshop is to understand the hardware, software, and environment you need to work with SQL Server 2019's big data clusters on a Kubernetes platform.
You'll start by understanding Containers and Kubernetes, move on to a discussion of the hardware and software environment for Kubernetes, and then cover more in-depth Kubernetes concepts. You'll follow on with the SQL Server 2019 big data clusters architecture, and then learn how to use the entire system in a practical application, all with a focus on how to extrapolate what you have learned to create other solutions for your organization.
NOTE: This course is designed to be taught in-person with hardware or virtual environments provided by the instructional team. You will also get details for setting up your own hardware, virtual or Cloud environments for Kubernetes for a workshop backup or if you are not attending in-person.
This GitHub README.MD file explains how the workshop is laid out, what you will learn, and the technologies you will use in this solution. To download this Lab to your local computer, click the Clone or Download button you see at the top right side of this page. More about that process is here.
In this workshop you'll learn:
- How Containers and Kubernetes work and when and where you can use them
- Hardware considerations for setting up a production Kubernetes Cluster on-premises
- Considerations for Virtual and Cloud-based environments for a production Kubernetes Cluster
The concepts and skills taught in this workshop form the starting points for:
- Solution Architects, to understand how to design an end-to-end solution.
- System Administrators, Database Administrators, or Data Engineers, to understand how to put together an end-to-end solution.
Businesses require stable, secure environments at scale, which work in secure on-premises and in-cloud configurations. Using Kubernetes and Containers allows for manifest-driven DevOps practices, which further streamline IT processes.
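As a rough illustration of what "manifest-driven" means, the sketch below builds a minimal Kubernetes Deployment manifest as a plain Python dictionary and prints it as JSON, one of the formats Kubernetes accepts for manifests. The function name `build_deployment_manifest`, the application name `demo-web`, and the `nginx:1.25` image are placeholders chosen for this example; they are not part of the workshop materials.

```python
import json


def build_deployment_manifest(name, image, replicas=1):
    """Return a minimal Kubernetes Deployment manifest as a plain dict.

    The name, image, and replica count are illustrative placeholders.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the Pod template below.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    manifest = build_deployment_manifest("demo-web", "nginx:1.25", replicas=3)
    print(json.dumps(manifest, indent=2))
```

Because the desired state lives in a file rather than in a sequence of manual steps, the output can be saved (for example as `deployment.json`), kept in source control, reviewed, and applied with `kubectl apply -f deployment.json`.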
The solution includes the following technologies - although you are not limited to these, they form the basis of the workshop. At the end of the workshop you will learn how to extrapolate these components into other solutions. You will cover these at an overview level, with references to much deeper training provided.
|Linux||The primary operating system used in and by Containers and Kubernetes|
|Containers||The atomic layer of a Kubernetes Cluster|
|Kubernetes||The primary clustering technology for manifest-driven environments|
|SQL Server Big Data Clusters||Relational and non-relational data at scale with Spark, HDFS and application deployment capabilities|
There are a few requirements for attending the workshop, listed below:
- You'll need a local system that you are able to install software on. The workshop demonstrations use Microsoft Windows as an operating system and all examples use Windows for the workshop. Optionally, you can use a Microsoft Azure Virtual Machine (VM) to install the software on and work with the solution.
- You must have a Microsoft Azure account with the ability to create assets for the "backup" or self-taught path.
- This workshop expects that you understand computer technologies, networking, the basics of SQL Server, HDFS, Spark, and general use of Hypervisors.
- The Setup section below explains the steps you should take prior to coming to the workshop
If you are new to any of these, here are a few references you can complete prior to class:
- Microsoft SQL Server Administration and Use
- Hypervisor Technologies - Hyper-V or
- Hypervisor Technologies - VMWare
A full pre-requisites document is located here. These instructions should be completed before the workshop starts, since you will not have time to cover these in class. Remember to turn off any Virtual Machines from the Azure Portal when not taking the class so that you do not incur charges (shutting down the machine in the VM itself is not sufficient).
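A VM that is shut down from inside its operating system stays allocated in Azure and continues to accrue compute charges; it has to be deallocated to stop them. Besides the Azure Portal, this can be done with the Azure CLI command `az vm deallocate`. The following is a minimal sketch, assuming the Azure CLI is installed and you are already signed in; the helper names and the resource group and VM names are hypothetical and should be replaced with your own.

```python
import shutil
import subprocess


def deallocate_command(resource_group, vm_name):
    """Build the Azure CLI argument list that deallocates one VM."""
    return [
        "az", "vm", "deallocate",
        "--resource-group", resource_group,
        "--name", vm_name,
    ]


def deallocate_vm(resource_group, vm_name, dry_run=True):
    """Print the command (dry run) or execute it with the Azure CLI."""
    cmd = deallocate_command(resource_group, vm_name)
    if dry_run:
        print(" ".join(cmd))
        return cmd
    # Resolve the full path so this also works on Windows, where the CLI is az.cmd.
    az_path = shutil.which("az")
    if az_path is None:
        raise RuntimeError("Azure CLI ('az') was not found on PATH")
    subprocess.run([az_path] + cmd[1:], check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical names; the default dry run only prints the command.
    deallocate_vm("workshop-rg", "workshop-vm")
```

The dry run is the default so that nothing is changed in your subscription until you pass `dry_run=False`.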
This workshop uses Kubernetes to deploy a workload, with a focus on Microsoft SQL Server's big data clusters deployment for advanced analytics over large sets of data and Data Science workloads.
|Primary Audience:||Technical professionals tasked with configuring, deploying and managing large-scale clustering systems|
|Secondary Audience:||Data professionals tasked with working with data at scale|
|Type:||In-Person (self-guided possible)|
This is a modular workshop, and in each section, you'll learn concepts, technologies and processes to help you complete the solution.
|01 - An introduction to Linux, Containers and Kubernetes||This module covers Container technologies and how they are different from Virtual Machines. You'll learn about the need for container orchestration using Kubernetes.|
|02 - Hardware and Virtualization environment for Kubernetes||This module explains how to make a production-grade environment using "bare metal" computer hardware or with a virtualized platform, and most importantly the storage hardware aspects.|
|03 - Kubernetes Concepts and Implementation||Covers deploying Kubernetes, Kubernetes contexts, cluster troubleshooting and management, services: load balancing versus node ports, understanding storage from a Kubernetes perspective and making your cluster secure.|
|04 - SQL Server Big Data Clusters Architecture||This module will dig deep into the anatomy of a big data cluster by covering topics that include: the data pool, storage pool, compute pool and cluster control plane, active directory integration, development versus production configurations and the tools required for deploying and managing a big data cluster.|
|05 - Using the SQL Server big data cluster on Kubernetes for Data Science||Now that your big data cluster is up, it's ready for data science workloads. This Jupyter Notebook and Azure Data Studio based module will cover the use of Python and PySpark, T-SQL and the execution of Spark and Machine Learning workloads.|
Next, Continue to Pre-Requisites
Workshop Authors and Contributors
Kubernetes and the Kubernetes logo are trademarks or registered trademarks of The Linux Foundation in the United States and/or other countries. The Linux Foundation and other parties may also have trademark rights in other terms used herein. This Workshop is not certified, accredited, affiliated with, nor endorsed by Kubernetes or The Linux Foundation.
Microsoft and any contributors grant you a license to the Microsoft documentation and other content in this repository under the Creative Commons Attribution 4.0 International Public License, see the LICENSE file, and grant you a license to any code in the repository under the MIT License, see the LICENSE-CODE file.
Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.
Privacy information can be found at https://privacy.microsoft.com/en-us/
Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel or otherwise.