Tutorial: Use Kusto queries
The best way to learn about the Kusto Query Language is to look at some basic queries to get a "feel" for the language. We recommend using a database with some sample data. The queries that are demonstrated in this tutorial should run on that database. The StormEvents table in the sample database provides some information about storms that happened in the United States.
Count rows
Our example database has a table called StormEvents. We want to find out how large the table is, so we'll pipe its content into an operator that counts the rows in the table.
Syntax note: A query is a data source (usually a table name), optionally followed by one or more pairs of the pipe character and some tabular operator.
StormEvents | count
Here's the output:
Count |
---|
59066 |
For more information, see count operator.
Select a subset of columns: project
Use project to pick out only the columns you want. See the following example, which uses both the project and the take operators.
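A minimal sketch of such a query, combining project with take (using columns that appear in the StormEvents examples later in this tutorial):

```kusto
StormEvents
| take 5
| project StartTime, EventType, State
```

take limits the output to five rows, and project keeps only the three named columns.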
Filter by Boolean expression: where
Let's see only flood events in California in February 2007:
StormEvents
| where StartTime > datetime(2007-02-01) and StartTime < datetime(2007-03-01)
| where EventType == 'Flood' and State == 'CALIFORNIA'
| project StartTime, EndTime, State, EventType, EpisodeNarrative
Here's the output:
StartTime | EndTime | State | EventType | EpisodeNarrative |
---|---|---|---|---|
2007-02-19 00:00:00.0000000 | 2007-02-19 08:00:00.0000000 | CALIFORNIA | Flood | A frontal system moving across the Southern San Joaquin Valley brought brief periods of heavy rain to western Kern County in the early morning hours of the 19th. Minor flooding was reported across State Highway 166 near Taft. |
Show n rows: take
Let's see some data. What's in a random sample of five rows?
StormEvents
| take 5
| project StartTime, EndTime, EventType, State, EventNarrative
Here's the output:
StartTime | EndTime | EventType | State | EventNarrative |
---|---|---|---|---|
2007-09-18 20:00:00.0000000 | 2007-09-19 18:00:00.0000000 | Heavy Rain | FLORIDA | As much as 9 inches of rain fell in a 24-hour period across parts of coastal Volusia County. |
2007-09-20 21:57:00.0000000 | 2007-09-20 22:05:00.0000000 | Tornado | FLORIDA | A tornado touched down in the Town of Eustis at the northern end of West Crooked Lake. The tornado quickly intensified to EF1 strength as it moved north northwest through Eustis. The track was just under two miles long and had a maximum width of 300 yards. The tornado destroyed 7 homes. Twenty seven homes received major damage and 81 homes reported minor damage. There were no serious injuries and property damage was set at $6.2 million. |
2007-09-29 08:11:00.0000000 | 2007-09-29 08:11:00.0000000 | Waterspout | ATLANTIC SOUTH | A waterspout formed in the Atlantic southeast of Melbourne Beach and briefly moved toward shore. |
2007-12-20 07:50:00.0000000 | 2007-12-20 07:53:00.0000000 | Thunderstorm Wind | MISSISSIPPI | Numerous large trees were blown down with some down on power lines. Damage occurred in eastern Adams county. |
2007-12-30 16:00:00.0000000 | 2007-12-30 16:05:00.0000000 | Thunderstorm Wind | GEORGIA | The county dispatch reported several trees were blown down along Quincey Batten Loop near State Road 206. The cost of tree removal was estimated. |
But take shows rows from the table in no particular order, so let's sort them. (limit is an alias for take and has the same effect.)
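Because limit is an alias for take, this query returns the same kind of unordered five-row sample as the take example above:

```kusto
StormEvents
| limit 5
| project StartTime, EndTime, EventType, State, EventNarrative
```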
Order results: sort, top
- Syntax note: Some operators have parameters that are introduced by keywords like by.
- In the following example, desc orders results in descending order and asc orders results in ascending order.
Show me the first n rows, ordered by a specific column:
StormEvents
| top 5 by StartTime desc
| project StartTime, EndTime, EventType, State, EventNarrative
Here's the output:
StartTime | EndTime | EventType | State | EventNarrative |
---|---|---|---|---|
2007-12-31 22:30:00.0000000 | 2007-12-31 23:59:00.0000000 | Winter Storm | MICHIGAN | This heavy snow event continued into the early morning hours on New Year's Day. |
2007-12-31 22:30:00.0000000 | 2007-12-31 23:59:00.0000000 | Winter Storm | MICHIGAN | This heavy snow event continued into the early morning hours on New Year's Day. |
2007-12-31 22:30:00.0000000 | 2007-12-31 23:59:00.0000000 | Winter Storm | MICHIGAN | This heavy snow event continued into the early morning hours on New Year's Day. |
2007-12-31 23:53:00.0000000 | 2007-12-31 23:53:00.0000000 | High Wind | CALIFORNIA | North to northeast winds gusting to around 58 mph were reported in the mountains of Ventura county. |
2007-12-31 23:53:00.0000000 | 2007-12-31 23:53:00.0000000 | High Wind | CALIFORNIA | The Warm Springs RAWS sensor reported northerly winds gusting to 58 mph. |
You can achieve the same result by using sort, and then take:
StormEvents
| sort by StartTime desc
| take 5
| project StartTime, EndTime, EventType, EventNarrative
Compute derived columns: extend
Create a new column by computing a value in every row:
StormEvents
| take 5
| extend Duration = EndTime - StartTime
| project StartTime, EndTime, Duration, EventType, State
Here's the output:
StartTime | EndTime | Duration | EventType | State |
---|---|---|---|---|
2007-09-18 20:00:00.0000000 | 2007-09-19 18:00:00.0000000 | 22:00:00 | Heavy Rain | FLORIDA |
2007-09-20 21:57:00.0000000 | 2007-09-20 22:05:00.0000000 | 00:08:00 | Tornado | FLORIDA |
2007-09-29 08:11:00.0000000 | 2007-09-29 08:11:00.0000000 | 00:00:00 | Waterspout | ATLANTIC SOUTH |
2007-12-20 07:50:00.0000000 | 2007-12-20 07:53:00.0000000 | 00:03:00 | Thunderstorm Wind | MISSISSIPPI |
2007-12-30 16:00:00.0000000 | 2007-12-30 16:05:00.0000000 | 00:05:00 | Thunderstorm Wind | GEORGIA |
It's possible to reuse a column name and assign a calculation result to the same column.
Example:
print x=1
| extend x = x + 1, y = x
| extend x = x + 1
Here's the output:
x | y |
---|---|
3 | 1 |
Scalar expressions can include all the usual operators (+, -, *, /, %), and a range of useful functions are available.
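For instance, a quick sketch of these operators in a print statement (note that dividing two integers in Kusto performs integer division):

```kusto
print sum = 2 + 3,        // 5
      difference = 2 - 3, // -1
      quotient = 7 / 2,   // 3 (integer division)
      remainder = 7 % 2   // 1
```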
Aggregate groups of rows: summarize
Count the number of events that occur in each state:
StormEvents
| summarize event_count = count() by State
summarize groups together rows that have the same values in the by clause, and then uses an aggregation function (for example, count) to combine each group into a single row. In this case, there's a row for each state and a column for the count of rows in that state.
A range of aggregation functions is available. You can use several aggregation functions in one summarize operator to produce several computed columns. For example, we could get the count of storms per state and the number of unique types of storm per state. Then, we could use top to get the most storm-affected states:
StormEvents
| summarize StormCount = count(), TypeOfStorms = dcount(EventType) by State
| top 5 by StormCount desc
Here's the output:
State | StormCount | TypeOfStorms |
---|---|---|
TEXAS | 4701 | 27 |
KANSAS | 3166 | 21 |
IOWA | 2337 | 19 |
ILLINOIS | 2022 | 23 |
MISSOURI | 2016 | 20 |
In the results of a summarize operator:
- There's a column for each column named in by.
- There's a column for each computed expression.
- There's a row for each combination of by values.
Summarize by scalar values
You can use scalar (numeric, time, or interval) values in the by clause, but you'll want to put the values into bins by using the bin() function:
StormEvents
| where StartTime > datetime(2007-02-14) and StartTime < datetime(2007-02-21)
| summarize event_count = count() by bin(StartTime, 1d)
The query reduces all the timestamps to intervals of one day:
StartTime | event_count |
---|---|
2007-02-14 00:00:00.0000000 | 180 |
2007-02-15 00:00:00.0000000 | 66 |
2007-02-16 00:00:00.0000000 | 164 |
2007-02-17 00:00:00.0000000 | 103 |
2007-02-18 00:00:00.0000000 | 22 |
2007-02-19 00:00:00.0000000 | 52 |
2007-02-20 00:00:00.0000000 | 60 |
The bin() function is the same as the floor() function in many languages. It simply reduces every value to the nearest multiple of the modulus that you supply, so that summarize can assign the rows to groups.
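A quick sketch of bin() rounding values down to the nearest multiple of the supplied size (values chosen for illustration):

```kusto
print bin(4.5, 1),          // 4: nearest lower multiple of 1
      bin(time(16h), 4h),   // 16:00:00: nearest lower multiple of 4 hours
      bin(datetime(2007-02-19 08:00:00), 1d)  // 2007-02-19 00:00:00: start of that day
```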
Display a chart or table: render
You can project two columns and use them as the x-axis and the y-axis of a chart:
StormEvents
| summarize event_count=count(), mid = avg(BeginLat) by State
| sort by mid
| where event_count > 1800
| project State, event_count
| render columnchart
Although we removed mid in the project operation, we still need it if we want the chart to display the states in that order.
Strictly speaking, render is a feature of the client rather than part of the query language. Still, it's integrated into the language, and it's useful for envisioning your results.
Timecharts
Going back to numeric bins, let's display a time series:
StormEvents
| summarize event_count=count() by bin(StartTime, 1d)
| render timechart
Multiple series
Use multiple values in a summarize by clause to create a separate row for each combination of values:
StormEvents
| where StartTime > datetime(2007-06-04) and StartTime < datetime(2007-06-10)
| where Source in ("Source","Public","Emergency Manager","Trained Spotter","Law Enforcement")
| summarize count() by bin(StartTime, 10h), Source
Just add the render term to the preceding example: | render timechart.
Notice that render timechart uses the first column as the x-axis, and then displays the other columns as separate lines.
Daily average cycle
How does activity vary over the average day?
Count events by the time modulo one day, binned into hours.
StormEvents
| extend hour = bin(StartTime % 1d, 1h)
| summarize event_count=count() by hour
| sort by hour asc
| render timechart
Currently, render doesn't label durations properly, but we could use | render columnchart instead.
Compare multiple daily series
How does activity vary over the time of day in different states?
StormEvents
| extend hour = bin(StartTime % 1d, 1h)
| where State in ("GULF OF MEXICO","MAINE","VIRGINIA","WISCONSIN","NORTH DAKOTA","NEW JERSEY","OREGON")
| summarize event_count=count() by hour, State
| render timechart
Divide by 1h to turn the x-axis into an hour number instead of a duration:
StormEvents
| extend hour = bin(StartTime % 1d, 1h) / 1h
| where State in ("GULF OF MEXICO","MAINE","VIRGINIA","WISCONSIN","NORTH DAKOTA","NEW JERSEY","OREGON")
| summarize event_count=count() by hour, State
| render columnchart
Join data types
How would you find two specific event types and in which state each of them happened?
You can pull storm events with the first EventType and the second EventType, and then join the two sets on State:
StormEvents
| where EventType == "Lightning"
| join (
StormEvents
| where EventType == "Avalanche"
) on State
| distinct State
User session example of join
This section doesn't use the StormEvents table.
Assume you have data that includes events which mark the start and end of each user session with a unique ID.
How would you find out how long each user session lasts?
You can use extend to provide an alias for the two timestamps, and then compute the session duration:
Events
| where eventName == "session_started"
| project start_time = timestamp, country, session_id
| join ( Events
| where eventName == "session_ended"
| project stop_time = timestamp, session_id
) on session_id
| extend duration = stop_time - start_time
| project start_time, stop_time, country, duration
| take 10
It's a good practice to use project to select just the relevant columns before you perform the join. In the same clause, rename the timestamp column.
Plot a distribution
Returning to the StormEvents table, how many storms are there of different lengths?
StormEvents
| extend duration = EndTime - StartTime
| where duration > 0s
| where duration < 3h
| summarize event_count = count()
by bin(duration, 5m)
| sort by duration asc
| render timechart
Or, you can use | render columnchart.
Percentiles
What ranges of durations do we find in different percentages of storms?
To get this information, use the preceding query from Plot a distribution, but replace render with:
| summarize percentiles(duration, 5, 20, 50, 80, 95)
In this case, we didn't use a by clause, so the output is a single row.
We can see from the output that:
- 5% of storms have a duration of less than 5 minutes.
- 50% of storms lasted less than 1 hour and 25 minutes.
- 95% of storms lasted less than 2 hours and 50 minutes.
To get a separate breakdown for each state, use the State column separately with both summarize operators:
StormEvents
| extend duration = EndTime - StartTime
| where duration > 0s
| where duration < 3h
| summarize event_count = count()
by bin(duration, 5m), State
| sort by duration asc
| summarize percentiles(duration, 5, 20, 50, 80, 95) by State
Percentages
Using the StormEvents table, we can calculate the percentage of direct injuries from all injuries.
StormEvents
| where (InjuriesDirect > 0) and (InjuriesIndirect > 0)
| extend Percentage = ( 100 * InjuriesDirect / (InjuriesDirect + InjuriesIndirect) )
| project StartTime, InjuriesDirect, InjuriesIndirect, Percentage
The query removes zero count entries:
StartTime | InjuriesDirect | InjuriesIndirect | Percentage |
---|---|---|---|
2007-05-01T16:50:00Z | 1 | 1 | 50 |
2007-08-10T21:25:00Z | 7 | 2 | 77 |
2007-08-23T12:05:00Z | 7 | 22 | 24 |
2007-08-23T14:20:00Z | 3 | 2 | 60 |
2007-09-10T13:45:00Z | 4 | 1 | 80 |
2007-12-06T08:30:00Z | 3 | 3 | 50 |
2007-12-08T12:00:00Z | 1 | 1 | 50 |
Assign a result to a variable: let
Use let to separate out the parts of the query expression in the preceding join example. The results are unchanged:
let LightningStorms =
StormEvents
| where EventType == "Lightning";
let AvalancheStorms =
StormEvents
| where EventType == "Avalanche";
LightningStorms
| join (AvalancheStorms) on State
| distinct State
Tip
In Kusto Explorer, to execute the entire query, don't add blank lines between parts of the query. Any two statements must be separated by a semicolon.
Combine data from several databases in a query
In the following query, the Logs table must be in your default database:
Logs | where ...
To access a table in a different database, use the following syntax:
database("db").Table
For example, if you have databases named Diagnostics and Telemetry, and you want to correlate some of the data in the two tables, you might use the following query (assuming Diagnostics is your default database):
Logs | join database("Telemetry").Metrics on Request MachineId | ...
Use this query if your default database is Telemetry:
union Requests, database("Diagnostics").Logs | ...
The preceding two queries assume that both databases are in the cluster you're currently connected to. If the Telemetry database was in a cluster named TelemetryCluster.kusto.windows.net, to access it, use this query:
Logs | join cluster("TelemetryCluster").database("Telemetry").Metrics on Request MachineId | ...
Note
When the cluster is specified, the database is mandatory.
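So a fully qualified reference always names both the cluster and the database, for example (using the illustrative cluster and table names from above):

```kusto
cluster("TelemetryCluster").database("Telemetry").Metrics
| take 10
```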
For more information about combining data from several databases in a query, see cross-database queries.
Next steps
- View code samples for the Kusto Query Language.
Tutorial: Use Kusto queries in Azure Monitor Log Analytics
The best way to learn about the Azure Data Explorer Query Language is to look at some basic queries to get a "feel" for the language. These queries are similar to queries in the Azure Data Explorer tutorial, but use data from common tables in an Azure Log Analytics workspace.
Run these queries by using Log Analytics in the Azure portal. Log Analytics is a tool you can use to write log queries against log data in Azure Monitor and then evaluate the results. If you aren't familiar with Log Analytics, complete the Log Analytics tutorial.
All queries in this tutorial use the Log Analytics demo environment. You can use your own environment, but you might not have some of the tables that are used here. Because the data in the demo environment isn't static, the results of your queries might vary slightly from the results shown here.
Count rows
The InsightsMetrics table contains performance data that's collected by insights such as Azure Monitor for VMs and Azure Monitor for containers. To find out how large the table is, we'll pipe its content into an operator that counts rows.
A query is a data source (usually a table name), optionally followed by one or more pairs of the pipe character and some tabular operator. In this case, all records from the InsightsMetrics table are returned and then sent to the count operator. The count operator displays the results because the operator is the last command in the query.
InsightsMetrics | count
Here's the output:
Count |
---|
1,263,191 |
Filter by Boolean expression: where
The AzureActivity table has entries from the Azure activity log, which provides insight into subscription-level or management group-level events occurring in Azure. Let's see only Critical entries during a specific week.
The where operator is common in the Kusto Query Language. where filters a table to rows that match specific criteria. The following example uses multiple commands. First, the query retrieves all records for the table. Then, it filters the data for only records that are in the time range. Finally, it filters those results for only records that have a Critical level.
Note
In addition to specifying a filter in your query by using the TimeGenerated column, you can specify the time range in Log Analytics. For more information, see Log query scope and time range in Azure Monitor Log Analytics.
AzureActivity
| where TimeGenerated > datetime(10-01-2020) and TimeGenerated < datetime(10-07-2020)
| where Level == 'Critical'
Select a subset of columns: project
Use project to include only the columns you want. Building on the preceding example, let's limit the output to certain columns:
AzureActivity
| where TimeGenerated > datetime(10-01-2020) and TimeGenerated < datetime(10-07-2020)
| where Level == 'Critical'
| project TimeGenerated, Level, OperationNameValue, ResourceGroup, _ResourceId
Show n rows: take
NetworkMonitoring contains monitoring data for Azure virtual networks. Let's use the take operator to look at 10 random sample rows in that table. take shows some rows from a table in no particular order:
NetworkMonitoring
| take 10
| project TimeGenerated, Computer, SourceNetwork, DestinationNetwork, HighLatency, LowLatency
Order results: sort, top
Instead of random records, we can return the latest five records by first sorting by time:
NetworkMonitoring
| sort by TimeGenerated desc
| take 5
| project TimeGenerated, Computer, SourceNetwork, DestinationNetwork, HighLatency, LowLatency
You can get this exact behavior by instead using the top operator:
NetworkMonitoring
| top 5 by TimeGenerated desc
| project TimeGenerated, Computer, SourceNetwork, DestinationNetwork, HighLatency, LowLatency
Compute derived columns: extend
The extend operator is similar to project, but it adds to the set of columns instead of replacing them. You can use both operators to create a new column based on a computation on each row.
The Perf table has performance data that's collected from virtual machines that run the Log Analytics agent.
Perf
| where ObjectName == "LogicalDisk" and CounterName == "Free Megabytes"
| project TimeGenerated, Computer, FreeMegabytes = CounterValue
| extend FreeGigabytes = FreeMegabytes / 1000
Aggregate groups of rows: summarize
The summarize operator groups together rows that have the same values in the by clause. Then, it uses an aggregation function like count to combine each group into a single row. A range of aggregation functions is available. You can use several aggregation functions in one summarize operator to produce several computed columns.
The SecurityEvent table contains security events like logons and processes that started on monitored computers. You can count how many events of each level occurred on each computer. In this example, a row is produced for each computer and level combination. A column contains the count of events.
SecurityEvent
| summarize count() by Computer, Level
Summarize by scalar values
You can aggregate by scalar values like numbers and time values, but you should use the bin() function to group rows into distinct sets of data. For example, if you aggregate by TimeGenerated, you'll get a row for most time values. Use bin() to consolidate values per hour or day.
The InsightsMetrics table contains performance data that's organized according to insights from Azure Monitor for VMs and Azure Monitor for containers. The following query shows the hourly average processor utilization for multiple computers:
InsightsMetrics
| where Computer startswith "DC"
| where Namespace == "Processor" and Name == "UtilizationPercentage"
| summarize avg(Val) by Computer, bin(TimeGenerated, 1h)
Display a chart or table: render
The render operator specifies how the output of the query is rendered. Log Analytics renders output as a table by default. You can select different chart types after you run the query. The render operator is useful to include in queries in which a specific chart type usually is preferred.
The following example shows the hourly average processor utilization for a single computer. It renders the output as a timechart.
InsightsMetrics
| where Computer == "DC00.NA.contosohotels.com"
| where Namespace == "Processor" and Name == "UtilizationPercentage"
| summarize avg(Val) by Computer, bin(TimeGenerated, 1h)
| render timechart
Work with multiple series
If you use multiple values in a summarize by clause, the chart displays a separate series for each set of values:
InsightsMetrics
| where Computer startswith "DC"
| where Namespace == "Processor" and Name == "UtilizationPercentage"
| summarize avg(Val) by Computer, bin(TimeGenerated, 1h)
| render timechart
Join data from two tables
What if you need to retrieve data from two tables in a single query? You can use the join operator to combine rows from multiple tables in a single result set. Each table must have a column that has a matching value so that the join understands which rows to match.
VMComputer is a table that Azure Monitor for VMs uses to store details about the virtual machines that it monitors. InsightsMetrics contains performance data that's collected from those virtual machines. One value collected in InsightsMetrics is available memory, but not the percentage of memory that's available. To calculate the percentage, we need the physical memory for each virtual machine. That value is in VMComputer.
The following example query uses a join to perform this calculation. The distinct operator is used with VMComputer because details are regularly collected from each computer. As a result, the table contains multiple rows for each computer. The two tables are joined using the Computer column. A row is created in the result set for each row in InsightsMetrics where the value in Computer matches the value in the Computer column in VMComputer; that row includes columns from both tables.
VMComputer
| distinct Computer, PhysicalMemoryMB
| join kind=inner (
InsightsMetrics
| where Namespace == "Memory" and Name == "AvailableMB"
| project TimeGenerated, Computer, AvailableMemoryMB = Val
) on Computer
| project TimeGenerated, Computer, PercentMemory = AvailableMemoryMB / PhysicalMemoryMB * 100
Assign a result to a variable: let
Use let to make queries easier to read and manage. You can use this operator to assign the results of a query to a variable that you can use later. By using the let statement, the query in the preceding example can be rewritten as:
let PhysicalComputer = VMComputer
| distinct Computer, PhysicalMemoryMB;
let AvailableMemory = InsightsMetrics
| where Namespace == "Memory" and Name == "AvailableMB"
| project TimeGenerated, Computer, AvailableMemoryMB = Val;
PhysicalComputer
| join kind=inner (AvailableMemory) on Computer
| project TimeGenerated, Computer, PercentMemory = AvailableMemoryMB / PhysicalMemoryMB * 100
Next steps
- View code samples for the Kusto Query Language.