An Introduction to Azure Monitor
This blog post is part of the online event “Azure Back to School” run by my friend and Microsoft Azure MVP Dwayne Natwick. In this post, I am going to talk about Azure Monitor, what it does, and how you can use it in your Azure-based solutions.
What is Azure Monitor?
In today’s complex cloud environments, it is essential to have visibility into what is going on and to act on it accordingly. In September 2018, Application Insights (an Application Performance Management service) and Log Analytics (a log aggregation and analysis platform) were combined into a single service known as Azure Monitor. Azure Monitor offers a one-stop service for collecting, analyzing, and acting on operational telemetry from cloud and on-prem environments in one centralized, fully managed location.
To start with, it is important to understand that any data collected through Azure Monitor falls into one of the following two fundamental types:
- Metrics – Metrics are numeric values that describe system properties, such as CPU and memory usage, at a particular point in time. Metrics data is stored in a time-series database and can be analyzed using Metrics Explorer. To open Metrics Explorer, go to Azure Monitor -> Metrics, then choose the resource as the scope, the metric you want, and the aggregation to apply.
- Logs – Logs represent application-specific custom events, traces, and performance data. Logs can be structured or free-form text. Logs in Azure Monitor are stored in a Log Analytics workspace, which offers a rich query language (Kusto Query Language, or KQL) for log analysis.
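To make the idea of a metric aggregation concrete, here is a minimal sketch in plain Python. It does not call Azure Monitor; the sample values, the 5-minute time grain, and the `aggregate` helper are all assumptions made for illustration. It only shows what choosing "Average" or "Max" over a time grain means for a time series.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

EPOCH = datetime(1970, 1, 1)

# Hypothetical CPU samples (timestamp, percent); real values would come from Azure Monitor.
samples = [
    (datetime(2021, 9, 1, 10, 0), 40.0),
    (datetime(2021, 9, 1, 10, 2), 60.0),
    (datetime(2021, 9, 1, 10, 6), 90.0),
    (datetime(2021, 9, 1, 10, 9), 70.0),
]


def aggregate(samples, grain, func):
    """Group samples into fixed-size time buckets and reduce each bucket with func."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Floor the timestamp to the start of the bucket it belongs to.
        bucket_start = timestamp - ((timestamp - EPOCH) % grain)
        buckets[bucket_start].append(value)
    return {start: func(values) for start, values in sorted(buckets.items())}


average_cpu = aggregate(samples, timedelta(minutes=5), mean)
max_cpu = aggregate(samples, timedelta(minutes=5), max)

for start in average_cpu:
    print(start.isoformat(), "avg:", average_cpu[start], "max:", max_cpu[start])
```

The same samples produce different charts depending on the aggregation, which is why picking the right one in Metrics Explorer matters.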
Azure Monitor can be used to bring full observability into applications, infrastructure, and networks. It collects and aggregates data from a variety of sources, including web applications and the Azure platform, and makes that data available for analysis, visualization, and alerting. Azure Monitor supports collecting data from the following data sources:
- Guest Operating System – Using one or more agents, telemetry from guest operating systems (including Windows Event logs, Syslog, performance counters, IIS logs, and crash dumps) can be aggregated in Azure Monitor. To collect monitoring logs from the guest operating system, you can use one of the following agents:
  - Azure Diagnostics extension – can only be used with Azure virtual machines. It can send data not only to Azure Monitor but also to a storage account or an event hub.
  - Log Analytics agent – can be used with virtual machines running in Azure, on-prem, or any other cloud to send log and performance data to Azure Monitor.
  - Telegraf agent – can be used to collect data from Linux machines and send it to Azure Monitor.
  - Azure Monitor agent – is still in preview at the time of writing and will replace the Log Analytics agent and the Telegraf agent.
- Application Code – Application Insights is an extensible Application Performance Management (APM) service that can be used to collect telemetry data from web applications running on a variety of platforms. To use Application Insights with your application, you use the Application Insights agent or, if the agent is not supported for your platform, install a small instrumentation package (SDK) in your application. Through Application Insights, you can monitor application requests, identify dependencies that slow down your application, find exceptions, capture user and session counts, and monitor performance counters like CPU and memory usage.
Application Insights can be used to monitor applications running anywhere, including Azure, on-premises, or any other cloud. It can monitor client applications, web services, and background services. Once the telemetry is captured in Application Insights, it can be used for alerting, exported to Power BI for reporting, or continuously exported to an event hub or a storage account for analysis with a third-party tool.
- Azure Resources – For each Azure resource, Azure captures the following two types of telemetry data:
  - Metrics – represent platform metrics like CPU, memory, etc. These metrics are captured for all Azure resources by default.
  - Resource Logs – capture the operations performed within a resource (on the data plane), such as write operations, for all resources in Azure.
- Azure Subscription – Using Azure Monitor, you can capture all the telemetry related to your Azure subscription. At a high level, you can capture the following two categories of logs for your Azure subscription:
  - Service Health – represents the service health events and incidents.
  - Activity Logs – capture the what, who, and when of any write operation (on the management plane) that has taken place on each Azure resource within the Azure subscription.
- Azure Tenant – For each Azure Active Directory tenant, Azure AD keeps the history of sign-in activity and an audit trail of changes made in the tenant. To enable these logs, you need an Azure AD Premium P1 or P2 license. To export the sign-in and audit logs to Azure Monitor, you need to enable diagnostic settings for your Azure AD tenant.
- Custom Sources – You can also collect logs from custom sources using the Data Collector API over REST. Since it is a REST call, it can be made from any client (within Azure or outside). First, a Log Analytics workspace needs to be provisioned; you then use its key to authenticate the Data Collector API calls. A Log Analytics workspace stores data in the form of records, so it expects the data coming from the Data Collector API to be JSON that represents one or more records.
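As a rough illustration of the custom-source flow described above, the sketch below builds (but does not send) a Data Collector API request in Python. The shared-key signing scheme, endpoint format, and header names follow my understanding of the documented HTTP Data Collector API, so please verify them against the official documentation before relying on them. The workspace ID, key, log type name, and record fields are made-up placeholders.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timezone


def build_request(workspace_id, shared_key, log_type, records, now=None):
    """Return (url, headers, body) for a Data Collector API POST, without sending it."""
    body = json.dumps(records).encode("utf-8")
    now = now or datetime.now(timezone.utc)
    rfc1123_date = now.strftime("%a, %d %b %Y %H:%M:%S GMT")

    # The string to sign combines the method, body length, content type, date, and resource path.
    string_to_sign = (
        f"POST\n{len(body)}\napplication/json\nx-ms-date:{rfc1123_date}\n/api/logs"
    )
    digest = hmac.new(
        base64.b64decode(shared_key), string_to_sign.encode("utf-8"), hashlib.sha256
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")

    url = f"https://{workspace_id}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"SharedKey {workspace_id}:{signature}",
        "Log-Type": log_type,  # becomes the custom log table name
        "x-ms-date": rfc1123_date,
    }
    return url, headers, body


# Placeholder values for illustration only; a real key comes from the workspace settings.
WORKSPACE_ID = "00000000-0000-0000-0000-000000000000"
SHARED_KEY = base64.b64encode(b"not-a-real-key").decode("ascii")

# The payload is a JSON array in which each object is one record (field names are hypothetical).
records = [
    {"Computer": "web-01", "Level": "Warning", "Message": "Disk usage at 85%"},
    {"Computer": "web-02", "Level": "Info", "Message": "Deployment finished"},
]

url, headers, body = build_request(
    WORKSPACE_ID,
    SHARED_KEY,
    "MyAppEvents",
    records,
    now=datetime(2021, 9, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(url)
print(headers["Authorization"][:30] + "...")
```

Sending the request is then a plain HTTP POST of `body` to `url` with `headers`, using any HTTP client you prefer.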
Azure Monitor Pricing
Azure Monitor is priced separately for Log Analytics workspaces and Application Insights instances. For both, data ingestion and data retention are billable.
Log Analytics pricing – For data ingestion, if you can estimate the amount of data that will be ingested into your Log Analytics workspace, you can opt for reserved pricing, where you pay a fixed fee upfront. Otherwise, you can choose Pay-As-You-Go pricing, which includes the first 5 GB of data ingestion free per billing account per month. The first 31 days of data retention for a Log Analytics workspace are free; beyond 31 days, retention is charged at $0.12 per GB per month.
Application Insights pricing – For data ingestion, the first 5 GB is free per billing account per month. The first 90 days of data retention for an Application Insights instance are free; beyond 90 days, retention is charged at $0.12 per GB per month.
For most accurate and up-to-date pricing, please visit – https://azure.microsoft.com/en-us/pricing/details/monitor/
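To see how the free allowance and the retention charge combine, here is a small back-of-the-envelope calculator in Python. The 5 GB allowance and the $0.12 retention rate come from the figures quoted above; the per-GB ingestion price is not stated in this post, so it is a required parameter, and the value used in the example is purely a placeholder.

```python
FREE_INGESTION_GB = 5.0        # free ingestion per billing account per month (from the text)
RETENTION_PRICE_PER_GB = 0.12  # USD per GB per month beyond the free retention period (from the text)


def billable_ingestion_gb(ingested_gb):
    """Ingestion that is charged after subtracting the free monthly allowance."""
    return max(0.0, ingested_gb - FREE_INGESTION_GB)


def monthly_cost(ingested_gb, gb_retained_beyond_free_period, ingestion_price_per_gb):
    """Estimate one month of Pay-As-You-Go cost: billable ingestion plus extended retention."""
    ingestion = billable_ingestion_gb(ingested_gb) * ingestion_price_per_gb
    retention = gb_retained_beyond_free_period * RETENTION_PRICE_PER_GB
    return ingestion + retention


# Example: 8 GB ingested this month, 100 GB kept past the free retention period.
# The ingestion price of 2.0 USD/GB is a placeholder, not a quoted Azure price.
estimate = monthly_cost(8, 100, ingestion_price_per_gb=2.0)
print(f"Estimated monthly cost: ${estimate:.2f}")
```

This ignores reserved pricing and regional differences, so treat it as a way to reason about the shape of the bill rather than a quote.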
Diagnostic Settings for Azure Resources
Platform metrics and logs, including Azure activity logs and resource logs, can be exported to different destinations by configuring diagnostic settings. To do so, go to the resource blade, look for “Diagnostic settings”, and click on “Add diagnostic setting”.
You can configure the available platform metrics and logs to be exported to a Log Analytics workspace, a storage account, or an event hub.
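If you prefer to think of a diagnostic setting as data rather than portal clicks, the sketch below assembles one as a Python dictionary and prints it as JSON. The property names (`workspaceId`, `storageAccountId`, `eventHubAuthorizationRuleId`, `logs`, `metrics`) reflect my understanding of the diagnostic settings resource schema and should be checked against the official reference. The resource ID and the `AuditEvent` log category are placeholders, and the `diagnostic_setting` helper itself is hypothetical; available categories differ per resource type.

```python
import json

# Hypothetical resource ID; replace the <...> placeholders with real values.
WORKSPACE_RESOURCE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
)


def diagnostic_setting(log_categories, workspace_id=None, storage_account_id=None,
                       event_hub_rule_id=None):
    """Build a diagnostic setting body that routes logs and metrics to the given destinations."""
    destinations = {
        "workspaceId": workspace_id,
        "storageAccountId": storage_account_id,
        "eventHubAuthorizationRuleId": event_hub_rule_id,
    }
    # Keep only the destinations that were actually supplied.
    destinations = {name: value for name, value in destinations.items() if value}
    if not destinations:
        raise ValueError("At least one destination is required")
    return {
        "properties": {
            **destinations,
            "logs": [{"category": c, "enabled": True} for c in log_categories],
            "metrics": [{"category": "AllMetrics", "enabled": True}],
        }
    }


setting = diagnostic_setting(["AuditEvent"], workspace_id=WORKSPACE_RESOURCE_ID)
print(json.dumps(setting, indent=2))
```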
The default retention period for log data in a Log Analytics workspace is 30 days and can be increased up to 730 days. For Application Insights, the default retention period is 90 days and can also be extended up to 730 days. Increasing the data retention period might result in additional charges, so be very clear about how long you need the data to be retained.
You can change the retention period for a Log Analytics workspace or an Application Insights instance by going to the “Usage and estimated costs” tab under the resource blade.
To retain telemetry data from Application Insights beyond the retention period, you can also export it to a storage account.
Alerts can be set up on different events based on metrics and logs using alert rules and action groups. An alert rule defines the condition you would like to evaluate to trigger the alert, and an action group defines the action you want to perform. For example, an alert rule can be created to trigger an action group (to send an email or SMS, or even call a webhook) when the CPU consumption of a virtual machine exceeds 80% over the last 5 minutes. To create and manage action groups and alert rules, go to the Alerts tab under the Azure Monitor blade.
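The split between an alert rule (the condition) and an action group (what to do) can be sketched in a few lines of Python. This is only a conceptual model, not the Azure SDK: the `AlertRule` and `ActionGroup` classes are hypothetical, and I am assuming the rule compares the average CPU over the last 5 one-minute samples against the 80% threshold from the example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List


@dataclass
class AlertRule:
    """Condition: average of the last `window_minutes` samples is above `threshold`."""
    threshold: float
    window_minutes: int

    def is_breached(self, per_minute_values: List[float]) -> bool:
        window = per_minute_values[-self.window_minutes:]
        # Only evaluate once a full window of data is available.
        return len(window) == self.window_minutes and mean(window) > self.threshold


@dataclass
class ActionGroup:
    """Actions to run when a rule fires (email, SMS, webhook, ...), modeled as callables."""
    actions: List[Callable[[str], None]]

    def fire(self, message: str) -> None:
        for action in self.actions:
            action(message)


def evaluate(rule: AlertRule, group: ActionGroup, per_minute_values: List[float]) -> bool:
    """Fire the action group if the rule's condition is met; return whether it fired."""
    if rule.is_breached(per_minute_values):
        group.fire(f"CPU above {rule.threshold}% for the last {rule.window_minutes} minutes")
        return True
    return False


sent_messages: List[str] = []
rule = AlertRule(threshold=80.0, window_minutes=5)
group = ActionGroup(actions=[sent_messages.append])  # stand-in for an email or webhook call

fired_high = evaluate(rule, group, [50, 85, 90, 88, 92, 95])  # last 5 samples average 90
fired_low = evaluate(rule, group, [10, 20, 30, 40, 50])       # last 5 samples average 30
print(fired_high, fired_low, sent_messages)
```

Keeping the two concepts separate is what lets one action group be reused across many alert rules.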
Azure Monitor offers a feature for auto-scaling compute resources based on platform metrics. The feature is supported for Virtual Machine Scale Sets, Web Apps, API Management, Cloud Services, Azure Data Explorer clusters, etc. Autoscale works much like alert rules and action groups: a rule monitors the underlying resource's metrics and, based on usage, triggers an action to provision or de-provision resources. This comes in very handy when resource usage varies over time. For example, my web app might not see the same amount of traffic over the weekend as on weekdays. I can set up autoscale to automatically reduce the number of instances when CPU usage drops below a certain value and automatically add more instances when usage goes up.
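The scale-out and scale-in logic described above boils down to a small decision function. The sketch below is a simplified model in Python, not how Azure implements autoscale; the thresholds (70% and 30%), the step of one instance, and the instance limits are all assumed values chosen for illustration.

```python
def desired_instance_count(current, avg_cpu, scale_out_above=70.0, scale_in_below=30.0,
                           minimum=1, maximum=5):
    """Return the instance count after applying one scale-out or scale-in step."""
    if avg_cpu > scale_out_above:
        target = current + 1   # busy: add an instance
    elif avg_cpu < scale_in_below:
        target = current - 1   # quiet: remove an instance
    else:
        target = current       # within the comfortable band: do nothing
    # Never go below the minimum or above the maximum instance count.
    return max(minimum, min(maximum, target))


weekday_count = desired_instance_count(current=2, avg_cpu=85.0)
weekend_count = desired_instance_count(current=2, avg_cpu=12.0)
print("Weekday:", weekday_count, "Weekend:", weekend_count)
```

Leaving a gap between the scale-out and scale-in thresholds helps avoid instances being added and removed in quick succession when usage hovers around a single value.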
Metric and log data stored in Azure Monitor can be visualized through a variety of tools, including Workbooks, Azure Dashboards, Power BI, and Grafana. For more details on using your tool of choice for creating dashboards and reports, please visit – https://docs.microsoft.com/en-us/azure/azure-monitor/visualizations
Thanks for reading! I hope you’ll find it helpful. If you have any questions or if you need additional information, feel free to reach out to me on Twitter or LinkedIn.