How To Monitor The Health Of Your Google Cloud Platform Instances

Google Cloud Platform comes with a suite of monitoring tools that track metrics on any cloud resource you use, which can help you spot issues in your infrastructure. It can also monitor AWS resources.

Configure a monitoring dashboard

By default, GCP creates a dashboard for some major resources, such as Cloud Storage buckets, disks, and Compute Engine instances. These are visible from the “Monitoring” tab of each resource. The full “Monitoring” service is available in the sidebar, under “Operations”.

You can view the existing dashboards under the “Dashboards” tab.

By default, the Compute Engine instance dashboard shows CPU usage, disk I/O, and any alarms that have been triggered recently. You can filter all of these charts by time and date using the controls at the top.

Creating your own dashboard is straightforward. Dashboards are generic: you can create one dashboard that applies to any GCE instance, and then filter it by instance name, project ID, or zone. This way, you can set up a dashboard with all the useful metrics once and reuse it for any resource of the same type. If you want the dashboard to show a specific instance, that’s also possible.

From the “Dashboards” panel, create a new dashboard.

Each dashboard contains multiple charts, each of which displays metrics for a given resource. Create a new chart to add to the dashboard.

From this dialog box, you have full control over which metrics you want to view. You are not limited to a single metric, as a chart can overlay several of them, although there is no guarantee that two metrics will make sense together.

The resource type selects the kind of cloud resource you are monitoring, whether it is a Cloud Storage bucket, a database, a Compute Engine or EC2 instance, or nearly any other GCP or AWS resource. It also narrows the list of available metrics to those that apply to the chosen resource.

The metric name selects the data to display on the chart. Complex resources like GCE instances have plenty of metrics, but the most common ones, such as CPU usage, disk I/O, memory usage, and network I/O, are all here.

The filter allows you to pre-select a particular project, instance, zone or group. You can always change this from the dashboard to show other instances, but this will set the default filter.

The “Group By” setting changes how multiple resources are displayed on the chart. If you add a chart that monitors a group of instances, you can, for example, choose to separate them by instance name.
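
The four settings above (resource type, metric, filter, and grouping) map fairly directly onto the JSON that the Cloud Monitoring dashboards API accepts. The following is a minimal sketch rather than a definitive configuration. The helper names `build_filter` and `build_chart`, the zone, and the dashboard title are assumptions made for illustration; `compute.googleapis.com/instance/cpu/utilization` and `gce_instance` are the standard identifiers for Compute Engine CPU usage.

```python
import json


def build_filter(metric_type, resource_type, resource_labels=None):
    """Build a Monitoring filter string from a metric, a resource type, and label filters."""
    parts = [
        f'metric.type = "{metric_type}"',
        f'resource.type = "{resource_type}"',
    ]
    # Sort the labels so the resulting string is deterministic.
    for key, value in sorted((resource_labels or {}).items()):
        parts.append(f'resource.labels.{key} = "{value}"')
    return " AND ".join(parts)


def build_chart(title, metric_type, resource_type, resource_labels=None, group_by=None):
    """Return one dashboard widget (an XY chart) as a plain dict."""
    group_by = list(group_by or [])
    return {
        "title": title,
        "xyChart": {
            "dataSets": [{
                "timeSeriesQuery": {
                    "timeSeriesFilter": {
                        "filter": build_filter(metric_type, resource_type, resource_labels),
                        "aggregation": {
                            "alignmentPeriod": "60s",
                            "perSeriesAligner": "ALIGN_MEAN",
                            # Only combine series when a grouping was requested.
                            "crossSeriesReducer": "REDUCE_MEAN" if group_by else "REDUCE_NONE",
                            "groupByFields": group_by,
                        },
                    }
                }
            }]
        },
    }


# Hypothetical dashboard: CPU usage in one zone, one line per instance name.
dashboard = {
    "displayName": "GCE overview (example)",
    "gridLayout": {
        "widgets": [
            build_chart(
                title="CPU usage by instance",
                metric_type="compute.googleapis.com/instance/cpu/utilization",
                resource_type="gce_instance",
                resource_labels={"zone": "us-central1-a"},
                group_by=["metric.label.instance_name"],
            )
        ]
    },
}

print(json.dumps(dashboard, indent=2))
```

Saved to a file, a definition like this could be passed to `gcloud monitoring dashboards create --config-from-file=dashboard.json`, which can be handy for keeping dashboards under version control instead of rebuilding them by hand.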

Once the chart has been added to the dashboard, you can always change its settings from the menu on the chart, or activate “Statistics Mode”, which displays moving averages and other useful features.

Defining custom alarms

One of the most useful features of GCP’s monitoring suite is being able to set custom alarms that will alert you if there is a problem with your network.

There are two types of alarms offered by Monitoring, both completely free and unlimited for everyone. Uptime checks will query a Web or TCP service to ensure that it is still operational. Alert policies will monitor metrics and send alerts whenever they reach a certain level or something unusual happens.

Uptime checks are straightforward to set up and available from the main “Overview” tab. You just need to provide your hostname and set a check interval.
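
For reference, the same two inputs (hostname and check interval) appear in the uptime check configuration used by the Cloud Monitoring API. The sketch below only builds that configuration as a plain dictionary and does not send anything to GCP. The helper name `build_uptime_check`, the project ID, and the hostname are hypothetical, and HTTPS on port 443 with a 10 second timeout is an assumed default.

```python
import json

# Check intervals accepted by the API, in seconds (1, 5, 10, and 15 minutes).
ALLOWED_PERIODS_SECONDS = (60, 300, 600, 900)


def build_uptime_check(project_id, host, period_seconds=300, path="/", use_ssl=True):
    """Return an uptime check configuration for an HTTP(S) endpoint as a dict."""
    if period_seconds not in ALLOWED_PERIODS_SECONDS:
        raise ValueError(f"period must be one of {ALLOWED_PERIODS_SECONDS}")
    return {
        "displayName": f"Uptime check for {host}",
        "monitoredResource": {
            "type": "uptime_url",
            "labels": {"project_id": project_id, "host": host},
        },
        "httpCheck": {
            "path": path,
            "port": 443 if use_ssl else 80,
            "useSsl": use_ssl,
        },
        "period": f"{period_seconds}s",
        "timeout": "10s",
    }


# Hypothetical project and hostname, checked every five minutes.
check = build_uptime_check("my-project", "example.com", period_seconds=300)
print(json.dumps(check, indent=2))
```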

Once you click “Save”, you will be asked to create an alert policy for the check, which will send you notifications if it fails.

If you want to configure your own alert policy, you can do so from the “Alerting” section in the sidebar. This lets you select a resource, metric, filter, and grouping, and configure the policy to send a notification if the metric meets the given condition. For example, you can set an alarm to trigger if the CPU usage on the instance is greater than 80% for at least a few minutes.
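
The 80% CPU example can be written down as an alert policy definition. This is a sketch under a few assumptions: the function names and display names are made up for illustration, “a few minutes” is taken to mean five, and the threshold is written as `0.8` because the CPU utilization metric is reported as a fraction between 0 and 1. The `stays_above` helper is a deliberately simplified local model of the “for at least N minutes” rule, not a description of how Monitoring evaluates conditions internally.

```python
import json

CPU_METRIC = "compute.googleapis.com/instance/cpu/utilization"


def build_cpu_alert_policy(threshold=0.8, duration_seconds=300):
    """Return an alert policy dict that fires when mean CPU stays above the threshold."""
    return {
        "displayName": "High CPU (example)",
        "combiner": "OR",
        "conditions": [{
            "displayName": f"CPU above {threshold:.0%} for {duration_seconds}s",
            "conditionThreshold": {
                "filter": f'metric.type = "{CPU_METRIC}" AND resource.type = "gce_instance"',
                "comparison": "COMPARISON_GT",
                "thresholdValue": threshold,
                # The condition must hold for this long before an alert is raised.
                "duration": f"{duration_seconds}s",
                "aggregations": [{
                    "alignmentPeriod": "60s",
                    "perSeriesAligner": "ALIGN_MEAN",
                }],
            },
        }],
    }


def stays_above(samples, threshold, min_consecutive):
    """Simplified model: True once `min_consecutive` samples in a row exceed the threshold."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


policy = build_cpu_alert_policy()
print(json.dumps(policy, indent=2))

# One sample per minute: a short spike does not trigger, a sustained one does.
print(stays_above([0.9, 0.9, 0.5, 0.9, 0.9], 0.8, 3))
print(stays_above([0.5, 0.9, 0.85, 0.95, 0.9, 0.91], 0.8, 5))
```

Requiring the condition to hold for a duration, rather than alerting on a single sample, is what keeps brief CPU spikes from generating notifications.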

Of course, you will need to configure notifications for the alarm. The easiest option is to set up a notification channel to send you an email, but other options are available, such as SMS notifications, Slack notifications, or posting to a webhook.
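
As a rough illustration of how a channel and a policy fit together, the sketch below describes an email channel and attaches it to a policy by resource name. The helper names, the email address, and the channel resource name are all hypothetical; in practice the resource name is assigned by GCP when the channel is created, and other channel types use their own `type` value and labels.

```python
def build_email_channel(address, display_name="Ops email (example)"):
    """Return an email notification channel definition as a dict."""
    return {
        "type": "email",
        "displayName": display_name,
        "labels": {"email_address": address},
    }


def attach_channel(policy, channel_name):
    """Return a copy of the policy with the channel's resource name added once."""
    updated = dict(policy)
    channels = list(policy.get("notificationChannels", []))
    if channel_name not in channels:
        channels.append(channel_name)
    updated["notificationChannels"] = channels
    return updated


channel = build_email_channel("oncall@example.com")

# Hypothetical resource name; the real one is returned when the channel is created.
channel_name = "projects/my-project/notificationChannels/1234567890"

policy = {"displayName": "High CPU (example)", "conditions": []}
policy = attach_channel(policy, channel_name)
print(policy["notificationChannels"])
```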

All of these notification options are completely free.
