Monitor and Maintain a Compute Instance
Once you have a Compute Instance up and running, it’s time to think about monitoring and maintaining your server. This guide introduces the essential tools and skills you’ll need to keep your server up to date and minimize downtime. You’ll learn how to monitor the availability and performance of your system, manage your logs, and update your server’s software.
Availability Monitoring
The availability of your servers, and the websites and web applications you host on them, can be critically important. If you generate income from a blog or charge subscription fees for your web application, downtime can have a severe impact on your bottom line. Using an availability monitoring tool can help you rapidly detect and resolve service disruptions, thereby mitigating the impact on your websites and web applications.
Assess Your Needs
Not everyone needs to monitor the availability of their server. For example, if you use your Compute Instance to host a personal picture gallery website for friends and family, the occasional service interruption probably won’t bother you. The small inconvenience of your website going offline for a few minutes doesn’t justify the time it would take to set up and configure an availability monitoring tool.
If you depend on your website or web application for your livelihood, an availability monitoring tool is practically a necessity. Once set up, the tool actively watches your servers and services and alerts you when they’re unavailable. You’ll be able to troubleshoot the problem and restore service as quickly as possible.
Whether you use one Compute Instance or dozens of them, mission-critical servers and services should be watched by an independent monitoring tool that can keep tabs on their availability. The tool should detect service-related incidents automatically and be able to notify you by email or text message (SMS). That way you’ll know that a server or service is down within minutes of it failing.
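To make the idea of automated detection concrete, here is a minimal sketch of a single availability probe. It is not a replacement for a real monitoring tool. The function names `is_up` and `check_url` are made up for this example, and treating every 2xx or 3xx response as "up" is an assumption you may want to tighten for your own service.

```shell
# Classify an HTTP status code: 2xx and 3xx count as "up", anything else as "down".
is_up() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Probe a URL once and print "up" or "down".
# curl reports status 000 when it cannot connect or times out,
# which is_up treats as down.
check_url() {
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")
    if is_up "$code"; then
        echo "up"
    else
        echo "down"
    fi
}
```

A probe like this only helps if it runs somewhere other than the server being watched, for example from a scheduled job on a second machine that calls `check_url` with your site’s address and sends a notification when the result is `down`.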
Find the Right Tool
Several availability monitoring tools are available. Which one fits best depends largely on how many servers you’ll be monitoring:
- Multiple Servers: If you run more than one server, the Elastic Stack is an excellent monitoring tool.
- Single Server: If you only run a single server, you might want to use a third-party service to monitor your Compute Instance. You could also use a network diagnostic tool like MTR to diagnose and isolate networking errors.
- Linode Managed: The Managed service lets Linode manage your infrastructure and provides incident response around the clock.
Configure Shutdown Watchdog (Lassie)
Shutdown Watchdog, also known as Lassie, is a Cloud Manager feature capable of automatically rebooting your Compute Instance if it powers off unexpectedly. Lassie is not technically an availability monitoring tool, but it can help get your instance back online fast if it’s accidentally powered off.
To turn Lassie on and off, see the Recover from Unexpected Shutdowns with Lassie (Shutdown Watchdog) guide. Once Lassie is enabled, your Linode will automatically reboot if it is unexpectedly powered off in the future.
Performance Monitoring
Performance monitoring tools record vital server and service performance metrics. Similar to a vehicle’s dashboard, which has gauges for things like speed and oil pressure, performance monitoring tools provide valuable insight into the inner workings of your virtual server. With practice, you’ll be able to review this information and determine whether your server is in good health.
Cloud Manager
If you’re new to performance monitoring, you can get started by logging in to the Linode Cloud Manager. There are four simple graphs available on the Dashboard and in the Graphs section:
- CPU %: Monitor how your Linode’s CPU cores are being utilized. Note that each of your Linode’s CPU cores is capable of 100% utilization, which means you could see this graph spike well over 100%, depending on your Linode plan size.
- IPv4 Network Traffic: Keep tabs on how much incoming and outgoing bandwidth your server is using.
- IPv6 Network Traffic: Wondering if any of your visitors are using IPv6? Check this graph to see how much bandwidth has been transferred over IPv6.
- Disk I/O: Watch for disk input/output bottlenecks.
When you first start monitoring the graphs, you won’t know what numbers are normal. Don’t worry. With time and practice, you’ll learn what the graphs are supposed to look like when your server is operating normally. Then you’ll be able to spot performance abnormalities before they turn into full-blown problems.
Configure Cloud Manager Email Alerts
The Cloud Manager allows you to configure email alerts that automatically notify you when certain performance thresholds are reached, including:
- CPU Usage
- Disk IO Rate
- Incoming Traffic
- Outbound Traffic
- Transfer Quota
When setting the threshold for CPU usage, calculate the maximum value by multiplying the total number of available CPUs by 100. For example, if an instance has 4 CPUs, the maximum threshold is 400%. To be notified when relative CPU usage exceeds 80% over 2 hours, take 80% of that maximum and set the Usage Threshold value to 320%.
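The threshold arithmetic can be sketched as a small shell function. The name `cpu_threshold` is made up for this example. It multiplies the number of CPUs by the relative percentage you care about, which is the same as taking that percentage of the maximum (CPUs × 100).

```shell
# Convert a relative CPU percentage into a Usage Threshold value.
# $1 = number of CPUs on the instance
# $2 = relative usage, in percent, at which you want to be alerted
cpu_threshold() {
    echo $(( $1 * $2 ))
}

cpu_threshold 4 80   # prints 320
```

For a 2-CPU instance and the same 80% target, `cpu_threshold 2 80` gives 160.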
To turn on and customize the alerts:
Log in to the Cloud Manager.
Click the Linodes link in the sidebar.
Select your Compute Instance. The instance’s details page appears.
Click the Settings tab. The Notification Thresholds panel appears.
To enable an email alert, toggle the appropriate switch.
To configure the threshold for an alert, set a value in the threshold text field.
Click Save to save the email alert thresholds.
You have successfully configured email alerts in the Cloud Manager.
If you receive an email threshold alert from the Cloud Manager, do not be alarmed. It does not necessarily mean that anything is wrong with your instance.
For example, the instance may be operating above its normal threshold if it is performing intensive tasks such as compiling software (CPU, IO, or both) or if a major website just linked to your blog (increased traffic).
Use Third-Party Tools
The graphs in the Linode Cloud Manager provide basic information for things like CPU utilization and bandwidth consumption. That’s good information as far as it goes, but it won’t sate the appetite of true geeks who crave detailed statistics on a server’s disk, network, system, and service performance. For that kind of information, you’ll need to install and configure a third-party performance monitoring tool.
There are several free third-party performance monitoring tools available for your Linode:
- Munin: Munin is a system and network monitoring tool that generates graphs of resource usage in an accessible web-based interface. Munin also makes it possible to monitor multiple Linodes with a single installation.
- Cacti: If you have advanced monitoring needs, try Cacti. It allows you to monitor larger systems and more complex deployments with its plugin framework and web-based interface.
Linode Managed
Linode Managed is our monitoring service that offers 24x7 incident response, dashboard metrics for your Linodes, free cPanel, and an automatic backup service. With a three-month Linode Managed commitment, you also get two complimentary standard site migrations performed by our Professional Services Team. If you are running more than one Compute Instance, you do not have to manage all of them. You can establish separate accounts (e.g., production and development) and monitor only the most critical services running on designated instance(s). Existing customers can sign up for Linode Managed by contacting support.
Manage Logs
Important events that occur on your system — things like login attempts or services being restarted — are recorded in your server’s logs. Similar to car maintenance records and completed tax forms, which provide a paper trail in the event of a problem or discrepancy, log files keep track of system events. You might review logs when troubleshooting errors, tracking usage, or investigating unusual behavior on your system.
Rotate Logs
As more and more events are logged, the log files on your server get bigger and bigger. Left unchecked, those files can start consuming a surprising amount of disk space. You can mitigate this problem by using logrotate, a utility that automatically archives and compresses current log files after a certain interval, creates new log files, and deletes old log files after a specified amount of time.
Use the logrotate guide to get started.
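As a rough illustration of what a rotation policy looks like, the sketch below prints a logrotate stanza for a log path you pass in. The function name `print_logrotate_stanza` and the `/var/log/myapp/*.log` path are made up for this example, and the weekly schedule with four kept archives is an assumption, not a recommendation from this guide.

```shell
# Print a logrotate stanza for the given log path pattern.
print_logrotate_stanza() {
    cat <<EOF
$1 {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
EOF
}

# Preview the stanza for a hypothetical application log directory.
print_logrotate_stanza '/var/log/myapp/*.log'
```

Here `weekly` and `rotate 4` archive the log once a week and keep four old copies, `compress` gzips the archives, and `missingok` and `notifempty` skip missing or empty files. If you save a stanza like this under `/etc/logrotate.d/`, running `logrotate -d` against the file does a dry run that shows what would happen without touching any logs.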
Monitor System Logs
It’s important to keep an eye on the events recorded in your system logs. But unless you’re the type of person who loves scanning through hundreds of lines of log entries, you won’t want to open log files unless absolutely necessary. Fortunately, there’s an easier way to learn about the most important system events fast. Logwatch is a customizable utility that can automatically parse system logs and email you detailed reports highlighting notable events.
Use the Logwatch guide to get started.
Update Software
Linux distributions are frequently updated to fix bugs, add new features, and patch security vulnerabilities. To take advantage of the new packages and patches, you’ll need to remember to perform some simple steps every once in a while. This section shows you what to do.
Update Installed Packages
You learned about the importance of regularly updating your server’s packages in the Setting Up and Securing a Compute Instance guide. If nothing else, installing updates is a fast and easy way to mitigate vulnerabilities on your server.
To check for software updates and install them in Ubuntu or Debian, enter the following commands, one by one:
apt-get update
apt-get upgrade --show-upgraded
There are ways to automate the installation of software updates, but this is not recommended. You should always manually review the lists of available patches before installing updates.
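One way to review pending updates before installing anything is to ask apt-get for a simulated upgrade and summarize the result. The sketch below assumes a Debian or Ubuntu system. The helper names `count_pending` and `list_pending` are made up for this example. They read the output of `apt-get -s upgrade`, in which each package that would be upgraded appears on a line beginning with `Inst`.

```shell
# Count the packages a simulated upgrade would install.
# Reads `apt-get -s upgrade` output on standard input.
count_pending() {
    grep -c '^Inst '
}

# Print only the names of the packages that would be upgraded.
# Reads the same simulated output on standard input.
list_pending() {
    awk '$1 == "Inst" { print $2 }'
}

# Typical use after refreshing the package lists with `apt-get update`:
#   apt-get -s upgrade | list_pending
#   apt-get -s upgrade | count_pending
```

Because `-s` only simulates, nothing is changed on the system until you run the real upgrade command.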
Apply Kernel Updates
When you first sign up for Linode and create a Compute Instance, the Cloud Manager automatically creates a configuration profile that uses either the distribution’s system kernel (in most cases) or the latest available Linode-supplied kernel.
If your system is using a Linode-supplied kernel, it’s important to know that we update the kernels as necessary and make them available in the Cloud Manager. In most cases, new kernels are automatically selected and, once a new kernel is released, all you have to do is reboot your Compute Instance to start using it.
To check for a new kernel and start using it on your Compute Instance:
First, check which kernel version your Compute Instance is currently using. Log in to your instance and execute the following command:
cat /proc/version
Examine the output and remember the version number:
Linux version 4.15.12-x86_64-linode105 (maker@build.linode.com) (gcc version 4.9.2 (Debian 4.9.2-10+deb8u1)) #1 SMP Thu Mar 22 02:13:40 UTC 2018
Log in to the Cloud Manager.
Click the Linodes link in the sidebar.
Select your Compute Instance. The instance’s details page appears.
Select the active configuration profile by clicking the Edit link.
From the Kernel menu, verify that GRUB 2 is selected.
If you selected a new kernel, click Submit. The instance’s dashboard appears.
Select Reboot from the status menu to reboot your Compute Instance and start using the new kernel.
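To confirm that the reboot picked up a different kernel, you can compare the version you noted in the first step with the version reported afterward. The sketch below pulls the version out of a `/proc/version`-style line. The function name `kernel_release` is made up for this example.

```shell
# Print the kernel release (the third field) from a /proc/version-style
# line supplied on standard input.
kernel_release() {
    awk 'NR == 1 { print $3 }'
}

# Example using the start of the sample output from the first step:
echo 'Linux version 4.15.12-x86_64-linode105 (maker@build.linode.com)' | kernel_release
# prints 4.15.12-x86_64-linode105
```

On a running instance, `kernel_release < /proc/version` reads the live value, and `uname -r` reports the same release string.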
Upgrade to a New Release
Linux distributions such as Ubuntu and Fedora use version numbers to identify the individual versions, or releases, of the operating system. It’s important to know which release your server is running, as releases are usually supported for one or more years. After support for your release is discontinued, you won’t be able to download or apply critical security packages, which can put your server at risk.
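If you are not sure which release a server is running, most current distributions describe themselves in `/etc/os-release`. The sketch below assumes that file exists and defines the standard `ID` and `VERSION_ID` fields. The function name `release_id` is made up for this example.

```shell
# Print "<ID> <VERSION_ID>" from an os-release file.
# $1 = path to the file (defaults to /etc/os-release)
# The file is a list of shell-style KEY=value lines, so it is sourced in a
# subshell to keep its variables out of the calling shell.
release_id() {
    (
        . "${1:-/etc/os-release}"
        echo "$ID $VERSION_ID"
    )
}
```

Calling `release_id` with no arguments reads the local system’s file. Some rolling or testing releases leave `VERSION_ID` unset, in which case only the distribution ID is printed.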
There are two ways to upgrade a Compute Instance running an unsupported release. You can upgrade your existing server to the next release, or you can create a new Compute Instance with the newest release available and transfer your files from the old server. See our Upgrading guides for more information.