System time leaps in VirtualBox guests
Apr 2, 2017
5 minutes read

At the beginning of May 2016, when I started evaluating VirtualBox 5.0.6, I noticed that the time inside a CentOS guest VM was wrong, even though an NTP daemon was installed and running. An inaccurate clock can cause serious problems:

  • the inability to match log entries across different systems
  • software updates not being applied on time, because the yum metadata isn’t considered expired
  • proxy caches not being emptied, again due to missed expiry dates
  • filesystem access and modification times getting the wrong values

In one particular case, the system time was a minute behind immediately after booting, an offset NTP can normally correct quite well; after one hour, however, the guest clock had advanced barely 6 minutes. Various tests also showed that the system time deviated less from the correct value when the guest performed CPU-intensive work such as compiling or image processing.

Clock sources in Linux

During boot, the Linux kernel attempts to determine which hardware clock is most suitable for keeping track of time:

  • TSC, or the timestamp counter, is the most precise, with accuracy below 1ns. It is also the least expensive to read, being just a CPU register.
  • HPET, the High Precision Event Timer, has a precision of around 100ns and is more expensive to query than the TSC. VirtualBox only offers a command-line interface to enable it.
  • acpi_pm has even less precision, since it runs with a clock of 3.58MHz. Its only advantage is that it’s always available.

The kernel will choose TSC when possible, because it’s the most accurate clock source and reading it has the lowest impact on system performance. Newer CPUs (at least since Intel’s Nehalem architecture) have a so-called “invariant TSC”, which counts at a constant frequency regardless of power state. The TSC of older processors usually stops counting when the CPU enters a sleep state, which happens as soon as it becomes idle; the kernel detects this during boot and will refuse to use TSC as its clocksource:

tsc: Marking TSC unstable due to TSC halts in idle states deeper than C2
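Whether a CPU has an invariant TSC can also be checked from the flags line of /proc/cpuinfo: the kernel exposes constant_tsc for a fixed-rate counter and nonstop_tsc for one that keeps running in sleep states. A minimal sketch (the sample flag strings below are made up for illustration):

```python
def has_invariant_tsc(flags_line):
    """True if the flags line advertises an invariant TSC.

    'constant_tsc' means a fixed frequency, 'nonstop_tsc' means the
    counter keeps running in deep sleep states; both are needed.
    """
    flags = set(flags_line.split())
    return {'constant_tsc', 'nonstop_tsc'} <= flags

# Hypothetical flag strings, truncated for readability:
nehalem = 'fpu msr tsc sse2 constant_tsc nonstop_tsc'
penryn = 'fpu msr tsc sse2'  # like the Core 2 Duo discussed below
print(has_invariant_tsc(nehalem), has_invariant_tsc(penryn))
```

On a real system, the input would be the `flags` line of /proc/cpuinfo.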

We can find out which clock source the kernel is using:

$ cat /sys/devices/system/clocksource/*/current_clocksource
kvm-clock

This is expected: VirtualBox 5 added a new paravirtualization setting, which emulates the KVM interface by default for Linux guests.

Measuring TSC in the guest

A first debugging step is to compare TSC readings with the guest system time (we can force Linux to use HPET by adding clocksource=hpet to the kernel command line). We read the TSC and the system time in a loop, with short sleeps in between to allow the CPU to enter an idle state:

#include <time.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

const uint64_t ns_per_s = 1000000000L; /* 1s = 1e+9ns */

static inline uint64_t clk_mono() {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return ts.tv_sec * ns_per_s + ts.tv_nsec;
}

static inline uint64_t RDTSC() {
  /* My CPU doesn't support the RDTSCP instruction, which would
   * serialize the out-of-order execution; a CPUID instruction could
   * be used to achieve the same effect, but considering how big the
   * error in system time is, an inaccuracy of a few nanoseconds is
   * acceptable.
   */
  unsigned hi, lo;
  __asm__ volatile("rdtsc"
                   : "=a"(lo), "=d"(hi)
                   : "a"(0)
                   : "%ebx", "%ecx");
  return ((uint64_t)hi << 32) | lo;
}

void clk_test(unsigned sleep_ms) {
  /* clk_mono() returns ns, while tsc is just a counter */
  uint64_t mono, initial_mono, last_mono, tsc;
  initial_mono = last_mono = clk_mono();
  printf("# Monotonic_clock TSC\n");
  /* One minute total measurement time */
  while (last_mono - initial_mono < 60 * ns_per_s) {
    if (sleep_ms)
      usleep(1000 * sleep_ms);
    tsc = RDTSC();
    mono = clk_mono();
    /* record values roughly 0.5s apart */
    if (mono - last_mono >= ns_per_s / 2) {
      printf("%lu %lu\n", mono, tsc);
      last_mono = mono;
    }
  }
  printf("\n\n"); /* separate sections for GNUplot */
}

int main() {
  clk_test(0);
  clk_test(1);
  clk_test(10);
  clk_test(100);
  clk_test(1000);
  return 0;
}

My Mac Mini has an old Intel Core 2 Duo (Penryn) without an invariant TSC. When we plot TSC against the system clock, we should see a different slope depending on whether we called usleep or not:

plot of TSC vs system time
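The slope of each plotted section is just the apparent TSC frequency, which can be estimated with an ordinary least-squares fit; a sketch on synthetic data (a real run would feed in the pairs printed by clk_test):

```python
def slope(xs, ys):
    """Least-squares slope of ys against xs (here: TSC ticks per ns)."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic example: a 2 GHz TSC advances 2 ticks per nanosecond
mono = [i * 500000000 for i in range(120)]  # samples 0.5s apart, in ns
tsc = [2 * t for t in mono]
print(slope(mono, tsc))  # -> 2.0
```

A halted TSC shows up as a smaller fitted slope on the idle (usleep) sections than on the busy one.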

Linux refuses to use the TSC when booted directly on the Mac Mini; inside VirtualBox, it reports kvm-clock as its clocksource. If VirtualBox relies on the TSC for its KVM emulation, that would explain why the system time falls further behind when the CPU is mostly idle.

Host time vs guest time

We can’t directly access the guest time from the host (the reverse is also true). But we can start a web server in the guest and access it from a Python script running on the host; we can then use the Date: HTTP header to read the system time of the guest, and record it alongside the system time of the host:

#!/opt/local/bin/python2.7
from __future__ import unicode_literals, print_function
import calendar
import time
from datetime import datetime
import requests


def guest_timestamp():
	"""Return number of seconds since the Unix epoch."""
	r = requests.head('http://192.168.56.2:8000/')
	date_string = r.headers['date']
	dt = datetime.strptime(date_string, '%a, %d %b %Y %H:%M:%S %Z')
	# the Date header is in GMT, so convert with timegm instead of
	# mktime, which would interpret the parsed time as local time
	return calendar.timegm(dt.timetuple())


def host_timestamp():
	"""Return number of seconds since the Unix epoch."""
	return time.time()


def record_timestamps():
	"""Record timestamps for both host and guest."""
	result = [(host_timestamp(), guest_timestamp())]
	while len(result) < 60:
		time.sleep(60.0)
		result.append((host_timestamp(), guest_timestamp()))
	return result


if __name__ == '__main__':
	timestamps = record_timestamps()
	# the guest time is already wrong immediately after boot,
	# set the host time as reference for both plots
	first_host_ts, first_guest_ts = timestamps[0]
	print('# Host_timestamp Guest_seconds')
	for ht, gt in timestamps:
		print(ht-first_host_ts, gt-first_guest_ts)

The guest time tracks the host time perfectly when using acpi_pm or hpet as the clock source, but not with kvm-clock (the default):

host vs guest time
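The recorded pairs can also be condensed into a single drift figure, e.g. seconds lost per minute of host time; a sketch with made-up numbers shaped like the script’s output:

```python
def drift_per_minute(pairs):
    """Seconds the guest falls behind per minute of host time.

    pairs: (host_seconds, guest_seconds) tuples, both relative to the
    first sample, as printed by record_timestamps().
    """
    (h0, g0), (h1, g1) = pairs[0], pairs[-1]
    host_elapsed = h1 - h0
    guest_elapsed = g1 - g0
    return (host_elapsed - guest_elapsed) * 60.0 / host_elapsed

# Hypothetical run: after 10 host minutes the guest advanced only 9
samples = [(0.0, 0.0), (600.0, 540.0)]
print(drift_per_minute(samples))  # -> 6.0
```

A well-behaved clock source should yield a value very close to zero.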

There seems to be no measurable benefit to using hpet in Linux guests, especially since it generates more interrupts than acpi_pm.

Recommendations

I reached out to the VirtualBox core developers on IRC, explaining the effect I saw and the two workarounds I had found:

  • set the paravirtualization interface to legacy
  • explicitly set the clocksource to either acpi_pm or hpet

One of them told me that VirtualBox decides whether to use the TSC based only on the processor model, without querying its capabilities; it’s a VirtualBox bug, but it only affects older processors, so it’s unlikely to be fixed. His advice was to set the paravirtualization interface to none on older CPUs, since they won’t benefit from the other paravirtualization modes.
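For reference, the workarounds translate into commands along these lines (the VM name "centos" is a placeholder; check VBoxManage modifyvm --help for the options supported by your version):

```shell
# Host side: switch off paravirtualization (VM must be powered off)
VBoxManage modifyvm "centos" --paravirtprovider none

# Guest side: pin the clock source on the kernel command line,
# e.g. in GRUB_CMDLINE_LINUX in /etc/default/grub on CentOS 7
clocksource=acpi_pm
```

After either change, the host-vs-guest measurement above should show the two clocks advancing in lockstep.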

