Cloud server providers often advertise their instances as having a number of “vCPUs”, short for virtual CPU. What performance can you expect from this compared to a normal CPU?
The difference between cores and threads
It is important to understand the distinction between a processing thread and a processor core. A processor has a fixed number of physical cores, which execute program instructions. But even very intensive tasks don’t use 100% of the CPU all the time; programs frequently have to wait on memory reads from the L3 cache, RAM, or disk, and are put to sleep until the data arrives. During this time, the processor core sits idle.
The solution to this problem is called “hyperthreading” or “simultaneous multithreading” (SMT). Rather than running a single instruction stream per core, the processor can run multiple hardware threads on each core. Currently, almost all high-end processors from Intel and AMD support two threads per core.
Depending on the application, hyperthreading can give a theoretical speedup of up to 100%, if both threads spend much of their time waiting on memory reads and don’t conflict with each other. In most cases, hyperthreading gives a speed gain of around 30% compared to running without it. In some cases, however, when two threads are pinned at 100% on the same core, it can cause slowdowns as they compete for the core’s resources.
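One way to see SMT on your own machine is to compare the logical CPU count the OS reports with the number of physical cores behind it. A minimal sketch (the `/proc/cpuinfo` parsing is Linux-specific and best-effort, not a portable API):

```python
import os

# Logical CPUs = hardware threads visible to the OS scheduler.
logical = os.cpu_count()

# On Linux, /proc/cpuinfo lists one record per logical CPU; distinct
# (physical id, core id) pairs identify distinct physical cores.
physical = set()
try:
    with open("/proc/cpuinfo") as f:
        pkg, core = None, None
        for line in f:
            if line.startswith("physical id"):
                pkg = line.split(":")[1].strip()
            elif line.startswith("core id"):
                core = line.split(":")[1].strip()
            elif not line.strip() and core is not None:
                physical.add((pkg, core))
                pkg, core = None, None
except OSError:
    pass  # not Linux, or /proc not mounted

print(f"logical CPUs (threads): {logical}")
if physical:
    # With 2-way SMT enabled, this is typically half the logical count.
    print(f"physical cores: {len(physical)}")
```

On a 2-way SMT machine the two numbers differ by a factor of two; inside many cloud VMs the distinction is hidden and only the logical count is exposed.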
What makes a vCPU?
vCPUs are roughly comparable to a single processing thread, but it is not an exact equivalence.
Suppose you rent an AWS c5.large instance with 2 vCPUs. Your application runs alongside many others on one large server. You can actually rent the whole server with an AWS Bare Metal instance, which gives you direct access to the processor. If you rent anything smaller than that, your access is managed through AWS Nitro.
Nitro is a hypervisor: it handles the creation and management of the virtual machines running on the server itself. This is why you are renting a “virtual server” and not a rack in a data center. Nitro is what makes EC2 work; it is backed in part by dedicated hardware, so the slowdown from running in a virtualized environment should be minimal.
Nitro decides which hardware threads to assign your virtual machine based on the processing power required, much like the task scheduler in a normal desktop OS. With 2 vCPUs, the worst case is that your application runs on a single core and is handed both of that core’s threads. If you truly max out your instance, those threads can contend with each other and cause minor slowdowns. It’s hard to say exactly how the AWS hypervisor works, but it’s probably safe to assume this scenario is largely mitigated by good thread placement on Nitro’s part.
So, overall, you can probably expect performance comparable to a normal CPU thread, if not a little better. The distinction doesn’t matter much in practice anyway, as most EC2 instances ship with multiples of 2 vCPUs. Remember that a 4-vCPU instance is not a 4-core server; it really emulates a 2-core server running 4 processing threads.
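The vCPU-to-core mapping above is simple arithmetic, assuming the common 2 threads per core. As a quick sketch:

```python
# vCPU-to-core arithmetic, assuming 2-way SMT (2 threads per core),
# which is what current Intel and AMD server parts offer.
THREADS_PER_CORE = 2

def cores_for(vcpus: int) -> int:
    """Number of physical cores backing a given vCPU count."""
    return vcpus // THREADS_PER_CORE

print(cores_for(4))   # a 4-vCPU instance maps to 2 physical cores
print(cores_for(96))  # a 96-vCPU instance maps to 48 physical cores
```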
The processing speed of a vCPU depends more on the actual hardware it runs on. Most server processors are Intel Xeons, as they make up the majority of the market. Low-end servers can run older hardware that is a bit dated by today’s standards. AWS T3a instances use AMD EPYC processors with high core counts; they run a little slower, but cost less because the hardware is much cheaper per core.
AWS T2 and T3 instances are “burstable”, which makes them best suited for applications that don’t need to run at 100% all the time.
For example, the t3.micro instance has 2 vCPUs, but its base speed is 10% of a normal vCPU. In effect, the t3.micro really gives you 0.2 vCPU, which is actually how Google Cloud Platform advertises its f1-micro instances.
But the t3.micro is not simply 90% slower overall; it is allowed to burst beyond base speed for short periods of time, much like the turbo frequency on a regular desktop CPU. Except the limiting factor here is not thermal, but how much you are willing to pay.
For each hour the instance runs below base speed, it accumulates CPU credits; each credit can be spent to run one vCPU at 100% for one minute. The t3.micro in particular earns 6 CPU credits per hour while it runs below base speed. When processing power is needed, those credits are consumed to exceed base speed.
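The credit accounting can be sketched as a toy model (illustrative only, not AWS’s exact billing logic; the earn rate is the t3.micro figure from the paragraph above):

```python
# Toy model of T3 CPU-credit accounting. Assumes 6 credits earned per
# hour below base speed, each credit worth one minute of one vCPU at 100%.
EARN_RATE = 6  # credits per hour, t3.micro

def credits_after(idle_hours: float, burst_minutes: float,
                  start: float = 0.0) -> float:
    """Credit balance after running below base speed for `idle_hours`,
    then bursting one vCPU at 100% for `burst_minutes`."""
    earned = start + idle_hours * EARN_RATE
    return max(earned - burst_minutes, 0.0)

# Ten quiet hours earn 60 credits: enough for a full hour of one
# vCPU at 100% before the balance runs out.
print(credits_after(idle_hours=10, burst_minutes=60))  # -> 0.0
```

The real mechanism also caps the balance and (on T2) forfeits credits when the instance stops, but the basic trade is the same: quiet time banked now buys full speed later.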
This is well suited to microservice-style applications, which must respond to requests as they arrive but sit idle until the next user asks for something. Services that have to crunch numbers around the clock are better suited to traditional instances.
This allows AWS to pack more T2 instances onto a server than it could otherwise support, which keeps costs down. For example, a rack server in their data center might be a 48-core system with 96 processing threads. That could back 96 vCPUs’ worth of C5 instances, but because T2 instances share cores and run at under 20% of base speed, AWS can run far more of them on the same server.
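The packing argument can be made concrete with rough numbers (the 20% baseline and 96 threads are the illustrative figures from the paragraph above, not AWS’s actual scheduling policy):

```python
# Rough oversubscription math for burstable instances (illustrative only).
threads = 96        # logical CPUs on the host
c5_vcpus = threads  # C5: one full-speed vCPU per hardware thread

t2_base = 0.20      # assumed T2 baseline: at most 20% of a thread each
t2_vcpus = int(threads / t2_base)  # vCPUs the host can promise at baseline

print(c5_vcpus, t2_vcpus)  # 96 full-speed vCPUs vs 480 burstable vCPUs
```

In other words, as long as most T2 tenants stay near their baseline most of the time, the same hardware can be sold several times over, which is exactly why burstable instances are cheap.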