Sunday, 8 March 2015

Speed variation between vCPUs on the same Amazon EC2 instance

I'm exploring the feasibility of running numerical computations on Amazon EC2. I currently have one c4.8xlarge instance running. It has 36 vCPUs, each of which is a hyperthread of a Haswell Xeon chip. The instance runs Ubuntu in HVM mode.


I have a GCC-optimised binary of a completely sequential (i.e. single-threaded) program. I launched 30 copies of it, each pinned to its own vCPU, as follows:



# One copy per vCPU 0-29, each pinned with taskset; output goes to <i>.out
for i in $(seq 0 29); do
    nohup taskset -c "$i" "$BINARY_PATH" &> "$i.out" &
done


The 30 processes run almost identical calculations. There's very little disk activity (a few megabytes every 5 minutes), and there's no network activity or interprocess communication whatsoever. htop reports that all processes run constantly at 100%.


The whole thing has been running for about 4 hours at this point. Six processes (12-17) have already finished, while processes 0, 20, 24 and 29 look as if they will need another 4 hours to complete. The other processes fall somewhere in between.
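To put numbers on that spread rather than judging progress by eye, each pinned run could log its own wall-clock time. The following is only a sketch: `run_pinned` and `times.log` are hypothetical names introduced here, and `BINARY_PATH` is the same variable as in the loop above.

```shell
#!/usr/bin/env bash
# Sketch: run a command pinned to one vCPU and report how long it took.
# run_pinned is a hypothetical helper, not part of the original setup.

run_pinned() {
    local cpu=$1; shift
    local start end status=0
    start=$(date +%s)
    # Pin the command to the given vCPU; stdout and stderr go to <cpu>.out
    taskset -c "$cpu" "$@" > "$cpu.out" 2>&1 || status=$?
    end=$(date +%s)
    # One line per run: vCPU number, elapsed seconds, exit status
    echo "$cpu $((end - start)) $status"
}

# Possible usage (times.log is a hypothetical file name):
#   for i in $(seq 0 29); do run_pinned "$i" "$BINARY_PATH" >> times.log & done; wait
#   sort -k2,2n times.log
```

Sorting the log by its second column would then list the vCPUs from fastest to slowest.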


My questions are:



  1. Does the significant variation in performance between the vCPUs within the same instance arise entirely from resource contention with other users? As it stands, the instance would be entirely unsuitable for any OpenMP or MPI jobs that synchronise between threads/ranks.

  2. Is there anything I can do to achieve a more uniform (hopefully higher) performance across the cores? I have basically excluded hyperthreading as a culprit here since the six "fast" processes are hyperthreads on the same physical cores. Perhaps there's some NUMA-related issue?
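Both the hyperthreading and the NUMA suspicion can be checked against the topology that the guest kernel reports. The sketch below reads the standard Linux sysfs topology files. The helper names (`read_or_na`, `cpu_topology`, `numa_nodes`) are made up for illustration, and on a virtualised instance the reported layout may or may not match the physical host.

```shell
#!/usr/bin/env bash
# Sketch: show which vCPUs share a core, which socket each belongs to,
# and which vCPUs sit on each NUMA node, as seen by the guest kernel.
# Helper names are hypothetical; a field shows "n/a" if its file is missing.

read_or_na() {
    if [ -r "$1" ]; then cat "$1"; else echo "n/a"; fi
}

cpu_topology() {
    local base=${1:-/sys/devices/system/cpu}
    local dir cpu
    for dir in "$base"/cpu[0-9]*; do
        [ -d "$dir" ] || continue
        cpu=${dir##*/cpu}
        printf 'cpu %s siblings %s socket %s\n' "$cpu" \
            "$(read_or_na "$dir/topology/thread_siblings_list")" \
            "$(read_or_na "$dir/topology/physical_package_id")"
    done
}

numa_nodes() {
    local base=${1:-/sys/devices/system/node}
    local dir
    for dir in "$base"/node[0-9]*; do
        [ -d "$dir" ] || continue
        printf 'node %s cpus %s\n' "${dir##*/node}" "$(read_or_na "$dir/cpulist")"
    done
}

cpu_topology | sort -k2,2n
numa_nodes
```

Comparing this output with the per-process run times would show whether the slow processes cluster on one socket or NUMA node, or share physical cores with other busy vCPUs.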




