
Benchmarking Multiple EBS Volumes running on an R4.8XLarge EC2 Instance in AWS

Blog post provided by our developer Trey Cahill about EBS volume testing.

 

 

As our data continues to grow, we are continually faced with how to scale to that data size at a tolerable cost.  One of our first ideas was to add more disk space to each node in our Hadoop cluster.  This would allow us to pack more data into fewer nodes while attempting to maintain our current cluster performance.

Current Hadoop Environment

Our production Hadoop cluster currently uses R4.8XLarge EC2 instances, each with four 1TB GP2 EBS volumes attached.  We chose the R4.8XLarge instance type because of our memory usage and its 10 Gigabit network performance.  We chose 1TB GP2 EBS volumes because the maximum IOPS is reached at 1TB, with any size larger than 1TB providing the same number of IOPS.  Our primary workload is read heavy, with writes occurring in bursts several times a day.

Test Setup

To replicate our Hadoop cluster as closely as possible, I set up a single R4.8XLarge instance and attached four 1TB volumes (1024 GB was specified during creation) as the baseline for the tests.  Every volume, including those added in subsequent tests, was a GP2 EBS volume formatted with the ext4 filesystem.  The volumes and the EC2 instance were placed in the same AWS region and subnet.
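For reference, each new volume was prepared the usual way before testing; a minimal sketch of the steps, assuming the volume shows up as /dev/xvdf (the device name will vary; check lsblk) and is mounted at /ebs0 to match the directories used in the fio job files:

# create an ext4 filesystem on the newly attached EBS volume (device name is an assumption)
sudo mkfs.ext4 /dev/xvdf
# create the mount point referenced by the fio jobs and mount the volume
sudo mkdir -p /ebs0
sudo mount /dev/xvdf /ebs0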

Testing Process

Starting with the 4 volumes attached to our EC2 instance, we added 1 similar volume at a time until we saw significant degradation.  For our final test, we added 2 volumes at once to produce an even larger degradation, helping to confirm what we were seeing and what we would expect from adding 2 volumes.  We used fio (Flexible IO Tester) as our testing utility due to the verbosity of its output, the ability to extensively configure and control the tests, and the solid documentation and tutorials surrounding fio.

fio Testing

We ran 2 types of tests: a random read only test (see appendix C for the fio segment) and a random write only test (see appendix D for the fio segment).  Both tests used the same global settings (see appendix B).  Both tests also operated on 500GB of data, reading/writing in blocks of 128MB (once again, to replicate Hadoop settings; in this case, block size).  Each volume in both test types had 4 threads run against it, so a test with 4 volumes would have 16 threads running.  Running 4 threads was not an arbitrary choice: our HDFS setting dfs.datanode.handler.count, which is the number of server threads for a DataNode, was set to 4.
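To make the layout concrete, here is a minimal sketch of how a 2-volume random read job file would be assembled from the global settings in appendix B and the segment in appendix C (the second mount point, /ebs1, and the file name read-2-volumes.fio are assumptions for this illustration):

[global]
clocksource=clock_gettime
randrepeat=0
invalidate=1
direct=1

[Read-Test-1]
rw=randread
ioengine=libaio
iodepth=4
size=500G
bs=128M
directory=/ebs0

[Read-Test-2]
rw=randread
ioengine=libaio
iodepth=4
size=500G
bs=128M
directory=/ebs1

Each additional volume simply gets another copy of the segment pointing at its own mount point, and the whole file is run with fio read-2-volumes.fio.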

 

Results

In both test sets, read and write, noticeable degradation started at around 6 or 7 volumes, depending on the metric being reviewed.  We'll take a deeper dive in the sections below.

 

Read Results

We can see that the runtimes increase for each run as volumes are added.  We see a significant increase at 8 volumes that departs from the nearly linear increase in average runtimes up to that point.  This could be due to any number of factors, including the shared tenancy of the EC2 instance, as the average runtime for 9 volumes returns to the expected trend.  At 11 volumes we see a larger than expected jump, possibly indicating that a threshold has been reached.

 

1.PNG

 

As the number of volumes increases, we also see the average disk bandwidth decrease in a similar fashion to how the runtimes increased.  There does appear to be an unexplained increase in average bandwidth at 5 volumes, but this could be the node maximizing its resources.  Finally, while there is a decrease in bandwidth at 11 volumes, it does not follow the runtime pattern of being significantly worse than expected.

 

2.PNG

For reads, more than 6 volumes appears to cause significant degradation in performance, although the degradation tends to follow a nearly linear decline.

 

Write Results

Once again, we see runtimes increase for each run as volumes are added; although, compared to the reads, we don't see any unexpected or out-of-turn increases in runtime.

 

3.PNG

Bandwidth for the write tests is nearly identical to the read tests.  We even see a slight increase in average bandwidth at 5 volumes, followed by a nearly linear performance decrease.  The 11-volume test does show a somewhat larger-than-linear decrease in average bandwidth, which is what we'd expect to see since we jumped from 9 to 11 volumes; the read tests, by contrast, did not show the expected fall from 9 to 11 volumes.

 

4.PNG

Once again, when running more than 6 volumes, the performance begins falling linearly.

 

Final Thoughts on the Results

Both the read and write tests mirror each other's performance, with average bandwidth peaking at 5 volumes and runtimes increasing steadily beyond 6 volumes.  Below we see the average runtimes of reads and writes compared against each other.  As expected, writes are slower than reads.

 

5.PNG

Recommendations for our Team

I recommended to the team that running 8 1TB volumes should provide a good mix of DataNode disk density and performance.  Since our tests were run with 4 threads per volume, matching our DataNode setting dfs.datanode.handler.count of 4, the 8-volume configuration can be considered conservative.  This should prevent the degradation seen in the tests from appearing when we run our workloads against a cluster with 8 volumes per node.
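For context, the DataNode picks up additional volumes through its data directory list; a minimal sketch of the relevant hdfs-site.xml property, assuming the 8 volumes are mounted at /ebs0 through /ebs7 (the mount points and the dfs/dn subdirectory are assumptions for this illustration, not our actual paths):

<property>
  <!-- one data directory per EBS volume; paths below are assumed mount points -->
  <name>dfs.datanode.data.dir</name>
  <value>/ebs0/dfs/dn,/ebs1/dfs/dn,/ebs2/dfs/dn,/ebs3/dfs/dn,/ebs4/dfs/dn,/ebs5/dfs/dn,/ebs6/dfs/dn,/ebs7/dfs/dn</value>
</property>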

Improvements and Considerations

The 8-volume recommendation would probably be considered conservative by most, but the team can always add more volumes should we not see any performance issues with our workloads.  There may have been some problems with the test itself.  For example, CPU contention from the higher thread counts was not monitored or taken into consideration, except as an overall reduction in I/O performance.  Also, although AWS EBS volumes are attached over the network, I did not collect any network statistics.  Both of these considerations hopefully factored themselves into the results naturally while the tests were running.  Had there been more time, I would have run each test multiple times; unfortunately, this task was time boxed.  Finally, atime was not disabled, as the Hadoop recommended settings suggest, but since all tests were run with atime enabled, I feel that the results can stand as valid.
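For anyone repeating the tests, disabling access-time updates is typically done with the noatime mount option; a minimal sketch of an /etc/fstab entry, where the device name and mount point are assumptions:

# mount the EBS volume without access-time updates (device and mount point are assumptions)
/dev/xvdf  /ebs0  ext4  defaults,noatime  0  0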

 

Appendixes

Appendix A: References

https://www.linux.com/learn/inspecting-disk-io-performance-fio - How to read fio output, with good examples

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/benchmark_procedures.html - AWS EBS benchmarking procedures

https://tobert.github.io/post/2014-04-28-getting-started-with-fio.html - fio tutorial

https://tobert.github.io/post/2014-04-17-fio-output-explained.html - fio output explained

https://www.datadoghq.com/blog/aws-ebs-latency-and-iops-the-surprising-truth/ - AWS EBS latency and IOPS

https://linux.die.net/man/1/fio - fio man page

https://github.com/axboe/fio/blob/master/HOWTO - fio HOWTO on GitHub

http://tfindelkind.com/2015/08/10/fio-flexible-io-tester-part5-direct-io-or-buffered-page-cache-or-r... - Information on the invalidate and direct options

http://www.n0derunner.com/2014/06/multiple-devicesjobs-in-fio/ - How to run a job against multiple directories

Appendix B: Random Read and Write Global Settings

[global]
# Use the given clocksource as the base of timing.
clocksource=clock_gettime
# Seed the random number generator in a predictable way so results are repeatable across runs.
randrepeat=0
# Limit run time to runtime seconds.
#runtime=180
# If this is provided, then the real offset becomes offset + offset_increment * thread_number, where the thread number is a counter that starts at 0 and is incremented for each job.
#offset_increment=100g
# The invalidate option causes the kernel buffer and page cache to be invalidated for a file before beginning the benchmark.
# From man page: Invalidate buffer-cache for the file prior to starting I/O.  Default: true.
invalidate=1
# Set the directory where generated files will go.  Used to place files in a location other than './'.
#directory=/fioTesting
# If true, use non-buffered I/O (usually O_DIRECT).
direct=1

Appendix C: Random Read Segment

This segment would be replicated for the number of volumes being tested.

[Read-Test-1]
rw=randread
ioengine=libaio
# Number of I/O units to keep in flight against the file.
# dfs.datanode.handler.count=4, which is the number of server threads for the DataNode.
iodepth=4
size=500G
bs=128M
directory=/ebs0

Appendix D: Random Write Segment

This segment would be replicated for the number of volumes being tested.

[Write-Test-1]
rw=randwrite
ioengine=libaio
# Number of I/O units to keep in flight against the file.
# dfs.datanode.handler.count=4, which is the number of server threads for the DataNode.
iodepth=4
size=500G
bs=128M
directory=/ebs0

Appendix E: Read Overall Test Results

E.PNG

Appendix F: Write Overall Test Results

F.PNG