Benchmark RAM speed with Intel MLC - Test Read-Write Speed

RAM speed (read and write) is crucial to the overall performance of a system. A fast CPU paired with slow RAM can hold a system back in demanding workloads like video editing and gaming.

RAM size and speed are different factors. Size determines how much data can be held in memory, whereas speed determines how fast that data can be accessed.

The CPU accesses RAM very frequently in almost all kinds of processing. Slow RAM bottlenecks the CPU: CPU usage stays low while it waits for data, and you are left wondering why the system isn't performing faster.

We already covered the topic of testing ram speed using tools like sysbench. Check this article:

How to Benchmark Ram Speed on Linux / Ubuntu / Fedora with Sysbench

In this post we use a different tool from Intel called MLC (Memory Latency Checker). It is designed to check memory latency, but it also reports the raw read-write bandwidth at peak levels of data transfer. It's a simple command line tool that requires no installation and runs on both Windows and Linux.

MLC automatically uses all threads on all cores to maximise the memory throughput. This gives a good measurement for the maximum achievable bandwidth.

For the tests shown below, we ran mlc on Windows on 2 machines and on Ubuntu Linux on 1 machine. Download Intel MLC and extract it into a folder. Then open a terminal and navigate to that folder so that you can execute the mlc.exe file (or the mlc binary on Linux).

It's recommended to use PowerShell on Windows, whereas on Linux you can use any terminal you like.
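If you would rather script the run than type it by hand, the step can be sketched in Python. This is a minimal sketch that assumes the extracted archive contains Windows and Linux subfolders, as in the terminal sessions shown in this post. The function names mlc_binary and run_mlc are our own and not part of MLC.

```python
import platform
import subprocess
from pathlib import Path


def mlc_binary(extract_dir, system=None):
    """Return the path of the MLC binary inside the extracted folder.

    Assumes the archive layout seen in this post: Windows/mlc.exe and Linux/mlc.
    """
    system = system or platform.system()
    if system == "Windows":
        return Path(extract_dir) / "Windows" / "mlc.exe"
    if system == "Linux":
        return Path(extract_dir) / "Linux" / "mlc"
    raise ValueError(f"no MLC binary assumed for {system!r}")


def run_mlc(extract_dir):
    """Run MLC with no arguments and return everything it prints."""
    binary = mlc_binary(extract_dir)
    result = subprocess.run(
        [str(binary)], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling run_mlc with the folder you extracted the download into returns the full report as a string. The complete set of tests takes a while to run, so expect the call to block until MLC finishes.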

Testing different machines

Now let's run the mlc utility on different machines to measure the RAM read/write speed. Before running the tests, we can calculate the theoretical bandwidth to compare against. This gives us an idea of how well the system is configured in terms of memory speed.

The theoretical max bandwidth of the ram can be calculated as follows:

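Bandwidth (MB/s) = DDR MT/s * Number of channels * (Data bus width in bits / 8)

Since each memory channel on the machines tested here has a 64-bit data bus, the 3rd term of the equation is always 8 (bytes per transfer). Note that this width is a property of the memory channel, not of the CPU being 64-bit.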
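The formula is simple enough to check in a few lines of Python. This is a small sketch: the function name is our own, and the 64-bit default bus width is the assumption stated above.

```python
def theoretical_bandwidth_mb_s(transfer_rate_mt_s, channels, bus_width_bits=64):
    """Theoretical peak bandwidth in MB/s (1 MB = 1,000,000 bytes)."""
    bytes_per_transfer = bus_width_bits // 8  # 64 bits -> 8 bytes
    return transfer_rate_mt_s * channels * bytes_per_transfer


# Dual channel at 3200 MT/s and at 2400 MT/s
print(theoretical_bandwidth_mb_s(3200, 2))  # 51200
print(theoretical_bandwidth_mb_s(2400, 2))  # 38400
```

The result is in MB/s because one transfer per microsecond of 8 bytes is 8 million bytes per second, which matches the decimal MB unit that MLC reports.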

1. Machine 1: Asus TUF A17 Gaming

Size: 16GB (8G+8G)
Speed: 3200MT/s
Timings: 22-22-22-52
Channel: 2
Theoretical Max B/W: 51200 MB/s

Simply run the mlc.exe binary inside the Windows folder without any arguments, and it will run a series of tests.

PS C:\Users\pugal\Downloads\mlc\Windows> .\mlc.exe
Intel(R) Memory Latency Checker - v3.10
Measuring idle latencies for random access (in ns)...
                Numa node
Numa node            0
       0          94.9

Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads        :      23367.7
3:1 Reads-Writes :      21757.7
2:1 Reads-Writes :      20026.1
1:1 Reads-Writes :      15764.0
Stream-triad like:      25578.0

Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
                Numa node
Numa node            0
       0        23715.4

Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Delay   (ns)    MB/sec
==========================
 00000  836.69    23310.6
 00002  838.19    23326.4
 00008  863.45    23425.2
 00015  834.06    23647.9
 00050  742.16    24593.1
 00100  742.35    25157.4
 00200  193.63    27245.1
 00300  146.65    19103.8
 00400  137.07    14654.2
 00500  130.99    11949.5
 00700  129.35     8730.0
 01000  128.31     6339.3
 01300  128.12     4997.1
 01700  127.07     3961.9
 02500  128.36     2852.1
 03500  132.95     2163.5
 05000  129.58     1673.7
 09000  130.89     1145.1
 20000  127.82      797.6

Measuring cache-to-cache transfer latency (in ns)...
Unable to enable large page allocation
Using small pages for allocating buffers
Local Socket L2->L2 HIT  latency        31.6
Local Socket L2->L2 HITM latency        31.2
PS C:\Users\pugal\Downloads\mlc\Windows>

Look at the 2nd block from the top:

...
Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads        :      23367.7
3:1 Reads-Writes :      21757.7
2:1 Reads-Writes :      20026.1
1:1 Reads-Writes :      15764.0
Stream-triad like:      25578.0
...

Theoretical Max B/W: 51,200 MB/s
The read B/W is around 23,367 MB/s (45.64% of theoretical max)
With 1:1 read-write operations the B/W drops to 15,764 MB/s (30.79% of theoretical max)

This is lower than expected. This particular laptop has a Ryzen 5800H CPU, whose write speed is about half its read speed.
Overall, the RAM performance of this laptop is poor, and we would not recommend it on that basis.
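The percentages above can be reproduced by parsing the peak injection block from MLC's output. Here is a minimal Python sketch that assumes the line format shown in the v3.10 output above. The names parse_peak_bandwidth and percent_of_max are our own and not part of MLC.

```python
import re

# Peak injection lines copied from the Asus TUF A17 run above
SAMPLE = """\
ALL Reads        :      23367.7
3:1 Reads-Writes :      21757.7
2:1 Reads-Writes :      20026.1
1:1 Reads-Writes :      15764.0
Stream-triad like:      25578.0
"""

PEAK_LINE = re.compile(
    r"^(ALL Reads|\d:\d Reads-Writes|Stream-triad like)\s*:\s*([\d.]+)\s*$"
)


def parse_peak_bandwidth(text):
    """Map each traffic type to its bandwidth in MB/s."""
    results = {}
    for line in text.splitlines():
        match = PEAK_LINE.match(line.strip())
        if match:
            results[match.group(1)] = float(match.group(2))
    return results


def percent_of_max(measured_mb_s, theoretical_mb_s):
    """Measured bandwidth as a percentage of the theoretical maximum."""
    return round(100 * measured_mb_s / theoretical_mb_s, 1)


peak = parse_peak_bandwidth(SAMPLE)
for name, mb_s in peak.items():
    print(f"{name:<17} {mb_s:>9.1f} MB/s  {percent_of_max(mb_s, 51200):5.1f}%")
```

The same function can be fed the full text of an MLC run, since lines that do not match the pattern are simply skipped.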

2. Machine 2: Ubuntu Desktop

Ram: 32GB (16G+16G)
Speed: 2400 MT/s
Timings: 16-16-16-39 (DDR4-2400)
Channel: 2
Theoretical B/W: 38,400 MB/s

enlightened@enlightened:~/Downloads/mlc_v3.10/Linux$ ./mlc
Intel(R) Memory Latency Checker - v3.10
*** Unable to modify prefetchers (try executing 'modprobe msr')
*** So, enabling random access for latency measurements
Measuring idle latencies for random access (in ns)...
                Numa node
Numa node            0
       0          85.2

Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads        :      28155.5
3:1 Reads-Writes :      27484.0
2:1 Reads-Writes :      27221.1
1:1 Reads-Writes :      27399.4
Stream-triad like:      26994.1

Measuring Memory Bandwidths between nodes within system 
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
                Numa node
Numa node            0
       0        28544.2

Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject  Latency Bandwidth
Delay   (ns)    MB/sec
==========================
 00000  133.39    25024.1
 00002  130.43    25343.2
 00008  129.68    25293.4
 00015  150.43    25056.5
 00050  118.81    21703.4
 00100   97.42    14456.0
 00200   86.92     9347.1
 00300   92.37     6964.1
 00400   83.97     5746.8
 00500   84.61     4536.3
 00700   84.46     3594.1
 01000   80.22     2854.7
 01300   81.62     2396.9
 01700   90.70     1934.8
 02500   90.10     1578.2
 03500   88.81     1367.5
 05000   84.87     1207.5
 09000   81.67     1027.4
 20000   85.05      864.0

Measuring cache-to-cache transfer latency (in ns)...
Local Socket L2->L2 HIT  latency        25.8
Local Socket L2->L2 HITM latency        30.7
enlightened@enlightened:~/Downloads/mlc_v3.10/Linux$

Note the peak test block 2nd from the top:

...
Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads        :      28155.5
3:1 Reads-Writes :      27484.0
2:1 Reads-Writes :      27221.1
1:1 Reads-Writes :      27399.4
Stream-triad like:      26994.1
...

Theoretical B/W: 38,400 MB/s
The read speed: 28155.5 MB/s (73.32% of theoretical max)
1:1 read-write speed: 27399.4 MB/s (71.35% of theoretical max)

The performance is not only decent, but surpasses the Asus TUF A17 laptop, even though that laptop's RAM is rated for a higher speed (3200 MT/s vs 2400 MT/s).

Both the read and the 1:1 read-write speeds are over 70% of the theoretical max, which is great. The two figures are also within about 3% of each other, which is another indicator of stable performance.

3. Machine 3: Acer Swift 3 Laptop

Ram: 16GB (8G+8G)
Speed: 4267 MT/s
Timings: 36-39-39-90
Theoretical B/W: 68,272 MB/s

PS C:\Users\prashant\Downloads\mlc_v3.10\Windows> .\mlc.exe
Intel(R) Memory Latency Checker - v3.10
Measuring idle latencies for random access (in ns)...
                Numa node
Numa node            0
       0          99.4

Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads        :      60439.8
3:1 Reads-Writes :      59515.6
2:1 Reads-Writes :      55458.4
1:1 Reads-Writes :      46357.4
Stream-triad like:      60122.1

Measuring Memory Bandwidths between nodes within system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
                Numa node
Numa node            0
       0        58680.8

Measuring Loaded Latencies for the system
Using all the threads from each core if Hyper-threading is enabled
Using Read-only traffic type
Inject  Latency Bandwidth
Delay   (ns)    MB/sec
==========================
 00000  160.68    54042.0
 00002  165.34    54034.4
 00008  159.95    53055.1
 00015  149.64    52889.8
 00050  134.69    34161.3
 00100  120.92    24743.9
 00200  113.33    15570.3
 00300  117.33    11054.4
 00400  115.00     8710.7
 00500  116.17     6870.4
 00700  122.21     5316.8
 01000  108.38     4217.8
 01300  110.81     3366.8
 01700  112.60     2692.5
 02500  112.69     2023.6
 03500  115.57     1598.8
 05000  119.07     1248.5
 09000  118.73      945.2
 20000  121.29      713.4

Measuring cache-to-cache transfer latency (in ns)...
Using small pages for allocating buffers
Local Socket L2->L2 HIT  latency        35.2
Local Socket L2->L2 HITM latency        34.6
PS C:\Users\prashant\Downloads\mlc_v3.10\Windows>

Looking at the peak injection section, we can note the read and write speed of the RAM:

Measuring Peak Injection Memory Bandwidths for the system
Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
Using all the threads from each core if Hyper-threading is enabled
Using traffic with the following read-write ratios
ALL Reads        :      60439.8
3:1 Reads-Writes :      59515.6
2:1 Reads-Writes :      55458.4
1:1 Reads-Writes :      46357.4
Stream-triad like:      60122.1

Theoretical B/W: 68,272 MB/s
The read speed: 60439 MB/s (88.5% of theoretical max)
The 1:1 read-write speed: 46357 MB/s (67.9% of theoretical max)

The memory read speed is excellent, at nearly 90% of the theoretical max, and the laptop feels fast and smooth in regular applications like web browsing and document editing. RAM performance is key to overall system speed.

It seems the RAM may be slightly overclocked on this particular laptop, as such high speeds are not commonly seen on laptops.

Note that this laptop's memory performs a lot better than the Asus TUF A17's, in spite of that being a more expensive gaming laptop and this one being a budget lightweight notebook.
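To put the three machines side by side, the figures reported above can be collected into one small table. The following Python sketch uses only the numbers from the MLC runs in this post. The dictionary layout and the summarize helper are our own choices for illustration.

```python
# Theoretical max, ALL Reads and 1:1 Reads-Writes (MB/s) from the runs above
machines = {
    "Asus TUF A17": {"theoretical": 51200, "read": 23367.7, "mixed": 15764.0},
    "Ubuntu desktop": {"theoretical": 38400, "read": 28155.5, "mixed": 27399.4},
    "Acer Swift 3": {"theoretical": 68272, "read": 60439.8, "mixed": 46357.4},
}


def summarize(spec):
    """Return (read % of max, 1:1 % of max, % drop from read to 1:1)."""
    read_pct = 100 * spec["read"] / spec["theoretical"]
    mixed_pct = 100 * spec["mixed"] / spec["theoretical"]
    drop_pct = 100 * (spec["read"] - spec["mixed"]) / spec["read"]
    return round(read_pct, 1), round(mixed_pct, 1), round(drop_pct, 1)


print(f"{'Machine':<16}{'Read %':>8}{'1:1 %':>8}{'Drop %':>8}")
for name, spec in machines.items():
    read_pct, mixed_pct, drop_pct = summarize(spec)
    print(f"{name:<16}{read_pct:>8}{mixed_pct:>8}{drop_pct:>8}")
```

The last column shows how much bandwidth each machine loses when writes are mixed in. The desktop barely changes, while both laptops lose a noticeable share of their read bandwidth.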

Factors affecting Ram speed

In theory the ram speed should be totally determined by the clock speed of the ram module and the timing configuration. However this is not the case in actual systems. The ram speed is how fast the ram module (dimm) can operate in theory, whereas the memory speed is how fast the system is capable of running the ram.

The actual memory speed will always be less than the rated RAM speed, for a couple of reasons.

1. Single vs Dual channel - A single RAM module running in single channel mode will be slower than a matched pair of modules in dual channel mode. The difference in performance is significant in applications that are sensitive to memory speed.

2. Single vs Dual Rank - The rank configuration of the memory chips on a RAM module can further affect performance. Dual rank modules often perform better than single rank modules.

3. CPU and memory controller - The CPU and its memory controller may also affect the overall memory speed. Some CPUs, such as Zen 3 based parts (the 5800H for example), deliver only about half the write speed the RAM is actually capable of. Therefore your choice of CPU can also affect memory performance.

To learn more about ram and its internal working check out this article:

Effects of RAM Speed

RAM speed affects applications that are sensitive to the speed of memory operations. Some examples are games, video editing preview playback, and CPU based rendering. Any application that needs to read or write data frequently will slow down if the RAM itself is slow.

That is why we have special high speed ram available for gaming builds, that are designed to have lower latency timings and higher clock speed.

Conclusion

If you want a more thorough diagnosis of RAM performance, you can try tools like PassMark MemTest86, a bootable utility that runs before the operating system and tests the hardware directly with no other application load present. We shall cover it in another article.