RAM speed (read and write) is crucial to the overall performance of a system. A fast CPU paired with slow RAM can hold the system back in demanding workloads like video editing and gaming.
RAM size and speed are different factors. RAM size determines how much data can be stored in the system, whereas speed determines how fast that data can be accessed.
The CPU accesses RAM for data very frequently in almost all kinds of processing. Slow RAM bottlenecks the CPU, so CPU usage stays low and you are left wondering why the system isn't performing faster.
We already covered testing RAM speed with tools like sysbench. Check this article:
How to Benchmark Ram Speed on Linux / Ubuntu / Fedora with Sysbench
In this post we use a different tool from Intel called MLC (Memory Latency Checker). It is designed to measure memory latency, but it also reports the raw read and write bandwidth at peak load. It's a simple command-line tool that requires no installation and runs on both Windows and Linux.
MLC automatically uses all threads on all cores to maximise the memory throughput. This gives a good measurement for the maximum achievable bandwidth.
To conduct the tests shown below, we ran mlc on Windows on 2 machines and on Ubuntu Linux on 1 machine. Download Intel MLC here and extract it into a folder. Next, open a terminal and navigate to that folder so that you can execute the mlc.exe file (or the mlc binary on Linux).
It's recommended to use PowerShell on Windows, whereas on Linux you can use any terminal you like.
Testing different machines
Now let's run the mlc utility on different machines to measure the RAM read/write speed. Before running the tests, we can calculate the theoretical bandwidth to compare against. This gives us an idea of how well configured the system is in terms of memory speed.
The theoretical max bandwidth of the ram can be calculated as follows:
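Max bandwidth (MB/s) = DDR MT/s * Number of channels * Data bus width / 8
The data bus width is given in bits, so dividing by 8 converts it to bytes. Since each memory channel here has a 64-bit data bus, the third term of the equation is always 8 (bytes per transfer). This width is a property of the memory channel, not of the CPU being 64-bit.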
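The formula can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, the speeds are the ones listed for the machines tested below, and the channel count of 2 for the Acer is an assumption, since the article does not list it (it does reproduce the figure quoted for that machine).

```python
def theoretical_bandwidth_mb_s(mt_per_s: int, channels: int, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth in MB/s.

    mt_per_s        rated transfer rate in mega-transfers per second
    channels        number of memory channels in use
    bus_width_bits  data bus width of one channel (64 bits in this article)
    """
    bytes_per_transfer = bus_width_bits / 8
    return mt_per_s * channels * bytes_per_transfer


# (name, MT/s, channels); the Acer channel count is assumed, not stated in the article
machines = [
    ("Asus TUF A17", 3200, 2),
    ("Ubuntu desktop", 2400, 2),
    ("Acer Swift 3", 4267, 2),
]

for name, speed, channels in machines:
    print(f"{name}: {theoretical_bandwidth_mb_s(speed, channels):,.0f} MB/s")
```

Halving the channel count halves the result, which is why a single stick in single-channel mode has a much lower ceiling.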
1. Machine 1: Asus TUF A17 Gaming
- Size: 16GB (8G+8G)
- Speed: 3200MT/s
- Timings: 22-22-22-52
- Channel: 2
- Theoretical Max B/W: 51200 MB/s
Simply run the mlc.exe binary inside the Windows folder without any arguments, and it will run a series of tests.
    PS C:\Users\pugal\Downloads\mlc\Windows> .\mlc.exe
    Intel(R) Memory Latency Checker - v3.10
    Measuring idle latencies for random access (in ns)...
                    Numa node
    Numa node            0
           0          94.9

    Measuring Peak Injection Memory Bandwidths for the system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using traffic with the following read-write ratios
    ALL Reads        :      23367.7
    3:1 Reads-Writes :      21757.7
    2:1 Reads-Writes :      20026.1
    1:1 Reads-Writes :      15764.0
    Stream-triad like:      25578.0

    Measuring Memory Bandwidths between nodes within system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using Read-only traffic type
                    Numa node
    Numa node            0
           0        23715.4

    Measuring Loaded Latencies for the system
    Using all the threads from each core if Hyper-threading is enabled
    Delay   (ns)      MB/sec
    ==========================
     00000  836.69   23310.6
     00002  838.19   23326.4
     00008  863.45   23425.2
     00015  834.06   23647.9
     00050  742.16   24593.1
     00100  742.35   25157.4
     00200  193.63   27245.1
     00300  146.65   19103.8
     00400  137.07   14654.2
     00500  130.99   11949.5
     00700  129.35    8730.0
     01000  128.31    6339.3
     01300  128.12    4997.1
     01700  127.07    3961.9
     02500  128.36    2852.1
     03500  132.95    2163.5
     05000  129.58    1673.7
     09000  130.89    1145.1
     20000  127.82     797.6

    Measuring cache-to-cache transfer latency (in ns)...
    Unable to enable large page allocation
    Using small pages for allocating buffers
    Local Socket L2->L2 HIT  latency        31.6
    Local Socket L2->L2 HITM latency        31.2
    PS C:\Users\pugal\Downloads\mlc\Windows>
Look at the 2nd block from the top:
    ...
    Measuring Peak Injection Memory Bandwidths for the system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using traffic with the following read-write ratios
    ALL Reads        :      23367.7
    3:1 Reads-Writes :      21757.7
    2:1 Reads-Writes :      20026.1
    1:1 Reads-Writes :      15764.0
    Stream-triad like:      25578.0
    ...
Theoretical max B/W: 51,200 MB/s
The read B/W is around 23,367 MB/s (45.6% of theoretical max)
With 1:1 read-write operations the B/W drops to 15,764 MB/s (30.8% of theoretical max)
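The percentages can be reproduced by pulling the rows out of the peak injection block and dividing by the theoretical maximum. The sketch below assumes the row labels look exactly like those in the mlc v3.10 output shown in this article; the function names and the regular expression are illustrative, not part of MLC.

```python
import re


def parse_peak_bandwidths(mlc_output: str) -> dict[str, float]:
    """Extract the 'label : MB/sec' rows of the peak injection block."""
    pattern = re.compile(
        r"(ALL Reads|\d:\d Reads-Writes|Stream-triad like)\s*:\s*([\d.]+)"
    )
    return {label: float(value) for label, value in pattern.findall(mlc_output)}


def percent_of_max(measured_mb_s: float, theoretical_mb_s: float) -> float:
    """Measured bandwidth as a percentage of the theoretical maximum."""
    return round(100 * measured_mb_s / theoretical_mb_s, 1)


# Peak injection rows copied from the Asus TUF A17 run above
sample = """
ALL Reads        :      23367.7
3:1 Reads-Writes :      21757.7
2:1 Reads-Writes :      20026.1
1:1 Reads-Writes :      15764.0
Stream-triad like:      25578.0
"""

theoretical = 51200  # MB/s, from 3200 MT/s * 2 channels * 8 bytes
peaks = parse_peak_bandwidths(sample)
for label, mb_s in peaks.items():
    print(f"{label:18s} {mb_s:9.1f} MB/s  {percent_of_max(mb_s, theoretical):5.1f}%")
```

The same two functions can be pointed at the output of the other machines by swapping in their text and theoretical figure.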
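These numbers are lower than expected. A likely contributor to the drop in the mixed read-write tests is the Ryzen 5800H CPU in this laptop, whose write speed is about half its read speed.
Overall, the RAM performance of this laptop is poor: even pure reads reach less than half of the theoretical maximum, so on that count it is hard to recommend.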
2. Machine 2: Ubuntu Desktop
- Ram: 32GB (16G+16G)
- Speed: 2400 MT/s
- Timings: 16-16-16-39 (AA-RCD-RP-RAS, in cycles, as DDR4-2400)
- Channel: 2
- Theoretical B/W: 38,400 MB/s
    enlightened@enlightened:~/Downloads/mlc_v3.10/Linux$ ./mlc
    Intel(R) Memory Latency Checker - v3.10
    *** Unable to modify prefetchers (try executing 'modprobe msr')
    *** So, enabling random access for latency measurements
    Measuring idle latencies for random access (in ns)...
                    Numa node
    Numa node            0
           0          85.2

    Measuring Peak Injection Memory Bandwidths for the system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using traffic with the following read-write ratios
    ALL Reads        :      28155.5
    3:1 Reads-Writes :      27484.0
    2:1 Reads-Writes :      27221.1
    1:1 Reads-Writes :      27399.4
    Stream-triad like:      26994.1

    Measuring Memory Bandwidths between nodes within system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using Read-only traffic type
                    Numa node
    Numa node            0
           0        28544.2

    Measuring Loaded Latencies for the system
    Using all the threads from each core if Hyper-threading is enabled
    Using Read-only traffic type
    Inject  Latency Bandwidth
    Delay   (ns)    MB/sec
    ==========================
     00000  133.39   25024.1
     00002  130.43   25343.2
     00008  129.68   25293.4
     00015  150.43   25056.5
     00050  118.81   21703.4
     00100   97.42   14456.0
     00200   86.92    9347.1
     00300   92.37    6964.1
     00400   83.97    5746.8
     00500   84.61    4536.3
     00700   84.46    3594.1
     01000   80.22    2854.7
     01300   81.62    2396.9
     01700   90.70    1934.8
     02500   90.10    1578.2
     03500   88.81    1367.5
     05000   84.87    1207.5
     09000   81.67    1027.4
     20000   85.05     864.0

    Measuring cache-to-cache transfer latency (in ns)...
    Local Socket L2->L2 HIT  latency        25.8
    Local Socket L2->L2 HITM latency        30.7
    enlightened@enlightened:~/Downloads/mlc_v3.10/Linux$
Note the peak test block 2nd from the top:
    ...
    Measuring Peak Injection Memory Bandwidths for the system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using traffic with the following read-write ratios
    ALL Reads        :      28155.5
    3:1 Reads-Writes :      27484.0
    2:1 Reads-Writes :      27221.1
    1:1 Reads-Writes :      27399.4
    Stream-triad like:      26994.1
    ...
Theoretical B/W: 38,400 MB/s
The read speed: 28,155.5 MB/s (73.3% of theoretical max)
1:1 read-write speed: 27,399.4 MB/s (71.4% of theoretical max)
The performance is not only decent, it also surpasses the Asus A17 laptop, even though the laptop's RAM is rated at a higher transfer rate (3200 MT/s vs 2400 MT/s).
The read and 1:1 read-write speeds are both over 70% of the theoretical max, which is great, and they are within about 3% of each other, which is another indicator of stable performance.
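The timings listed for each machine are counted in memory clock cycles, so they are only comparable across different speeds after converting them to nanoseconds. A rough sketch, using the standard relation that DDR memory makes two transfers per clock, so one cycle lasts 2000 / (MT/s) nanoseconds; the function name is made up for illustration, and only the first timing value (CAS latency) is converted:

```python
def cycles_to_ns(cycles: int, mt_per_s: int) -> float:
    """Convert a timing given in clock cycles to nanoseconds.

    DDR transfers data twice per clock, so the clock in MHz is MT/s / 2
    and one cycle takes 1000 / (MT/s / 2) = 2000 / MT/s nanoseconds.
    """
    return cycles * 2000 / mt_per_s


# (name, first timing value in cycles, MT/s), taken from the specs listed above
for name, cas_cycles, speed in [
    ("Asus TUF A17", 22, 3200),
    ("Ubuntu desktop", 16, 2400),
]:
    print(f"{name}: CL{cas_cycles} at {speed} MT/s = {cycles_to_ns(cas_cycles, speed):.2f} ns")
```

In absolute time the two machines come out close to each other, so the larger cycle counts on the faster module do not by themselves mean slower access.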
3. Machine 3: Acer Swift 3 Laptop
- Ram: 16GB (8G+8G)
- Speed: 4267 MT/s
- Timings: 36-39-39-90
- Theoretical B/W: 68,272 MB/s
    PS C:\Users\prashant\Downloads\mlc_v3.10\Windows> .\mlc.exe
    Intel(R) Memory Latency Checker - v3.10
    Measuring idle latencies for random access (in ns)...
                    Numa node
    Numa node            0
           0          99.4

    Measuring Peak Injection Memory Bandwidths for the system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using traffic with the following read-write ratios
    ALL Reads        :      60439.8
    3:1 Reads-Writes :      59515.6
    2:1 Reads-Writes :      55458.4
    1:1 Reads-Writes :      46357.4
    Stream-triad like:      60122.1

    Measuring Memory Bandwidths between nodes within system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using Read-only traffic type
                    Numa node
    Numa node            0
           0        58680.8

    Measuring Loaded Latencies for the system
    Using all the threads from each core if Hyper-threading is enabled
    Using Read-only traffic type
    Inject  Latency Bandwidth
    Delay   (ns)    MB/sec
    ==========================
     00000  160.68   54042.0
     00002  165.34   54034.4
     00008  159.95   53055.1
     00015  149.64   52889.8
     00050  134.69   34161.3
     00100  120.92   24743.9
     00200  113.33   15570.3
     00300  117.33   11054.4
     00400  115.00    8710.7
     00500  116.17    6870.4
     00700  122.21    5316.8
     01000  108.38    4217.8
     01300  110.81    3366.8
     01700  112.60    2692.5
     02500  112.69    2023.6
     03500  115.57    1598.8
     05000  119.07    1248.5
     09000  118.73     945.2
     20000  121.29     713.4

    Measuring cache-to-cache transfer latency (in ns)...
    Using small pages for allocating buffers
    Local Socket L2->L2 HIT  latency        35.2
    Local Socket L2->L2 HITM latency        34.6
    PS C:\Users\prashant\Downloads\mlc_v3.10\Windows>
Looking at the peak injection section, we can note the read and write speed of the RAM:
    Measuring Peak Injection Memory Bandwidths for the system
    Bandwidths are in MB/sec (1 MB/sec = 1,000,000 Bytes/sec)
    Using all the threads from each core if Hyper-threading is enabled
    Using traffic with the following read-write ratios
    ALL Reads        :      60439.8
    3:1 Reads-Writes :      59515.6
    2:1 Reads-Writes :      55458.4
    1:1 Reads-Writes :      46357.4
    Stream-triad like:      60122.1
Theoretical B/W: 68,272 MB/s
The read speed: 60,439 MB/s (88.5% of theoretical max)
The 1:1 read-write speed: 46,357 MB/s (67.9% of theoretical max)
The memory read speed is excellent, at nearly 90% of the theoretical maximum, and the laptop feels fast and smooth in regular applications like web browsing and document editing. RAM performance is key to the overall system speed.
It seems like the RAM is slightly overclocked on this particular laptop, as such high speeds are not usually seen on laptops.
Note that this laptop's memory performs a lot better than that of the Asus TUF A17, in spite of the Asus being a more expensive gaming laptop and this one being a budget lightweight notebook.
Factors affecting Ram speed
In theory, RAM speed should be determined entirely by the clock speed of the module and its timing configuration. However, this is not the case in actual systems. The RAM speed is how fast the module (DIMM) can operate in theory, whereas the memory speed is how fast the system is actually capable of running it.
The actual memory speed will always be less than the rated RAM speed, for a few reasons.
1. Single vs Dual channel - A single RAM module in single channel mode will always be slower than multiple sticks in dual channel mode. The difference in performance is significant in applications that are sensitive to memory speed.
2. Single vs Dual Rank - The rank configuration of the memory chips on a RAM module can further affect performance. Dual rank modules generally perform better than single rank modules.
3. CPU and memory controller - The CPU and its memory controller may also affect the overall memory speed. Certain CPUs, such as those based on the Zen3 architecture (the 5800H, for example), have half the write speed of what the RAM is actually capable of. Therefore the choice of CPU can also affect the performance of the memory.
Effects of RAM Speed
RAM speed affects applications that are sensitive to the speed of memory operations. Some examples are games, video editing preview playback, and CPU-based rendering. Any application that needs to read or write data frequently will slow down if the RAM itself is slow.
That is why special high-speed RAM is available for gaming builds, designed with lower latency timings and higher clock speeds.
Conclusion
If you want a more accurate diagnosis of RAM speed, you can try tools like PassMark MemTest86, a boot utility that runs before the operating system and tests the hardware directly with no other application load present. We shall cover it in another article.