
UnixBench for CPU performance testing / mbw for memory testing

Date: 2024-08-06 05:30:11


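1. The UnixBench tool

UnixBench is an open-source performance benchmark for Unix-like systems (Unix, BSD, Linux) and is widely used to measure the performance of Linux hosts. Its main test items cover system calls, file reads and writes, process handling, graphics (2D and 3D), pipes, arithmetic, and C library functions, and together they provide baseline performance data for the system.

UnixBench is a system-level benchmark, not purely a CPU, memory, or disk test. The results depend not only on the hardware but also on the operating system, the libraries, and even the compiler.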

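Test items

Dhrystone test

This test focuses on string handling and contains no floating-point operations. It is heavily influenced by hardware and software design, compiler and linker options, code optimization, cache memory, wait states, and integer data types.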

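Whetstone test

This test measures the speed and efficiency of floating-point operations. It contains several modules that represent operations typical of scientific computing, and it uses a wide range of C functions, including sin, cos, sqrt, exp, and log, in integer and floating-point math operations, array accesses, conditional branches, and procedure calls.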

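Execl Throughput test (execl here is an important function on Unix-like systems, not the office software Excel)

This test measures the number of execl calls that can be made per second. execl is part of the exec family of functions, which replace the current process image with a new process image. It and several similar functions are front ends for execve().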

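File Copy test

This test measures the rate at which data can be transferred from one file to another, using several buffer sizes. It covers file write, read, and copy operations, and the metric is the number of characters written, read, and copied in a given time (10 seconds by default).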
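As a rough illustration of what is being measured (this is not the actual UnixBench test program), the sketch below copies a temporary file through a fixed-size buffer and reports the throughput. The function name copy_throughput and the default sizes are assumptions made for this example only.

```python
import os
import tempfile
import time


def copy_throughput(bufsize=1024, file_bytes=2_000_000):
    """Copy a temp file through a bufsize-byte buffer.

    Returns (bytes_copied, kilobytes_per_second).
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.dat")
        dst = os.path.join(tmp, "dst.dat")
        with open(src, "wb") as f:
            f.write(os.urandom(file_bytes))

        copied = 0
        start = time.perf_counter()
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            while True:
                chunk = fin.read(bufsize)   # read at most bufsize bytes
                if not chunk:
                    break
                fout.write(chunk)
                copied += len(chunk)
        elapsed = time.perf_counter() - start
    return copied, copied / 1024 / elapsed


if __name__ == "__main__":
    for size in (256, 1024, 4096):
        copied, kbps = copy_throughput(bufsize=size)
        print(f"bufsize {size}: {copied} bytes, {kbps:.0f} KB/s")
```

Larger buffers usually mean fewer read and write calls for the same amount of data, which is why the benchmark reports each buffer size separately.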

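Pipe Throughput test

A pipe is the simplest form of communication between processes. Pipe throughput is the number of times per second that a process can write 512 bytes to a pipe and read them back. This test has no real counterpart in real-world programming.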
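The loop this test describes can be sketched in a few lines. This is a minimal approximation in Python, not the UnixBench source; the function name pipe_throughput and its parameters are hypothetical, and interpreter overhead means the count will be far lower than what the C benchmark reports.

```python
import os
import time


def pipe_throughput(duration=1.0, size=512):
    """Count write-then-read round trips of `size` bytes within `duration` seconds."""
    read_fd, write_fd = os.pipe()
    payload = b"x" * size
    loops = 0
    deadline = time.perf_counter() + duration
    try:
        while time.perf_counter() < deadline:
            os.write(write_fd, payload)   # a single process writes to the pipe...
            os.read(read_fd, size)        # ...and immediately reads the data back
            loops += 1
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return loops


if __name__ == "__main__":
    print(pipe_throughput(), "loops per second")
```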

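Pipe-based Context Switching test

This test measures the number of times two processes can exchange an increasing integer through a pipe. It resembles some real-world applications more closely than the pipe throughput test. The test program spawns a child process and carries on a bidirectional pipe conversation with it.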

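Process Creation test

This test measures the number of times a process can fork a child that exits immediately and then reap it. Process creation involves actually creating process control blocks and allocating memory for the new process, so the test relates directly to memory bandwidth. It is typically used to compare different implementations of the operating system's process creation call.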
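The fork-and-reap cycle can be sketched as follows. This is an illustrative approximation for Unix-like systems only (os.fork is not available on Windows); the function name process_creation is hypothetical and the code is not taken from UnixBench.

```python
import os
import time


def process_creation(duration=1.0):
    """Count fork-and-reap cycles completed within `duration` seconds (Unix only)."""
    count = 0
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        pid = os.fork()
        if pid == 0:
            os._exit(0)       # child: exit immediately without running cleanup handlers
        os.waitpid(pid, 0)    # parent: reap the child so it does not linger as a zombie
        count += 1
    return count


if __name__ == "__main__":
    print(process_creation(), "processes created per second")
```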

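Shell Scripts test

This test measures the number of times per minute that a process can start and reap a set of concurrent copies of a shell script, usually with 1, 2, 4, and 8 concurrent copies. The script applies a series of transformations to a data file.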

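System Call Overhead test

This test estimates the cost of entering and leaving the operating system kernel, that is, the overhead of a system call. The program simply calls getpid (which returns the ID of the calling process) over and over. The time taken by these calls is used to estimate the cost of entering and leaving the kernel.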
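A minimal sketch of the same idea in Python is shown below. The function name syscall_overhead is hypothetical, and the figure it returns includes Python interpreter overhead, so it should be read as an upper bound on the cost of the call rather than a substitute for the C benchmark.

```python
import os
import time


def syscall_overhead(calls=200_000):
    """Return the average wall-clock seconds per os.getpid() call."""
    start = time.perf_counter()
    for _ in range(calls):
        os.getpid()
    return (time.perf_counter() - start) / calls


if __name__ == "__main__":
    print(f"{syscall_overhead() * 1e9:.0f} ns per call (including interpreter overhead)")
```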

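Graphical Tests

These consist of the "ubgears" program and give only a very rough measure of 2D and 3D graphics performance; the 3D test in particular is very limited. The results depend heavily on the hardware and on whether suitable drivers are installed.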

Installation and usage

wget /test/unixbench/unixbench-5.1.2.tar.gz
tar zxvf unixbench-5.1.2.tar.gz
cd unixbench-5.1.2

According to the README, if you do not need the graphics tests, or are not testing in a graphical environment, comment out the line GRAPHICS_TEST = defined in the Makefile.

make

Run

./Run

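While benchmarking on a Kingsoft Cloud server today, the run failed with the error "Can't locate Time/HiRes.pm in @INC". The cause is that the Perl Time::HiRes module is missing. Not every UnixBench run hits this problem.

Install the missing module

yum -y install perl-Time-HiRes

Then wait for the tests to run. This can take a long time, so be patient.

The build can also fail on Ubuntu: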

gcc -o ./pgms/ubgears -DTIME -Wall -pedantic -ansi -O2 -fomit-frame-pointer -fforce-addr -ffast-math -Wall ./src/ubgears.c -lGL -lXext -lX11
./src/ubgears.c:51:19: error: GL/gl.h: No such file or directory
./src/ubgears.c:52:20: error: GL/glx.h: No such file or directory
./src/ubgears.c:129: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'view_rotx'
... (omitted)
./src/ubgears.c:632: error: 'GL_RENDERER' undeclared (first use in this function)
./src/ubgears.c:633: error: 'GL_VERSION' undeclared (first use in this function)
./src/ubgears.c:634: error: 'GL_VENDOR' undeclared (first use in this function)
./src/ubgears.c:635: error: 'GL_EXTENSIONS' undeclared (first use in this function)
./src/ubgears.c:643: warning: implicit declaration of function 'glXDestroyContext'
make: *** [pgms/ubgears] Error 1

**********************************************
Run: "make all" failed; aborting

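Solution:

apt-get install libxext-dev libgl1-mesa-dev

From what I found, ubgears.c also uses math functions, and the link step cannot find them unless the math library is linked explicitly. Appending -lm after GL_LIBS in the Makefile fixes this.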

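Notes on the results:

Once the Run file is present, enter ./Run to start benchmarking the VPS. When the run finishes, a score appears at the bottom of the output. As a rule of thumb, a VPS that scores above 1000 performs well.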


========================================================================
   BYTE UNIX Benchmarks (Version 5.1.2)

   System: VM-0-8-ubuntu: GNU/Linux
   OS: GNU/Linux -- 4.4.0-91-generic -- #114-Ubuntu SMP Tue Aug 8 11:56:56 UTC
   Machine: x86_64 (x86_64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 1: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 2: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 3: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 4: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 5: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 6: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   CPU 7: Intel(R) Xeon(R) CPU E5-26xx v4 (4800.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET
   11:26:56 up 22 min, 1 user, load average: 0.07, 0.07, 0.17; runlevel 5

------------------------------------------------------------------------
Benchmark Run: Mon Apr 16 11:26:56 - 11:55:17
8 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       33444509.7 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     2702.2 MWIPS (10.0 s, 7 samples)
Execl Throughput                               4647.2 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1131210.2 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          306139.4 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       3477545.6 KBps  (30.0 s, 2 samples)
Pipe Throughput                             2197189.4 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 159896.2 lps   (10.0 s, 7 samples)
Process Creation                              11912.9 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  12619.4 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   5086.8 lpm   (60.0 s, 2 samples)
System Call Overhead                        3928781.6 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   33444509.7   2865.9
Double-Precision Whetstone                       55.0       2702.2    491.3
Execl Throughput                                 43.0       4647.2   1080.7
File Copy 1024 bufsize 2000 maxblocks          3960.0    1131210.2   2856.6
File Copy 256 bufsize 500 maxblocks            1655.0     306139.4   1849.8
File Copy 4096 bufsize 8000 maxblocks          5800.0    3477545.6   5995.8
Pipe Throughput                               12440.0    2197189.4   1766.2
Pipe-based Context Switching                   4000.0     159896.2    399.7
Process Creation                                126.0      11912.9    945.5
Shell Scripts (1 concurrent)                     42.4      12619.4   2976.3
Shell Scripts (8 concurrent)                      6.0       5086.8   8478.1
System Call Overhead                          15000.0    3928781.6   2619.2
                                                                   ========
System Benchmarks Index Score                                        1893.7

------------------------------------------------------------------------
Benchmark Run: Mon Apr 16 11:55:17 - 12:23:39
8 CPUs in system; running 8 parallel copies of tests

Dhrystone 2 using register variables      263391605.6 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    21623.4 MWIPS (10.0 s, 7 samples)
Execl Throughput                              32726.1 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1117467.1 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          304340.2 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       3570594.5 KBps  (30.0 s, 2 samples)
Pipe Throughput                            17497194.7 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                1783119.9 lps   (10.0 s, 7 samples)
Process Creation                              58313.8 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  60188.9 lpm   (60.2 s, 2 samples)
Shell Scripts (8 concurrent)                   8246.3 lpm   (60.2 s, 2 samples)
System Call Overhead                        6898602.7 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0  263391605.6  22570.0
Double-Precision Whetstone                       55.0      21623.4   3931.5
Execl Throughput                                 43.0      32726.1   7610.7
File Copy 1024 bufsize 2000 maxblocks          3960.0    1117467.1   2821.9
File Copy 256 bufsize 500 maxblocks            1655.0     304340.2   1838.9
File Copy 4096 bufsize 8000 maxblocks          5800.0    3570594.5   6156.2
Pipe Throughput                               12440.0   17497194.7  14065.3
Pipe-based Context Switching                   4000.0    1783119.9   4457.8
Process Creation                                126.0      58313.8   4628.1
Shell Scripts (1 concurrent)                     42.4      60188.9  14195.5
Shell Scripts (8 concurrent)                      6.0       8246.3  13743.8
System Call Overhead                          15000.0    6898602.7   4599.1
                                                                   ========
System Benchmarks Index Score                                        6493.2
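The INDEX column and the final score can be reproduced from the BASELINE and RESULT columns. The sketch below assumes that each index is result / baseline * 10 and that the overall score is the geometric mean of the individual indexes, which matches the numbers of the single-copy run above. The names RUN_1, index, and overall_score are chosen for this example.

```python
import math

# (baseline, result) pairs copied from the single-copy run above.
RUN_1 = {
    "Dhrystone 2 using register variables": (116700.0, 33444509.7),
    "Double-Precision Whetstone": (55.0, 2702.2),
    "Execl Throughput": (43.0, 4647.2),
    "File Copy 1024 bufsize 2000 maxblocks": (3960.0, 1131210.2),
    "File Copy 256 bufsize 500 maxblocks": (1655.0, 306139.4),
    "File Copy 4096 bufsize 8000 maxblocks": (5800.0, 3477545.6),
    "Pipe Throughput": (12440.0, 2197189.4),
    "Pipe-based Context Switching": (4000.0, 159896.2),
    "Process Creation": (126.0, 11912.9),
    "Shell Scripts (1 concurrent)": (42.4, 12619.4),
    "Shell Scripts (8 concurrent)": (6.0, 5086.8),
    "System Call Overhead": (15000.0, 3928781.6),
}


def index(baseline, result):
    """Index of one test: the result relative to the baseline system, times 10."""
    return result / baseline * 10.0


def overall_score(run):
    """Geometric mean of all per-test indexes."""
    logs = [math.log(index(b, r)) for b, r in run.values()]
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    for name, (baseline, result) in RUN_1.items():
        print(f"{name:<40} {index(baseline, result):8.1f}")
    print(f"{'System Benchmarks Index Score':<40} {overall_score(RUN_1):8.1f}")
```

Because the score is a geometric mean, one very weak test (here Pipe-based Context Switching at 399.7) pulls the total down less than it would in an arithmetic mean, and one very strong test lifts it less.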

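Note: the output contains two scores. The first comes from running 1 parallel copy of the tests, the second from running one copy per CPU (8 parallel copies in the output above; see the HTML report for details). In other words, one is a single-process run and the other a multi-process run.

By default the results are stored in the results directory when the run completes: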

root@VM-16-16-ubuntu:/home/ubuntu/unixbench-5.1.2/results# pwd
/home/ubuntu/unixbench-5.1.2/results
root@VM-16-16-ubuntu:/home/ubuntu/unixbench-5.1.2/results# ls
VM-16-16-ubuntu--04-16-01 VM-16-16-ubuntu--04-16-01.html VM-16-16-ubuntu--04-16-01.log

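Test item analysis

During the run, each item is followed by the digits 1 2 3 4 5 6 7 8 9 10, which means that 10 iterations of the test were run. Part of the output and its meaning are explained below: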


Dhrystone 2 using register variables 1 2 3 4 5 6 7 8 9 10

This benchmark dates from 1984 and tests string handling. Because it has no floating-point operations, it is heavily influenced by hardware and software design, compiler and linker options, code optimization, cache memory, wait states, and integer data types.

Double-Precision Whetstone 1 2 3 4 5 6 7 8 9 10

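This item measures the speed and efficiency of floating-point operations. It consists of several modules, each containing a set of operations used in scientific computing. A wide range of C functions (sin, cos, sqrt, exp, log) are used in integer and floating-point math operations, array accesses, conditional branches, and procedure calls. The test exercises both integer and floating-point arithmetic.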

System Call Overhead 1 2 3 4 5 6 7 8 9 10

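Measures the cost of entering and leaving the operating system kernel, that is, the cost of one system call. It does this with a small program that calls getpid repeatedly.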

Pipe Throughput 1 2 3 4 5 6 7 8 9 10

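A pipe is the simplest form of communication between processes. Pipe throughput here is the number of times per second that a process can write 512 bytes to a pipe and read them back. Note that pipe throughput has no real counterpart in real-world programming.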

Pipe-based Context Switching 1 2 3 4 5 6 7 8 9 10

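This test measures the number of times per second that two processes can exchange an increasing integer through a pipe. This resembles what some real-world applications do. The test program first creates a child process and then carries on a bidirectional pipe conversation with it.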

Process Creation 1 2 3

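Measures the number of times per second that a process can create a child process and then reap it (the child exits immediately). Process creation is about creating the process control block and allocating memory for the new process, so it relates directly to memory bandwidth. In general, this test is used to compare different implementations of the operating system's process creation call.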

Execl Throughput 1 2 3

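This test measures the number of execl calls that can be made per second. execl is a member of the exec family of functions. Like several similar functions, it is a front end for execve().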

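File Copy

Measures the rate at which data is transferred from one file to another. Each test uses a different buffer size. The file read, write, and copy tests record the number of characters read, written, and copied in a given time (10 seconds by default).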

Filesystem Throughput 1024 bufsize 2000 maxblocks 1 2 3

Filesystem Throughput 256 bufsize 500 maxblocks 1 2 3

Filesystem Throughput 4096 bufsize 8000 maxblocks 1 2 3


Shell Scripts

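Measures the number of times per minute that a process can start and reap n concurrent copies of a shell script, where n is usually 1, 2, 4, and 8 (on my system the values are 1, 8, and 16). The script applies a series of transformations to a data file.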

Shell Scripts (1 concurrent) 1 2 3

Shell Scripts (8 concurrent) 1 2 3

Shell Scripts (16 concurrent) 1 2 3

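For multi-CPU systems, the test strategy is to measure single-task performance, multi-task performance, and the gain from running in parallel.

Take a PC with 4 CPUs as an example: the tests are run twice, and with 4 CPUs the parallel pass runs 4 copies.

./Run -c 1 -c 4 runs two passes: the first with a single copy of each test, the second with 4 copies.

Result analysis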

【System Benchmarks Index Score 171.3】

【System Benchmarks Index Score 395.7】
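One way to read the two scores is as a parallel speedup. The sketch below assumes that 171.3 is the single-copy score and 395.7 the 4-copy score from the 4-CPU example above (the source does not label them explicitly); the function name parallel_scaling is made up for this example.

```python
def parallel_scaling(single_score, multi_score, copies):
    """Return (speedup, efficiency) of a multi-copy run relative to the single-copy run."""
    speedup = multi_score / single_score
    return speedup, speedup / copies


if __name__ == "__main__":
    speedup, efficiency = parallel_scaling(171.3, 395.7, copies=4)
    print(f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency well below 100% is expected, because several of the tests (file copy, for example) are limited by shared resources rather than by the number of CPUs.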

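2. The mbw tool

mbw is a tool for measuring a host's memory performance (memory bandwidth).

The test host has 16 cores and 32 GB of RAM, with a 50 GB standard cloud disk as the system disk and a 100 GB SSD cloud disk as the data disk.

The operating system is Ubuntu 16.04.1.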

root@VM-0-15-ubuntu:/home/ubuntu# apt install -y mbw
root@VM-0-15-ubuntu:/home/ubuntu# mbw -q -n 10 256
Long uses 8 bytes. Allocating 2*4194304 elements = 67108864 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 10 runs per test.
0 Method: MEMCPY Elapsed: 0.00646 MiB: 32.00000 Copy: 4955.094 MiB/s
1 Method: MEMCPY Elapsed: 0.00662 MiB: 32.00000 Copy: 4833.107 MiB/s
2 Method: MEMCPY Elapsed: 0.00655 MiB: 32.00000 Copy: 4882.514 MiB/s
3 Method: MEMCPY Elapsed: 0.00652 MiB: 32.00000 Copy: 4910.988 MiB/s
4 Method: MEMCPY Elapsed: 0.00683 MiB: 32.00000 Copy: 4685.898 MiB/s
5 Method: MEMCPY Elapsed: 0.00651 MiB: 32.00000 Copy: 4918.537 MiB/s
6 Method: MEMCPY Elapsed: 0.00652 MiB: 32.00000 Copy: 4909.481 MiB/s
7 Method: MEMCPY Elapsed: 0.00654 MiB: 32.00000 Copy: 4891.470 MiB/s
8 Method: MEMCPY Elapsed: 0.00657 MiB: 32.00000 Copy: 4870.624 MiB/s
9 Method: MEMCPY Elapsed: 0.00653 MiB: 32.00000 Copy: 4901.961 MiB/s
AVG Method: MEMCPY Elapsed: 0.00656 MiB: 32.00000 Copy: 4874.928 MiB/s
0 Method: DUMB Elapsed: 0.00400 MiB: 32.00000 Copy: 8004.002 MiB/s
1 Method: DUMB Elapsed: 0.00278 MiB: 32.00000 Copy: 11510.791 MiB/s
2 Method: DUMB Elapsed: 0.00280 MiB: 32.00000 Copy: 11444.921 MiB/s
3 Method: DUMB Elapsed: 0.00287 MiB: 32.00000 Copy: 11145.942 MiB/s
4 Method: DUMB Elapsed: 0.00286 MiB: 32.00000 Copy: 11180.992 MiB/s
5 Method: DUMB Elapsed: 0.00290 MiB: 32.00000 Copy: 11045.910 MiB/s
6 Method: DUMB Elapsed: 0.00286 MiB: 32.00000 Copy: 11192.725 MiB/s
7 Method: DUMB Elapsed: 0.00278 MiB: 32.00000 Copy: 11527.378 MiB/s
8 Method: DUMB Elapsed: 0.00277 MiB: 32.00000 Copy: 11569.053 MiB/s
9 Method: DUMB Elapsed: 0.00278 MiB: 32.00000 Copy: 11527.378 MiB/s
AVG Method: DUMB Elapsed: 0.00294 MiB: 32.00000 Copy: 10891.392 MiB/s
0 Method: MCBLOCK Elapsed: 0.00585 MiB: 32.00000 Copy: 5465.414 MiB/s
1 Method: MCBLOCK Elapsed: 0.00369 MiB: 32.00000 Copy: 8674.438 MiB/s
2 Method: MCBLOCK Elapsed: 0.00294 MiB: 32.00000 Copy: 10902.896 MiB/s
3 Method: MCBLOCK Elapsed: 0.00284 MiB: 32.00000 Copy: 11275.546 MiB/s
4 Method: MCBLOCK Elapsed: 0.00283 MiB: 32.00000 Copy: 11299.435 MiB/s
5 Method: MCBLOCK Elapsed: 0.00264 MiB: 32.00000 Copy: 12107.454 MiB/s
6 Method: MCBLOCK Elapsed: 0.00270 MiB: 32.00000 Copy: 11847.464 MiB/s
7 Method: MCBLOCK Elapsed: 0.00283 MiB: 32.00000 Copy: 11311.417 MiB/s
8 Method: MCBLOCK Elapsed: 0.00273 MiB: 32.00000 Copy: 11717.320 MiB/s
9 Method: MCBLOCK Elapsed: 0.00271 MiB: 32.00000 Copy: 11808.118 MiB/s
AVG Method: MCBLOCK Elapsed: 0.00318 MiB: 32.00000 Copy: 10074.615 MiB/s

In the mbw output, the three AVG lines are the ones to look at.

The higher the copy rate (MiB/s), the better the memory performance.
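When comparing several hosts it can help to pull the AVG lines out automatically. The sketch below parses them with a regular expression; the pattern assumes the line layout seen in the output in this section, and the names AVG_LINE and average_bandwidth are chosen for this example.

```python
import re

# Matches lines such as:
# AVG Method: MEMCPY Elapsed: 0.00656 MiB: 32.00000 Copy: 4874.928 MiB/s
AVG_LINE = re.compile(
    r"AVG\s*Method:\s*(\w+)\s+Elapsed:\s*([\d.]+)\s+MiB:\s*([\d.]+)\s+Copy:\s*([\d.]+)\s+MiB/s"
)


def average_bandwidth(output):
    """Map each mbw method name to its average copy rate in MiB/s."""
    return {m.group(1): float(m.group(4)) for m in AVG_LINE.finditer(output)}


# The three AVG lines from the run in this section.
sample = """\
AVG Method: MEMCPY Elapsed: 0.00656 MiB: 32.00000 Copy: 4874.928 MiB/s
AVG Method: DUMB Elapsed: 0.00294 MiB: 32.00000 Copy: 10891.392 MiB/s
AVG Method: MCBLOCK Elapsed: 0.00318 MiB: 32.00000 Copy: 10074.615 MiB/s
"""

if __name__ == "__main__":
    for method, rate in average_bandwidth(sample).items():
        print(f"{method:<8} {rate:10.3f} MiB/s")
```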

Installing mbw on CentOS 6.8

git clone /raas/mbw
cd mbw
make
./mbw -q -n 10 256

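-q suppresses informational output

-n sets the number of runs per test

256 is the array size (in MiB)

For guidance on judging whether the results are good or bad, see the following article: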

http://coffeechou.github.io//05/24/performance-test-tools.html

3. The STREAM tool

STREAM measures memory performance (sustained memory bandwidth).

Environment: CentOS 6.8, 64-bit

git clone /jeffhammond/STREAM.git
cd STREAM
gcc -O -fopenmp -DSTREAM_ARRAY_SIZE=100000000 -DNTIMES=20 stream.c -o stream

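Key compile-time parameters

STREAM_ARRAY_SIZE sets the size of each array in elements. To build with 100M elements:

gcc -O -DSTREAM_ARRAY_SIZE=100000000 stream.c -o stream.100M

NTIMES sets how many times each kernel is run; the best run is reported. To run each kernel 7 times, pass -DNTIMES=7. The macro must be spelled NTIMES: the sample output in this section reports 10 executions per kernel, which is what a misspelled -DNTIME=20 produces, because an unknown macro is silently ignored and the default of 10 applies.

Multi-core support

On multi-core machines, add -fopenmp (together with -O) to enable OpenMP multi-threading.

Complete example:

gcc -O -fopenmp -DSTREAM_ARRAY_SIZE=100000000 -DNTIMES=20 stream.c -o stream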

[root@vm192-168-80-2 STREAM]# ls
HISTORY.txt LICENSE.txt Makefile mysecond.c README stream stream.c stream.f
[root@vm192-168-80-2 STREAM]# ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 100000000 (elements), Offset = 0 (elements)
Memory per array = 762.9 MiB (= 0.7 GiB).
Total memory required = 2288.8 MiB (= 2.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads counted = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 30910 microseconds.
   (= 30910 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           36977.8     0.044884     0.043269     0.052954
Scale:          36797.8     0.044087     0.043481     0.044937
Add:            41868.7     0.058432     0.057322     0.060968
Triad:          42085.3     0.058550     0.057027     0.060215
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
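The "Best Rate" column follows from the array size and the minimum time. The sketch below assumes that Copy and Scale each touch two arrays per element (one read, one write), that Add and Triad touch three (two reads, one write), and that 1 MB means 10^6 bytes; under those assumptions it reproduces the figures in the output above. The constant and function names are chosen for this example.

```python
BYTES_PER_ELEMENT = 8        # "This system uses 8 bytes per array element."
ARRAY_SIZE = 100_000_000     # "Array size = 100000000 (elements)"

# Number of arrays each kernel touches per element (reads plus writes).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def best_rate_mb_s(kernel, min_time_s):
    """Best rate in MB/s (1 MB = 10**6 bytes) from a kernel's minimum time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_SIZE
    return bytes_moved / min_time_s / 1e6


if __name__ == "__main__":
    # Min times copied from the output above.
    min_times = {"Copy": 0.043269, "Scale": 0.043481, "Add": 0.057322, "Triad": 0.057027}
    for kernel, t in min_times.items():
        print(f"{kernel:<6} {best_rate_mb_s(kernel, t):10.1f} MB/s")
    # Memory per array in MiB, as in the "Memory per array" line.
    print(f"{BYTES_PER_ELEMENT * ARRAY_SIZE / 2**20:.1f} MiB per array")
```

The small differences from the printed rates come from the min times being rounded to six decimal places in the output.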

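Copy: reads a value from one memory location and writes it to another memory location.

Scale: reads a value from memory, multiplies it by a factor, and writes the result to another memory location.

Add: reads two values from memory, adds them, and writes the result to another memory location.

Triad: combines the Copy, Scale, and Add operations in a single test. It reads two values a and b from memory, computes the mixed multiply-add a + factor * b, and writes the result to another memory location.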
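The four kernels can be written out on small lists to make the data flow concrete. This is a readability sketch only: pure Python is far too slow to measure memory bandwidth, and the function names are chosen for this example. The factor 3.0 in the usage lines is just a sample value.

```python
def copy(a):
    """c[i] = a[i]: one read and one write per element."""
    return [x for x in a]


def scale(c, factor):
    """b[i] = factor * c[i]: one read, one multiply, one write."""
    return [factor * x for x in c]


def add(a, b):
    """c[i] = a[i] + b[i]: two reads, one add, one write."""
    return [x + y for x, y in zip(a, b)]


def triad(a, b, factor):
    """result[i] = a[i] + factor * b[i]: two reads, one multiply-add, one write."""
    return [x + factor * y for x, y in zip(a, b)]


if __name__ == "__main__":
    a = [1.0, 2.0]
    b = [3.0, 4.0]
    print(copy(a))
    print(scale(a, 3.0))
    print(add(a, b))
    print(triad(a, b, 3.0))
```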
