CephFS & Oracle database, 8k blocks, slow write



mkov87
27-Mar-2017, 12:39
I built a Ceph cluster on SES 4 (10.2.5) with ceph-deploy:
Supermicro servers (nx1: mon+osd, nx3: mon+osd+mds, nx4: mon+osd+mds, sm5: osd), NICs with 2x 10Gbit ports.
The OSDs are plain HDDs (600GB, 1TB, 2TB capacity); no SSDs, no flash.


cluster 35f51cdd-7a5c-4c9f-921b-8d63ed8e4da7
health HEALTH_OK
monmap e2: 3 mons at {nx1=10.117.28.209:6789/0,nx3=10.117.28.13:6789/0,nx4=10.117.28.14:6789/0}
election epoch 544, quorum 0,1,2 nx3,nx4,nx1
fsmap e126: 1/1/1 up {0=nx4=up:active}, 1 up:standby
osdmap e9552: 96 osds: 96 up, 96 in
flags sortbitwise,require_jewel_osds
pgmap v1212475: 4096 pgs, 2 pools, 21535 GB data, 5384 kobjects
43124 GB used, 67745 GB / 108 TB avail
4096 active+clean

CephFS is mounted via fstab like this:

10.117.28.14:6789:/ /mnt/cephfs ceph name=admin,secret=bla-lbla-bla==,noatime,_netdev 0 0

We have a virtual server with SLES 12 SP1 and an Oracle 11.2.0.4 database.
Everything looks fine, but the DB admins report:
DB async wait > 1000 ms

I tested with fio using 8k blocks, and random write speed is a problem:
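For reference, a fio job file like the following would produce a run of this shape (reconstructed from the output below; direct=1 and the test directory are my assumptions, not details taken from the thread):

```
[random-write]
rw=randwrite
bs=8k
ioengine=libaio
iodepth=1
numjobs=8
size=1024m
runtime=300
time_based
direct=1
directory=/mnt/cephfs/fiotest
```

numjobs=8 matches the eight per-job reports below; there is no group_reporting, which is why each job prints its own summary.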

random-write: (g=0): rw=randwrite, bs=8K-8K/8K-8K/8K-8K, ioengine=libaio, iodepth=1
...
fio-2.13
Starting 8 processes
random-write: Laying out IO file(s) (1 file(s) / 1024MB)
random-write: Laying out IO file(s) (1 file(s) / 1024MB)
random-write: Laying out IO file(s) (1 file(s) / 1024MB)
random-write: Laying out IO file(s) (1 file(s) / 1024MB)
random-write: Laying out IO file(s) (1 file(s) / 1024MB)
random-write: Laying out IO file(s) (1 file(s) / 1024MB)
random-write: Laying out IO file(s) (1 file(s) / 1024MB)
random-write: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 8 (f=8): [w(8)] [6.8% done] [0KB/4304KB/0KB /s] [0/538/0 iops] [eta 01h:08m:43s]
random-write: (groupid=0, jobs=1): err= 0: pid=468113: Mon Mar 27 10:56:33 2017
write: io=72600KB, bw=247751B/s, iops=30, runt=300069msec
slat (usec): min=10, max=91, avg=19.85, stdev= 6.16
clat (msec): min=2, max=936, avg=33.04, stdev=64.11
lat (msec): min=2, max=936, avg=33.06, stdev=64.11
clat percentiles (msec):
| 1.00th=[ 5], 5.00th=[ 6], 10.00th=[ 7], 20.00th=[ 9],
| 30.00th=[ 10], 40.00th=[ 11], 50.00th=[ 12], 60.00th=[ 13],
| 70.00th=[ 18], 80.00th=[ 28], 90.00th=[ 81], 95.00th=[ 169],
| 99.00th=[ 330], 99.50th=[ 392], 99.90th=[ 529], 99.95th=[ 668],
| 99.99th=[ 938]
bw (KB /s): min= 16, max= 718, per=12.53%, avg=243.82, stdev=153.94
lat (msec) : 4=0.71%, 10=36.42%, 20=36.15%, 50=13.79%, 100=4.25%
lat (msec) : 250=6.31%, 500=2.20%, 750=0.14%, 1000=0.02%
cpu : usr=0.04%, sys=0.07%, ctx=9089, majf=0, minf=9
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=9075/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
random-write: (groupid=0, jobs=1): err= 0: pid=468114: Mon Mar 27 10:56:33 2017
write: io=75968KB, bw=259215B/s, iops=31, runt=300102msec
slat (usec): min=10, max=133, avg=19.86, stdev= 6.43
clat (msec): min=1, max=1087, avg=31.58, stdev=63.13
lat (msec): min=1, max=1087, avg=31.60, stdev=63.13
clat percentiles (msec):
| 1.00th=[ 5], 5.00th=[ 6], 10.00th=[ 7], 20.00th=[ 9],
| 30.00th=[ 10], 40.00th=[ 11], 50.00th=[ 12], 60.00th=[ 13],
| 70.00th=[ 18], 80.00th=[ 28], 90.00th=[ 74], 95.00th=[ 157],
| 99.00th=[ 306], 99.50th=[ 388], 99.90th=[ 652], 99.95th=[ 717],
| 99.99th=[ 1090]
bw (KB /s): min= 16, max= 759, per=13.24%, avg=257.70, stdev=163.21
lat (msec) : 2=0.01%, 4=0.76%, 10=37.28%, 20=36.15%, 50=13.63%
lat (msec) : 100=3.99%, 250=6.38%, 500=1.60%, 750=0.17%, 1000=0.02%
lat (msec) : 2000=0.01%
cpu : usr=0.03%, sys=0.08%, ctx=9503, majf=0, minf=9
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=9496/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
random-write: (groupid=0, jobs=1): err= 0: pid=468115: Mon Mar 27 10:56:33 2017
write: io=71984KB, bw=245621B/s, iops=29, runt=300103msec
slat (usec): min=10, max=160, avg=19.81, stdev= 6.36
clat (msec): min=3, max=961, avg=33.33, stdev=65.31
lat (msec): min=3, max=961, avg=33.35, stdev=65.31
clat percentiles (msec):
| 1.00th=[ 5], 5.00th=[ 6], 10.00th=[ 7], 20.00th=[ 9],
| 30.00th=[ 10], 40.00th=[ 11], 50.00th=[ 12], 60.00th=[ 13],
| 70.00th=[ 19], 80.00th=[ 29], 90.00th=[ 84], 95.00th=[ 165],
| 99.00th=[ 330], 99.50th=[ 408], 99.90th=[ 619], 99.95th=[ 725],
| 99.99th=[ 963]
bw (KB /s): min= 15, max= 720, per=12.48%, avg=242.93, stdev=156.65
lat (msec) : 4=0.43%, 10=35.69%, 20=36.14%, 50=14.35%, 100=4.77%
lat (msec) : 250=6.55%, 500=1.87%, 750=0.17%, 1000=0.04%
cpu : usr=0.04%, sys=0.07%, ctx=9008, majf=0, minf=11
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=8998/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
random-write: (groupid=0, jobs=1): err= 0: pid=468116: Mon Mar 27 10:56:33 2017
write: io=76976KB, bw=262655B/s, iops=32, runt=300102msec
slat (usec): min=10, max=98, avg=19.89, stdev= 6.36
clat (msec): min=2, max=1002, avg=31.16, stdev=61.88
lat (msec): min=2, max=1002, avg=31.18, stdev=61.88
clat percentiles (msec):
| 1.00th=[ 5], 5.00th=[ 6], 10.00th=[ 7], 20.00th=[ 9],
| 30.00th=[ 10], 40.00th=[ 11], 50.00th=[ 12], 60.00th=[ 13],
| 70.00th=[ 18], 80.00th=[ 27], 90.00th=[ 69], 95.00th=[ 159],
| 99.00th=[ 310], 99.50th=[ 388], 99.90th=[ 570], 99.95th=[ 676],
| 99.99th=[ 1004]
bw (KB /s): min= 16, max= 720, per=13.28%, avg=258.50, stdev=165.14
lat (msec) : 4=0.50%, 10=38.10%, 20=35.17%, 50=14.11%, 100=4.62%
lat (msec) : 250=5.55%, 500=1.78%, 750=0.14%, 1000=0.02%, 2000=0.01%
cpu : usr=0.04%, sys=0.07%, ctx=9626, majf=0, minf=11
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=9622/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
random-write: (groupid=0, jobs=1): err= 0: pid=468117: Mon Mar 27 10:56:33 2017
write: io=74000KB, bw=252500B/s, iops=30, runt=300102msec
slat (usec): min=10, max=87, avg=20.14, stdev= 6.60
clat (msec): min=2, max=990, avg=32.42, stdev=64.20
lat (msec): min=2, max=990, avg=32.44, stdev=64.20
clat percentiles (msec):
| 1.00th=[ 5], 5.00th=[ 6], 10.00th=[ 7], 20.00th=[ 9],
| 30.00th=[ 10], 40.00th=[ 11], 50.00th=[ 12], 60.00th=[ 13],
| 70.00th=[ 18], 80.00th=[ 28], 90.00th=[ 76], 95.00th=[ 165],
| 99.00th=[ 326], 99.50th=[ 400], 99.90th=[ 594], 99.95th=[ 644],
| 99.99th=[ 988]
bw (KB /s): min= 16, max= 688, per=12.77%, avg=248.47, stdev=159.62
lat (msec) : 4=0.59%, 10=36.74%, 20=36.21%, 50=13.94%, 100=4.17%
lat (msec) : 250=6.34%, 500=1.77%, 750=0.23%, 1000=0.02%
cpu : usr=0.04%, sys=0.07%, ctx=9258, majf=0, minf=12
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=9250/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
random-write: (groupid=0, jobs=1): err= 0: pid=468118: Mon Mar 27 10:56:33 2017
write: io=72232KB, bw=246495B/s, iops=30, runt=300069msec
slat (usec): min=9, max=116, avg=20.42, stdev= 6.69
clat (msec): min=2, max=1217, avg=33.21, stdev=63.28
lat (msec): min=2, max=1217, avg=33.23, stdev=63.28
clat percentiles (msec):
| 1.00th=[ 5], 5.00th=[ 6], 10.00th=[ 7], 20.00th=[ 9],
| 30.00th=[ 10], 40.00th=[ 11], 50.00th=[ 12], 60.00th=[ 14],
| 70.00th=[ 19], 80.00th=[ 31], 90.00th=[ 83], 95.00th=[ 169],
| 99.00th=[ 322], 99.50th=[ 388], 99.90th=[ 553], 99.95th=[ 594],
| 99.99th=[ 1221]
bw (KB /s): min= 15, max= 752, per=12.46%, avg=242.55, stdev=153.01
lat (msec) : 4=0.64%, 10=36.21%, 20=35.22%, 50=14.54%, 100=4.56%
lat (msec) : 250=6.82%, 500=1.86%, 750=0.13%, 2000=0.01%
cpu : usr=0.02%, sys=0.08%, ctx=9040, majf=0, minf=11
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=9029/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
random-write: (groupid=0, jobs=1): err= 0: pid=468119: Mon Mar 27 10:56:33 2017
write: io=71344KB, bw=243437B/s, iops=29, runt=300103msec
slat (usec): min=9, max=129, avg=19.82, stdev= 6.31
clat (msec): min=2, max=898, avg=33.63, stdev=66.03
lat (msec): min=2, max=898, avg=33.65, stdev=66.03
clat percentiles (msec):
| 1.00th=[ 5], 5.00th=[ 6], 10.00th=[ 7], 20.00th=[ 9],
| 30.00th=[ 10], 40.00th=[ 11], 50.00th=[ 12], 60.00th=[ 13],
| 70.00th=[ 18], 80.00th=[ 29], 90.00th=[ 82], 95.00th=[ 174],
| 99.00th=[ 334], 99.50th=[ 404], 99.90th=[ 570], 99.95th=[ 627],
| 99.99th=[ 898]
bw (KB /s): min= 16, max= 798, per=12.33%, avg=240.00, stdev=160.43
lat (msec) : 4=0.58%, 10=36.82%, 20=35.71%, 50=13.58%, 100=4.54%
lat (msec) : 250=6.31%, 500=2.20%, 750=0.24%, 1000=0.01%
cpu : usr=0.03%, sys=0.08%, ctx=8929, majf=0, minf=11
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=8918/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
random-write: (groupid=0, jobs=1): err= 0: pid=468120: Mon Mar 27 10:56:33 2017
write: io=69112KB, bw=235822B/s, iops=28, runt=300102msec
slat (usec): min=10, max=175, avg=20.09, stdev= 6.68
clat (msec): min=2, max=1280, avg=34.71, stdev=68.95
lat (msec): min=2, max=1280, avg=34.73, stdev=68.95
clat percentiles (msec):
| 1.00th=[ 5], 5.00th=[ 6], 10.00th=[ 7], 20.00th=[ 9],
| 30.00th=[ 10], 40.00th=[ 11], 50.00th=[ 12], 60.00th=[ 14],
| 70.00th=[ 19], 80.00th=[ 30], 90.00th=[ 86], 95.00th=[ 176],
| 99.00th=[ 343], 99.50th=[ 420], 99.90th=[ 603], 99.95th=[ 685],
| 99.99th=[ 1287]
bw (KB /s): min= 16, max= 752, per=11.94%, avg=232.45, stdev=152.69
lat (msec) : 4=0.57%, 10=35.06%, 20=36.36%, 50=14.37%, 100=4.65%
lat (msec) : 250=6.44%, 500=2.30%, 750=0.22%, 1000=0.01%, 2000=0.02%
cpu : usr=0.03%, sys=0.07%, ctx=8643, majf=0, minf=11
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=8639/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
WRITE: io=584216KB, aggrb=1946KB/s, minb=230KB/s, maxb=256KB/s, mint=300069msec, maxt=300103msec


Reads, however, look good:



random-read: (g=0): rw=randread, bs=8K-8K/8K-8K/8K-8K, ioengine=libaio, iodepth=1
...
fio-2.13
Starting 8 processes
random-read: Laying out IO file(s) (1 file(s) / 1024MB)
random-read: Laying out IO file(s) (1 file(s) / 1024MB)
random-read: Laying out IO file(s) (1 file(s) / 1024MB)
random-read: Laying out IO file(s) (1 file(s) / 1024MB)
random-read: Laying out IO file(s) (1 file(s) / 1024MB)
random-read: Laying out IO file(s) (1 file(s) / 1024MB)
random-read: Laying out IO file(s) (1 file(s) / 1024MB)
random-read: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 8 (f=8): [r(8)] [100.0% done] [204.8MB/0KB/0KB /s] [26.3K/0/0 iops] [eta 00m:00s]
random-read: (groupid=0, jobs=1): err= 0: pid=466737: Mon Mar 27 10:47:33 2017
read : io=6633.4MB, bw=22642KB/s, iops=2830, runt=300001msec
slat (usec): min=6, max=393, avg=13.01, stdev= 3.57
clat (usec): min=103, max=285454, avg=338.18, stdev=1401.74
lat (usec): min=112, max=285464, avg=351.36, stdev=1401.78
clat percentiles (usec):
| 1.00th=[ 171], 5.00th=[ 183], 10.00th=[ 191], 20.00th=[ 201],
| 30.00th=[ 219], 40.00th=[ 258], 50.00th=[ 278], 60.00th=[ 298],
| 70.00th=[ 318], 80.00th=[ 342], 90.00th=[ 398], 95.00th=[ 462],
| 99.00th=[ 852], 99.50th=[ 1896], 99.90th=[12480], 99.95th=[17280],
| 99.99th=[62208]
bw (KB /s): min= 3232, max=30272, per=12.43%, avg=22633.21, stdev=4542.55
lat (usec) : 250=37.77%, 500=58.50%, 750=2.52%, 1000=0.40%
lat (msec) : 2=0.32%, 4=0.16%, 10=0.17%, 20=0.12%, 50=0.02%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
cpu : usr=0.99%, sys=4.27%, ctx=849418, majf=0, minf=104
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=849071/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
random-read: (groupid=0, jobs=1): err= 0: pid=466738: Mon Mar 27 10:47:33 2017
read : io=6700.5MB, bw=22871KB/s, iops=2858, runt=300001msec
slat (usec): min=6, max=567, avg=13.00, stdev= 3.61
clat (usec): min=107, max=234255, avg=334.65, stdev=1323.11
lat (usec): min=116, max=234275, avg=347.83, stdev=1323.16
clat percentiles (usec):
| 1.00th=[ 171], 5.00th=[ 183], 10.00th=[ 189], 20.00th=[ 201],
| 30.00th=[ 219], 40.00th=[ 258], 50.00th=[ 278], 60.00th=[ 294],
| 70.00th=[ 314], 80.00th=[ 342], 90.00th=[ 390], 95.00th=[ 446],
| 99.00th=[ 852], 99.50th=[ 1880], 99.90th=[12608], 99.95th=[17536],
| 99.99th=[52992]
bw (KB /s): min= 4192, max=29360, per=12.56%, avg=22862.49, stdev=4520.73
lat (usec) : 250=37.87%, 500=58.99%, 750=1.94%, 1000=0.38%
lat (msec) : 2=0.33%, 4=0.16%, 10=0.17%, 20=0.12%, 50=0.03%
lat (msec) : 100=0.01%, 250=0.01%
cpu : usr=1.03%, sys=4.29%, ctx=858044, majf=0, minf=54
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=857656/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
random-read: (groupid=0, jobs=1): err= 0: pid=466739: Mon Mar 27 10:47:33 2017
read : io=6775.7MB, bw=23127KB/s, iops=2890, runt=300001msec
slat (usec): min=6, max=1190, avg=13.03, stdev= 3.77
clat (usec): min=108, max=292723, avg=330.74, stdev=1446.30
lat (usec): min=117, max=292737, avg=343.95, stdev=1446.33
clat percentiles (usec):
| 1.00th=[ 169], 5.00th=[ 181], 10.00th=[ 189], 20.00th=[ 199],
| 30.00th=[ 211], 40.00th=[ 239], 50.00th=[ 270], 60.00th=[ 290],
| 70.00th=[ 310], 80.00th=[ 338], 90.00th=[ 390], 95.00th=[ 454],
| 99.00th=[ 836], 99.50th=[ 1816], 99.90th=[12480], 99.95th=[18560],
| 99.99th=[59136]
bw (KB /s): min= 5456, max=30304, per=12.70%, avg=23120.34, stdev=4770.42
lat (usec) : 250=42.88%, 500=53.73%, 750=2.22%, 1000=0.38%
lat (msec) : 2=0.32%, 4=0.16%, 10=0.16%, 20=0.11%, 50=0.03%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
cpu : usr=0.98%, sys=4.40%, ctx=867625, majf=0, minf=42
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=867281/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
random-read: (groupid=0, jobs=1): err= 0: pid=466740: Mon Mar 27 10:47:33 2017
read : io=6585.8MB, bw=22479KB/s, iops=2809, runt=300001msec
slat (usec): min=6, max=529, avg=12.97, stdev= 3.69
clat (usec): min=103, max=416495, avg=340.76, stdev=1541.99
lat (usec): min=111, max=416509, avg=353.91, stdev=1542.02
clat percentiles (usec):
| 1.00th=[ 171], 5.00th=[ 183], 10.00th=[ 189], 20.00th=[ 201],
| 30.00th=[ 221], 40.00th=[ 258], 50.00th=[ 278], 60.00th=[ 298],
| 70.00th=[ 318], 80.00th=[ 346], 90.00th=[ 406], 95.00th=[ 482],
| 99.00th=[ 860], 99.50th=[ 1992], 99.90th=[12096], 99.95th=[16768],
| 99.99th=[51456]
bw (KB /s): min= 2944, max=28368, per=12.34%, avg=22470.13, stdev=4666.43
lat (usec) : 250=37.39%, 500=58.25%, 750=3.13%, 1000=0.40%
lat (msec) : 2=0.33%, 4=0.16%, 10=0.18%, 20=0.12%, 50=0.03%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
cpu : usr=1.07%, sys=4.15%, ctx=843360, majf=0, minf=39
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=842969/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
random-read: (groupid=0, jobs=1): err= 0: pid=466741: Mon Mar 27 10:47:33 2017
read : io=6669.4MB, bw=22765KB/s, iops=2845, runt=300001msec
slat (usec): min=6, max=573, avg=12.99, stdev= 3.62
clat (usec): min=111, max=361292, avg=336.28, stdev=1537.57
lat (usec): min=120, max=361301, avg=349.46, stdev=1537.60
clat percentiles (usec):
| 1.00th=[ 169], 5.00th=[ 183], 10.00th=[ 189], 20.00th=[ 199],
| 30.00th=[ 213], 40.00th=[ 245], 50.00th=[ 270], 60.00th=[ 290],
| 70.00th=[ 314], 80.00th=[ 338], 90.00th=[ 394], 95.00th=[ 462],
| 99.00th=[ 852], 99.50th=[ 1992], 99.90th=[12736], 99.95th=[17792],
| 99.99th=[60160]
bw (KB /s): min= 3936, max=29776, per=12.50%, avg=22757.01, stdev=4739.17
lat (usec) : 250=41.15%, 500=55.05%, 750=2.59%, 1000=0.39%
lat (msec) : 2=0.32%, 4=0.16%, 10=0.17%, 20=0.12%, 50=0.03%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
cpu : usr=1.07%, sys=4.22%, ctx=854048, majf=0, minf=70
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=853677/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
random-read: (groupid=0, jobs=1): err= 0: pid=466742: Mon Mar 27 10:47:33 2017
read : io=6761.0MB, bw=23077KB/s, iops=2884, runt=300001msec
slat (usec): min=6, max=405, avg=12.89, stdev= 3.52
clat (usec): min=103, max=374092, avg=331.64, stdev=1584.80
lat (usec): min=112, max=374105, avg=344.70, stdev=1584.84
clat percentiles (usec):
| 1.00th=[ 169], 5.00th=[ 181], 10.00th=[ 187], 20.00th=[ 197],
| 30.00th=[ 207], 40.00th=[ 229], 50.00th=[ 266], 60.00th=[ 286],
| 70.00th=[ 310], 80.00th=[ 338], 90.00th=[ 390], 95.00th=[ 462],
| 99.00th=[ 820], 99.50th=[ 1848], 99.90th=[12480], 99.95th=[17792],
| 99.99th=[63744]
bw (KB /s): min= 6032, max=30640, per=12.67%, avg=23069.45, stdev=4839.61
lat (usec) : 250=44.92%, 500=51.36%, 750=2.58%, 1000=0.37%
lat (msec) : 2=0.29%, 4=0.16%, 10=0.17%, 20=0.11%, 50=0.03%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
cpu : usr=1.01%, sys=4.30%, ctx=865719, majf=0, minf=21
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=865408/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
random-read: (groupid=0, jobs=1): err= 0: pid=466743: Mon Mar 27 10:47:33 2017
read : io=6659.4MB, bw=22731KB/s, iops=2841, runt=300001msec
slat (usec): min=7, max=624, avg=13.09, stdev= 3.65
clat (usec): min=110, max=406051, avg=336.72, stdev=1549.74
lat (usec): min=119, max=406064, avg=349.98, stdev=1549.78
clat percentiles (usec):
| 1.00th=[ 169], 5.00th=[ 183], 10.00th=[ 189], 20.00th=[ 199],
| 30.00th=[ 213], 40.00th=[ 247], 50.00th=[ 274], 60.00th=[ 294],
| 70.00th=[ 314], 80.00th=[ 342], 90.00th=[ 394], 95.00th=[ 462],
| 99.00th=[ 852], 99.50th=[ 1992], 99.90th=[12480], 99.95th=[17536],
| 99.99th=[59648]
bw (KB /s): min= 2592, max=30064, per=12.48%, avg=22721.81, stdev=4705.44
lat (usec) : 250=40.61%, 500=55.66%, 750=2.52%, 1000=0.40%
lat (msec) : 2=0.32%, 4=0.16%, 10=0.18%, 20=0.12%, 50=0.03%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
cpu : usr=0.99%, sys=4.31%, ctx=852733, majf=0, minf=104
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=852398/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
random-read: (groupid=0, jobs=1): err= 0: pid=466744: Mon Mar 27 10:47:33 2017
read : io=6553.3MB, bw=22368KB/s, iops=2796, runt=300001msec
slat (usec): min=6, max=535, avg=13.05, stdev= 3.59
clat (usec): min=3, max=426544, avg=342.44, stdev=1597.32
lat (usec): min=110, max=426555, avg=355.67, stdev=1597.35
clat percentiles (usec):
| 1.00th=[ 171], 5.00th=[ 183], 10.00th=[ 191], 20.00th=[ 201],
| 30.00th=[ 219], 40.00th=[ 258], 50.00th=[ 278], 60.00th=[ 298],
| 70.00th=[ 318], 80.00th=[ 346], 90.00th=[ 402], 95.00th=[ 478],
| 99.00th=[ 852], 99.50th=[ 2024], 99.90th=[12480], 99.95th=[17792],
| 99.99th=[57088]
bw (KB /s): min= 2736, max=28944, per=12.29%, avg=22366.13, stdev=4610.85
lat (usec) : 4=0.01%, 100=0.01%, 250=37.41%, 500=58.41%, 750=2.97%
lat (usec) : 1000=0.39%
lat (msec) : 2=0.32%, 4=0.16%, 10=0.18%, 20=0.12%, 50=0.03%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
cpu : usr=1.05%, sys=4.17%, ctx=839148, majf=0, minf=64
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=838811/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
READ: io=53338MB, aggrb=182059KB/s, minb=22368KB/s, maxb=23127KB/s, mint=300001msec, maxt=300001msec

ab
27-Mar-2017, 12:55
While I'm not sure it's the root cause of the issue, I would probably not
use this kind of storage for an oft-written relational database. In order
to guarantee the system is as reliable as claimed (by default three
replicas of all data in a pool) every write confirms that all replicas
have, at least in their transaction log, the write committed before
letting the client move on to do something else. This means that you're
waiting for three writes, not just one, and across multiple OSDs
(probably/preferably multiple machines, if not multiple racks/datacenters)
in order for the redundancy to be complete. If you have something that
writes a lot of little things, you're going to be penalized in overhead.
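A toy model (illustrative numbers only, not measurements from this cluster) shows why waiting on the slowest of three replicas inflates small-write latency:

```python
import random

random.seed(42)

def replica_commit_ms():
    # Toy model of one HDD journal commit: mean ~8 ms,
    # with an exponential tail standing in for slow seeks/flushes.
    return random.expovariate(1 / 8.0)

def client_write_ms(replicas=3):
    # The primary acks the client only after every replica has
    # committed, so the client effectively waits for the slowest one.
    return max(replica_commit_ms() for _ in range(replicas))

single = sorted(replica_commit_ms() for _ in range(10000))
tripled = sorted(client_write_ms(3) for _ in range(10000))
print("median, 1 replica :", round(single[5000], 1), "ms")
print("median, 3 replicas:", round(tripled[5000], 1), "ms")
```

The median (and especially the tail) of the 3-replica case sits well above the single-disk case, before any network hops are even counted.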

Ceph is great for workloads with a lot of big objects, especially static
ones, because the write penalty is per object, not per disk block, so one
big object may incur one bit of overhead instead of a million blocks
making up an object each incurring their own bit of overhead.

None of that is meant to claim that Ceph cannot do things better, or
cannot be used for a lot of little writes, but because of the redundancy
guarantee you're not going to get the same performance as from a system
that is tuned for fast writes or block-based access.

--
Good luck.


polezhaevdmi
28-Mar-2017, 11:38
The explanation of why Ceph 'out-of-the-box' requires tuning for almost every particular case is in Sébastien Han's presentation (https://www.sebastien-han.fr/blog/2014/02/27/cephdays-frankfurt-ceph-performance/). However, the SES cluster's performance can likely be tuned further. I see these approaches to try:

1. Tune OS and SES parameters for small random operations. Link 1 (https://forum.proxmox.com/threads/ceph-performance-and-latency.18552/), Link 2 (https://www.redhat.com/cms/managed-files/st-ceph-storage-mysql-refarch-technology-detail-inc0448222-201609-en.pdf), Link 3 (http://tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments).

2. Switch to experimental BlueStore backend. That might give 40% IOPS gain on the same hardware. Link 1 (https://www.slideshare.net/sageweil1/bluestore-a-new-faster-storage-backend-for-ceph), Link 2 (http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008271.html), Link 3 (https://blog.widodh.nl/2017/01/testing-ceph-bluestore-with-the-kraken-release/).
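As a sketch of approach 1, a few Jewel-era filestore knobs of the kind those guides discuss might look like this in ceph.conf (the values are generic starting points I am assuming for illustration, not settings validated on this cluster — benchmark before and after any change):

```
[osd]
# more threads to absorb many small parallel ops
osd_op_threads = 8
filestore_op_threads = 8
# allow more queued ops before throttling clients
filestore_queue_max_ops = 500
# sync the filestore less often so small writes batch up in the journal
filestore_max_sync_interval = 10
```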

mkov871
03-Apr-2017, 06:29
I made 2 pools for CephFS: ceph_data (pg_num 2048, size 2) and ceph_metadata (pg_num 2048, size 2).
I wrote ~7 TB of data.
openATTIC shows the following usage:
ceph_data 25%
ceph_metadata 0.0%

I'm thinking of re-creating the pools with different PG counts:
ceph_data 4096
ceph_metadata 1024

Is it a good idea to make the metadata pool smaller?

thanks

polezhaevdmi
11-Apr-2017, 16:06
ceph_data 4096
ceph_metadata 1024
Is it a good idea to make the metadata pool smaller?


Yes, the metadata pool usually stores only a small amount of data. I suspect it would be fine to start with 32 or 64 PGs for ceph_metadata and double the PG count whenever the pool's utilization gets high (or whenever ' ceph -s ' suggests it). The PG count can be increased on the fly.
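The usual rule of thumb (roughly what the upstream pgcalc tool does) is about 100 PGs per OSD divided by the replica size, settled on a power of two. A sketch of that rule, with the 25% rounding threshold being my assumption of pgcalc's behaviour:

```python
def suggested_pg_count(num_osds, replica_size, target_pgs_per_osd=100):
    """Rule-of-thumb total PG count across a cluster's pools.

    Aim for roughly target_pgs_per_osd PGs landing on each OSD,
    divide by the replica count, then pick a power of two.
    """
    raw = num_osds * target_pgs_per_osd / replica_size
    power = 1
    while power * 2 <= raw:
        power *= 2
    # step up to the next power of two only if we undershoot badly (>25%)
    return power * 2 if raw > power * 1.25 else power

# 96 OSDs at size 2 lands on 4096, matching the pg_num used in this thread
print(suggested_pg_count(96, 2))
```

A tiny metadata pool then gets only a small slice of that total, which is consistent with starting it at 32 or 64 PGs and growing on demand.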