Wednesday, July 17, 2013

SSD Caching 2: One step forward...

Overview

After several days of using my dm-cache setup (see: Adding an SSD cache to an existing encrypted filesystem with dm-cache) to cache my encrypted /home filesystem, I have noticed that very little is being cached. Based on my reading of the block statistics for the three block devices involved, the cache was saving me a huge number of writes to disk, but I hadn't noticed a significant perceptible performance improvement. Caching just /home may not be enough to change my perception of my workstation as being held back by a slow drive. However, the real problem is that I have no baseline to compare against. It's time to remove the cache and set up some stats collection.

Block device review

/dev/mapper/benwayvg-ehome         luks block device on an LVM partition
/dev/mapper/cehome                 cache device from dmsetup
/dev/mapper/ehome                  opened luks device (an ext4 filesystem)
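
If the layering ever gets confusing, lsblk will draw the whole stack, from the physical drives up through LVM, the cache device and the opened luks device (the column list here is just my preference):

lsblk -o NAME,TYPE,SIZE,MOUNTPOINT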

Save the Stats

The cumulative block stats for the three devices are still illustrative. This is after 4 days of uptime: cat /sys/block/dm-{17,18,19}/stat

device                       rd i/o  rmerges rsectors   rticks   wr i/o  wmerges wsectors   wticks inflight io_ticks  time_in_queue
/dev/mapper/benwayvg-ehome   120223        0  4702124   651290   127889        0 20907992  3195852        0   644128  3847142
/dev/mapper/cehome           133223        0  4518728   595193   578275        0 32206792  5623591        0   896061  6218948
/dev/mapper/ehome            133137        0  4517804   604400   550712        0 32206792 12038764        0   911568 12643178
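
The dm-17/18/19 numbering isn't obvious from the friendly names; dmsetup can map the minor numbers back to the named devices:

dmsetup ls        # each mapped device with its (major, minor) pair
dmsetup info -c   # columnar view, including the Maj and Min columns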

I think some general observations can be made here. I didn't notice much of a performance change after adding the cache device, but it clearly saved a large percentage of writes to disk. The opened luks device saw 550712 write i/os, and the cache device underneath it saw 578275 (luks apparently adds a little overhead), yet the cache only wrote to the spinning disk 127889 times!
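
As a rough back-of-the-envelope figure: 127889 / 578275 ≈ 0.22, so only about 22% of the write i/os issued to the cache device actually reached the hard drive over those four days; the SSD absorbed the rest.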

There wasn't much saving in disk read i/o, however; perhaps some tuning would have changed that. I plan to revisit it all after establishing some baseline stats over the next week.

The output of dmsetup status cehome also revealed only 1471 blocks in the cache. Blocks in this case are 256 KiB each, so that is only about 368 MiB. I guess my expectation was that more would be cached. The dm-cache documentation (device-mapper/cache.txt in your kernel documentation directory) explains all of the fields in the dmsetup status output; for illustration, here is the full output:

0 209715200 cache 376/524288 19279 877645 583982 265753 0 662 1471 1438 0 2 migration_threshold 204800 4 random_threshold 4 sequential_threshold 2048
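
Reading that against cache.txt (this is my interpretation of the field order for this kernel, so check it against your own documentation), the interesting fields are:

376/524288   used / total metadata blocks
19279        read hits
877645       read misses
583982       write hits
265753       write misses
0            demotions
662          promotions
1471         blocks resident in the cache
1438         dirty blocks

The trailing fields are the feature flags and the core and policy arguments (migration_threshold, random_threshold and sequential_threshold).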

Perhaps dm-cache is simply doing its job and not caching blocks that would be replaced too quickly. Maybe my /home partition is a good candidate for caching but just doesn't need very much cache to satisfy its needs.

Prepare for removal

First I have to unmount /home, and to do that I have to kill all user processes and anything else that might have files open on /home. Be sure to save your work and log in as root if you're attempting something similar.

pkill -u coxa ; umount /home
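
If umount complains that the filesystem is busy, fuser (from psmisc) will show which processes still have files open there:

fuser -vm /home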

Now close the encrypted LUKS device: cryptsetup luksClose ehome
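
To double-check that the mapping is really gone before touching the cache device underneath it, cryptsetup status should report it as inactive:

cryptsetup status ehome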

Removing the Cache

Basically I just need to remove the cache, and to do that I need to switch to the cleaner policy, which forces all dirty blocks back to disk.

[root@benway ~]# dmsetup table cehome
0 209715200 cache 8:17 8:18 253:17 512 1 writeback default 0
[root@benway ~]# dmsetup status cehome
0 209715200 cache 376/524288 19279 877645 583982 265753 0 662 1471 1438 0 2 migration_threshold 204800 4 random_threshold 4 sequential_threshold 2048
[root@benway ~]# dmsetup suspend cehome
[root@benway ~]# dmsetup reload cehome --table '0 209715200 cache 8:17 8:18 253:17 512 1 writeback cleaner 0'
[root@benway ~]# dmsetup resume cehome
[root@benway ~]# dmsetup status cehome
0 209715200 cache 376/524288 19279 877657 583982 265753 0 0 1471 1009 0 2 migration_threshold 204800 0
[root@benway ~]# dmsetup status cehome
0 209715200 cache 376/524288 19279 877657 583982 265753 0 0 1471 530 0 2 migration_threshold 204800 0
[root@benway ~]# dmsetup status cehome
0 209715200 cache 376/524288 19279 877657 583982 265753 0 0 1471 361 0 2 migration_threshold 204800 0
[root@benway ~]# dmsetup status cehome
0 209715200 cache 376/524288 19279 877657 583982 265753 0 0 1471 0 0 2 migration_threshold 204800 0
[root@benway ~]# dmsetup status cehome
0 209715200 cache 376/524288 19279 877657 583982 265753 0 0 1471 0 0 2 migration_threshold 204800 0
[root@benway ~]# dmsetup wait cehome
[root@benway ~]# dmsetup remove cehome

You can see the column showing dirty blocks dropping down to zero after resuming the device with the cleaner policy in place.
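
Rather than re-running the status command by hand, something like watch makes it easy to follow the dirty count on its way down:

watch -n 5 'dmsetup status cehome'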

Tidying Up

I needed to modify /etc/crypttab so that it references the logical volume directly, instead of the cached block device we removed above.

#ehome  /dev/mapper/cehome
ehome   /dev/benwayvg/ehome

I was using a systemd service to create the dm-cache device, so I need to make sure it is no longer started at boot:

rm -f /usr/lib/systemd/system/local-fs.target.wants/dmsetup-dm-cache.service /etc/systemd/system/dmsetup-dm-cache.service
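
If the service was enabled with systemctl rather than a hand-made symlink, disabling it and reloading systemd is the tidier equivalent:

systemctl disable dmsetup-dm-cache.service
systemctl daemon-reload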

Next steps

The next step is to start gathering some detailed statistics about the disk i/o on my system. For that I am going to try graphite/carbon or collectd. My next post will include some setup notes from that process.
