
Storage Performance (Status and improvements) #306

Open · 1 of 14 tasks
Wescoeur opened this issue Nov 8, 2019 · 2 comments

@Wescoeur (Member) commented Nov 8, 2019

This is a non-exhaustive list of ideas and potential improvements to test concerning the storage layer:

  • Indirect segments are not supported by blktap (see: [RFE] Support indirect descriptors feature xapi-project/blktap#287); is it a good idea to implement this feature in tapdisk and/or qemu-dp?
  • qemu-dp uses blkif in the same way as tapdisk, yet tapdisk performs better. Why? There is probably a specific bottleneck in the qemu-dp process (perhaps the read/write API).
  • Modify the ring-page-order param in blkif.
    => No better performance with NVMe (see: https://xcp-ng.org/forum/topic/1596/solved-slow-disk-write-speed-inside-vm/21).
    => Update: The param is now available on the master branch of tapdisk. (We must test it!)
    => Update2: The tapdisk/module param doesn't support a value of 4 for the moment.
  • Test qemu-dp with io_uring support (see: https://patchew.org/QEMU/[email protected]/). A minimal io_uring read sketch is included after this list.
  • Try to use persistent grants (with an LRU cache or LIFO queue) + try to build a similar map/unmap mechanism with deferred unmap. Is it really true that map/unmap is always slower than copying buffers between guest and host?
  • Use two rings: one for requests and another for responses (the current layout makes poor use of the cache line).
  • Avoid context switches and IRQs (polling is good when there is a lot of data)... One solution to test is a process with N threads on N physical cores, where each process thread maps to a VM thread (1:1 mapping) and executes requests and responses (see: https://github.com/torvalds/linux/blob/master/include/xen/interface/io/blkif.h#L32). A thread-pinning sketch is included after this list.
  • Test the different Linux I/O schedulers at different levels.
  • Reproduce the ext4 EIO problem with io_uring and O_DIRECT.
  • Use ASan in xcp-ng-async-io because Valgrind cannot be used at the moment.
  • Build an xcp-ng-async-io RPM + update the blktap RPM.
  • Try to use a more recent version of qemu-dp.
  • Try to patch qemu-dp with xcp-ng-async-io.
  • Benchmark with XFS! (Once io_uring is stable, i.e. no more EIO errors with ext4.)
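
As a starting point for the io_uring item above, here is a minimal sketch of a single O_DIRECT read through liburing (assuming a recent liburing that provides `io_uring_prep_read`; older versions would need `io_uring_prep_readv`). This is only a standalone test harness, not qemu-dp code; a `cqe->res` of `-EIO` on ext4 would reproduce the error mentioned in the list.

```c
/* Minimal io_uring O_DIRECT read sketch (hypothetical test harness). */
#define _GNU_SOURCE
#include <liburing.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE 4096

int main(int argc, char **argv)
{
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    void *buf;
    int fd, ret;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    /* O_DIRECT to match the EIO reproduction scenario mentioned above. */
    fd = open(argv[1], O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    /* O_DIRECT requires an aligned buffer. */
    if (posix_memalign(&buf, 4096, BUF_SIZE)) return 1;

    ret = io_uring_queue_init(8, &ring, 0);
    if (ret < 0) { fprintf(stderr, "queue_init: %s\n", strerror(-ret)); return 1; }

    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_read(sqe, fd, buf, BUF_SIZE, 0);

    io_uring_submit(&ring);
    ret = io_uring_wait_cqe(&ring, &cqe);
    if (ret < 0) { fprintf(stderr, "wait_cqe: %s\n", strerror(-ret)); return 1; }

    /* A negative cqe->res is a negated errno; -EIO here would reproduce the ext4 issue. */
    printf("read result: %d\n", cqe->res);
    io_uring_cqe_seen(&ring, cqe);

    io_uring_queue_exit(&ring);
    close(fd);
    return 0;
}
```

Build with something like `gcc -Wall uring_test.c -o uring_test -luring` (file name is illustrative).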
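
For the "N threads on N physical cores" item, here is a small sketch of spawning one worker per online core and pinning it there (1:1 mapping) with the GNU `pthread_attr_setaffinity_np` extension. It is illustrative only: in a real backend each worker would attach to one guest blkif ring instead of just printing its core.

```c
/* Sketch: one worker thread per core, pinned at creation time. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static void *worker(void *arg)
{
    long core = (long)arg;
    /* The real worker would poll its dedicated blkif ring and push responses. */
    printf("worker bound to core %ld\n", core);
    return NULL;
}

int main(void)
{
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncores < 1)
        ncores = 1;

    pthread_t threads[ncores];

    for (long i = 0; i < ncores; i++) {
        cpu_set_t set;
        pthread_attr_t attr;

        CPU_ZERO(&set);
        CPU_SET(i, &set);

        /* Set the affinity in the attributes so the thread starts on its core. */
        pthread_attr_init(&attr);
        pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
        pthread_create(&threads[i], &attr, worker, (void *)i);
        pthread_attr_destroy(&attr);
    }

    for (long i = 0; i < ncores; i++)
        pthread_join(threads[i], NULL);
    return 0;
}
```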
@Wescoeur Wescoeur added kernel or drivers Related to kernel or drivers comp:storage Storage related (SMAPI) labels Nov 8, 2019
@Wescoeur Wescoeur self-assigned this Nov 8, 2019
@maxcuttins

Uhuuu.. this is something I would like to follow ❤️

@Wescoeur (Member, Author) commented Nov 12, 2019

Polling REQs: benchmarks with tapdisk, an NVMe disk and an AMD Ryzen 7 2700

Idea: benchmark performance with no polling, full polling or adaptive polling (see: https://github.com/xapi-project/blktap/blob/d2f49df6580a891a1317398b9e63f87ea0189571/drivers/td-blkif.c#L352)

Results: [benchmark charts: ioping, sequential I/O, random I/O]

Analysis

When polling is always enabled, CPU usage is around 100% (which is expected) without a significant gain; without polling, random writes are negatively impacted. So the default adaptive behaviour is the best.

Concerning qemu-dp, we can try to use a similar adaptive polling to slightly increase random write performance. I'm not sure whether this mechanism is currently implemented, but for now it's not the most important thing to investigate.
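
For reference, the adaptive strategy boils down to spinning on the ring for a short window and extending that window each time work is found, falling back to event-channel notification when it expires. Below is a compilable sketch of that loop; `ring_has_requests()` and `process_requests()` are hypothetical placeholders standing in for the real blkif ring accessors in blktap's td-blkif.c.

```c
/* Sketch of an adaptive polling window (illustrative only). */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define POLL_WINDOW_US 1000  /* spin this long before falling back to events */

/* Placeholder helpers: the real backend would check and consume the
 * shared blkif ring here. */
static bool ring_has_requests(void) { return false; }
static void process_requests(void) { }

static uint64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Spin for POLL_WINDOW_US; every handled request extends the window.
 * Returns true if any work was done, false if the caller should re-arm
 * the event channel and sleep (interrupt mode). */
static bool poll_ring_adaptively(void)
{
    uint64_t deadline = now_us() + POLL_WINDOW_US;
    bool handled = false;

    while (now_us() < deadline) {
        if (ring_has_requests()) {
            process_requests();
            handled = true;
            deadline = now_us() + POLL_WINDOW_US;  /* keep spinning while busy */
        }
    }
    return handled;
}

int main(void)
{
    printf("work handled during window: %s\n",
           poll_ring_adaptively() ? "yes" : "no");
    return 0;
}
```

The trade-off matches the measurements above: a longer (or infinite) window pins a core at 100% for little gain, while no window at all hurts random writes.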
