
puranjaymohan
Contributor

vm_area_map_pages() may allocate memory while inserting pages into the bpf arena's vm_area. To make the bpf_arena_alloc_pages() kfunc non-sleepable, change bpf arena to populate pages without allocating memory:

  • at arena creation time, populate all page table levels except the last level
  • when new pages need to be inserted, call apply_to_page_range() again, which will only set_pte_at() those pages and will not allocate memory
  • when freeing pages, defer the work and use apply_to_existing_page_range() to clear the pte entry and free the page. This way the intermediate page table levels are not freed until the arena is destroyed.
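The three steps above can be illustrated with a toy userspace model. This is a minimal sketch, assuming a hypothetical two-level table (`struct top_table`, `struct leaf_table`) in place of real page tables; none of these names are kernel APIs. Every intermediate table is allocated at creation, so inserting a page only writes a last-level slot, and clearing a slot leaves its intermediate table in place until destruction.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy two-level table; not the kernel's page table types. */
#define L1_ENTRIES 4
#define L2_ENTRIES 8

struct leaf_table { void *pte[L2_ENTRIES]; };
struct top_table { struct leaf_table *dir[L1_ENTRIES]; };

static int allocations; /* counts every table allocation in the model */

static void *counted_calloc(size_t n, size_t size)
{
    allocations++;
    return calloc(n, size);
}

/* Creation time: allocate every intermediate level up front.
 * On failure the caller should call arena_model_destroy(). */
static int arena_model_create(struct top_table *t)
{
    for (int i = 0; i < L1_ENTRIES; i++) {
        t->dir[i] = counted_calloc(1, sizeof(struct leaf_table));
        if (!t->dir[i])
            return -1;
    }
    return 0;
}

/* Insertion: only writes the last-level slot, so nothing is allocated. */
static void arena_model_set(struct top_table *t, unsigned int idx, void *page)
{
    assert(idx < L1_ENTRIES * L2_ENTRIES);
    assert(t->dir[idx / L2_ENTRIES] != NULL); /* prepopulated at creation */
    t->dir[idx / L2_ENTRIES]->pte[idx % L2_ENTRIES] = page;
}

/* Freeing: clears the slot and returns the page; the leaf table is kept. */
static void *arena_model_clear(struct top_table *t, unsigned int idx)
{
    assert(idx < L1_ENTRIES * L2_ENTRIES);
    void **slot = &t->dir[idx / L2_ENTRIES]->pte[idx % L2_ENTRIES];
    void *page = *slot;
    *slot = NULL;
    return page;
}

/* Destruction: only now are the intermediate levels released. */
static void arena_model_destroy(struct top_table *t)
{
    for (int i = 0; i < L1_ENTRIES; i++) {
        free(t->dir[i]);
        t->dir[i] = NULL;
    }
}
```

After `arena_model_create()`, the allocation counter stays constant across any number of set and clear calls, which is the property that lets the insert path avoid sleeping allocations.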

Co-developed-by: Alexei Starovoitov <[email protected]>

kernel-patches-daemon-bpf bot force-pushed the bpf-next_base branch 8 times, most recently from 6d36951 to 6116807 on October 19, 2025 02:33.
vm_area_map_pages() may allocate memory while inserting pages into the bpf
arena's vm_area. To make the bpf_arena_alloc_pages() kfunc non-sleepable,
change bpf arena to populate pages without allocating memory:
- at arena creation time, populate all page table levels except
  the last level
- when new pages need to be inserted, call apply_to_page_range() again
  with apply_range_set_cb(), which will only set_pte_at() those pages and
  will not allocate memory.
- when freeing pages, call apply_to_existing_page_range() with
  apply_range_clear_cb() to clear the pte for the page to be removed. This
  doesn't free intermediate page table levels.
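The difference between the two walkers can be modeled with a callback shaped like the kernel's `pte_fn_t`, i.e. `int (*)(pte_t *pte, unsigned long addr, void *data)`. In this hypothetical sketch, `walk_range()` with `create` set allocates missing leaf tables, as apply_to_page_range() does for missing levels, while `create == 0` skips holes, as apply_to_existing_page_range() does. `model_set_cb()` and `model_clear_cb()` are illustrative stand-ins, not the actual bodies of apply_range_set_cb() or apply_range_clear_cb().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: a "pte" is just a pointer-sized slot. */
#define DIRS 4
#define PTES_PER_DIR 8
typedef void *pte_model_t;

/* Same shape as the kernel's pte_fn_t: (pte, addr, data) -> error code. */
typedef int (*pte_fn_model_t)(pte_model_t *pte, unsigned long addr, void *data);

struct table_model { pte_model_t *dir[DIRS]; };

static int leaf_allocs; /* leaf tables allocated by walk_range() */

/*
 * Visit slots [start, start + n). With create != 0, missing leaf tables
 * are allocated first (like apply_to_page_range()); with create == 0,
 * slots under a missing table are skipped (like
 * apply_to_existing_page_range()).
 */
static int walk_range(struct table_model *t, unsigned long start,
                      unsigned long n, int create, pte_fn_model_t fn,
                      void *data)
{
    for (unsigned long a = start; a < start + n; a++) {
        assert(a / PTES_PER_DIR < DIRS); /* stay inside the modeled range */
        pte_model_t **dir = &t->dir[a / PTES_PER_DIR];
        if (!*dir) {
            if (!create)
                continue;
            *dir = calloc(PTES_PER_DIR, sizeof(pte_model_t));
            if (!*dir)
                return -1;
            leaf_allocs++;
        }
        int err = fn(&(*dir)[a % PTES_PER_DIR], a, data);
        if (err)
            return err;
    }
    return 0;
}

/* Populates tables only; leaves every slot untouched. */
static int model_noop_cb(pte_model_t *pte, unsigned long addr, void *data)
{
    (void)pte; (void)addr; (void)data;
    return 0;
}

struct set_ctx { void **pages; unsigned long next; };

/* Installs the next page from ctx; refuses to overwrite a used slot. */
static int model_set_cb(pte_model_t *pte, unsigned long addr, void *data)
{
    struct set_ctx *ctx = data;
    (void)addr;
    if (*pte)
        return -1;
    *pte = ctx->pages[ctx->next++];
    return 0;
}

/* Empties populated slots and counts them; tables stay allocated. */
static int model_clear_cb(pte_model_t *pte, unsigned long addr, void *data)
{
    unsigned long *cleared = data;
    (void)addr;
    if (*pte) {
        *pte = NULL;
        (*cleared)++;
    }
    return 0;
}
```

Calling `walk_range()` once over the whole range with `model_noop_cb()` and `create` set mirrors creation-time prepopulation; after that, set and clear walks perform no allocation.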

Signed-off-by: Puranjay Mohan <[email protected]>
Make arena_free_pages() safe to call from any context by deferring the main
logic to a workqueue. This is done by calling irq_work_queue(), whose handler
in turn calls schedule_work(). arena_free_pages() can't call schedule_work()
directly, as that is not safe in every context arena_free_pages() may be
called from.
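The two-stage deferral can be sketched as a hypothetical userspace model of the claim semantics of irq_work_queue(), which returns false when the work is already pending. `struct deferred_model`, `queue_bounce()` and `run_bounce()` are illustrative names, not kernel APIs. Assuming the worker drains every queued request when it runs, a caller that loses the claim does not need to raise another bounce.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a claim-once deferral flag. */
struct deferred_model {
    atomic_bool pending;  /* true while a bounce is queued but not yet run */
    int work_scheduled;   /* how many times stage 2 ran */
};

/*
 * Stage 1, called from the arena_free_pages()-like path. Only the caller
 * that flips pending from false to true raises the bounce; others get
 * false, mirroring irq_work_queue() returning false for pending work.
 */
static bool queue_bounce(struct deferred_model *d)
{
    return !atomic_exchange(&d->pending, true);
}

/*
 * Stage 2, the bounce handler. In the described design this is the
 * irq_work callback that calls schedule_work(); here it only counts.
 * pending is cleared before the work is handed off, so a request that
 * arrives afterwards raises a fresh bounce instead of being missed.
 */
static void run_bounce(struct deferred_model *d)
{
    assert(atomic_load(&d->pending)); /* only runs after a successful claim */
    atomic_store(&d->pending, false);
    d->work_scheduled++;
}
```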

arena_free_pages() queues the address and page count to be freed on a
lock-less list of struct arena_free_span entries. arena_free_worker() (the
workqueue handler) iterates over this list and handles the queued free
requests. Each struct arena_free_span is allocated with kmalloc_nolock() in
arena_free_pages(), but because kmalloc_nolock() can fail, a percpu pool of
these structs is created at arena allocation time and used as a fallback
when kmalloc_nolock() fails. arena_free_worker() returns the spans to the
percpu pool of the CPU they came from.
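A hypothetical userspace model of this queue-and-drain scheme, using C11 atomics, is sketched below. `span_push()` follows the same compare-and-swap push idea as the kernel's llist_add(), and `worker_model()` detaches the whole list with one exchange, like llist_del_all(). `struct span_model`, its fields, the pool layout, and the `fail_alloc` switch that simulates kmalloc_nolock() failing are all assumptions. The pool here is not safe against concurrent access, which a real any-context percpu pool would have to be.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS_MODEL 2
#define POOL_SIZE 2

/* Hypothetical stand-in for struct arena_free_span; fields are assumed. */
struct span_model {
    struct span_model *next;
    unsigned long uaddr;
    unsigned long page_cnt;
    int pool_cpu; /* -1: heap-allocated; otherwise the owning CPU's pool */
};

/* Per-"CPU" fallback pool. Not concurrency-safe: single-threaded model. */
struct pool_model {
    struct span_model slots[POOL_SIZE];
    bool used[POOL_SIZE];
};

static struct pool_model pools[NR_CPUS_MODEL];
static _Atomic(struct span_model *) free_list;
static bool fail_alloc; /* set to simulate kmalloc_nolock() failing */

/* Lock-free push onto the list head, the same CAS idea as llist_add(). */
static void span_push(struct span_model *s)
{
    struct span_model *head = atomic_load(&free_list);
    do {
        s->next = head; /* head is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak(&free_list, &head, s));
}

/* Try the allocator first, then fall back to this CPU's pool. */
static struct span_model *span_get(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    struct span_model *s = fail_alloc ? NULL : malloc(sizeof(*s));
    if (s) {
        s->pool_cpu = -1;
        return s;
    }
    struct pool_model *p = &pools[cpu];
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!p->used[i]) {
            p->used[i] = true;
            p->slots[i].pool_cpu = cpu;
            return &p->slots[i];
        }
    }
    return NULL; /* both sources exhausted in this model */
}

/* Free-path front half: record the request and queue it. */
static bool free_pages_model(unsigned long uaddr, unsigned long cnt, int cpu)
{
    struct span_model *s = span_get(cpu);
    if (!s)
        return false;
    s->uaddr = uaddr;
    s->page_cnt = cnt;
    span_push(s);
    return true;
}

/* Worker: detach the whole list at once (like llist_del_all()), handle
 * each span, and return pooled spans to the pool they came from. */
static unsigned long worker_model(void)
{
    unsigned long pages = 0;
    struct span_model *s = atomic_exchange(&free_list, NULL);
    while (s) {
        struct span_model *next = s->next;
        pages += s->page_cnt; /* real code would clear PTEs and free pages */
        if (s->pool_cpu < 0) {
            free(s);
        } else {
            struct pool_model *p = &pools[s->pool_cpu];
            p->used[s - p->slots] = false;
        }
        s = next;
    }
    return pages;
}
```

Spans come off the detached list in LIFO order; the kernel provides llist_reverse_order() for callers that need FIFO order.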

Signed-off-by: Puranjay Mohan <[email protected]>