Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Hack to optimise based on known sizes #276

Draft
wants to merge 8 commits into
base: snmalloc1
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -169,6 +169,19 @@ if(CONST_QUALIFIED_MALLOC_USABLE_SIZE)
endif()


# Build a redirection layer for all sizes that are a multiple of
# 16bytes up to 1024.
add_executable(generate src/redirect/generate.cc)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to differentiate between executables for the build host and executables for the target when we're cross-compiling?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needing to build a C++ program on the host to generate things that we compile for the target is a bit painful for cross compiling (do we even guarantee that the sizes will be the same if, for example, we're compiling on a 32-bit system for a CHERI target?).

It's also going to be annoying to integrate into a libc build system.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, all platforms and configurations, we build would use the same file. However, we can use this program to build a collection of headers that encode the parameters of interest. E.g. of the form

generated_[MIN_ALLOC_BITS]_[INTERMEDIATE_BITS].cc

Currently, everything would use exactly the same thing:

generated_4_2.cc

But for CHERI, we might want to up the minimum allocation size. Getting either parameter wrong in the header would "work", however, we might

  • have more entry points than necessary, or
  • allocate a larger object than necessary.

Perhaps, just check in the file in once, we have finished experimenting. While, we are experimenting, I think having this generated is good as it means we are less likely to make mistakes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, we use the template parameter for s, so the code will be specialised for this size, even if it ends up being a medium alloc, or large.

#define DEFINE_MALLOC_SIZE(a, s) \
  extern "C" void* a() \
  { \
    return snmalloc::ThreadAlloc::get_noncachable()->template alloc<s>(); \
  }

target_link_libraries(generate snmalloc_lib)
add_custom_target(generated ALL
COMMAND generate ${CMAKE_CURRENT_BINARY_DIR}/generated.cc
BYPRODUCTS ${CMAKE_CURRENT_BINARY_DIR}/generated.cc
)
add_library(redirect_small STATIC src/redirect/redirect.cc)
target_link_libraries(redirect_small snmalloc_lib)
target_include_directories(redirect_small PRIVATE ${CMAKE_CURRENT_BINARY_DIR})
add_dependencies(redirect_small generated)

# To build with just the header library target define SNMALLOC_ONLY_HEADER_LIBRARY
# in containing Cmake file.
if(NOT DEFINED SNMALLOC_ONLY_HEADER_LIBRARY)
Expand Down
24 changes: 23 additions & 1 deletion src/mem/alloc.h
Original file line number Diff line number Diff line change
Expand Up @@ -1040,7 +1040,8 @@ namespace snmalloc
allow_reserve == NoReserve ? "noreserve" : "reserve"));

SNMALLOC_ASSUME(size <= SLAB_SIZE);
sizeclass_t sizeclass = size_to_sizeclass(size);
SNMALLOC_ASSUME(size > 0);
sizeclass_t sizeclass = size_to_sizeclass<true>(size);
return small_alloc_inner<zero_mem, allow_reserve>(sizeclass, size);
}

Expand All @@ -1066,6 +1067,13 @@ namespace snmalloc
return p;
}

return small_alloc_inner_slow<zero_mem, allow_reserve>(sizeclass, size);
}

template<ZeroMem zero_mem, AllowReserve allow_reserve>
SNMALLOC_SLOW_PATH void*
small_alloc_inner_slow(sizeclass_t sizeclass, size_t size)
{
if (likely(!has_messages()))
return small_alloc_next_free_list<zero_mem, allow_reserve>(
sizeclass, size);
Expand Down Expand Up @@ -1228,6 +1236,20 @@ namespace snmalloc
small_dealloc_offseted_inner(super, p, sizeclass);
}

static SNMALLOC_FAST_PATH bool small_local_dealloc(void* p)
{
auto super = Superslab::get(p);
Slab* slab = Metaslab::get_slab(p);
return (likely(slab->dealloc_fast(super, p)));
}

SNMALLOC_FAST_PATH void small_local_dealloc_slow(void* p)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still SNMALLOC_FAST_PATH despite _slow?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

{
auto super = Superslab::get(p);
Slab* slab = Metaslab::get_slab(p);
small_dealloc_offseted_slow(super, p, slab->get_meta().sizeclass);
}

SNMALLOC_FAST_PATH void small_dealloc_offseted_inner(
Superslab* super, void* p, sizeclass_t sizeclass)
{
Expand Down
1 change: 1 addition & 0 deletions src/mem/sizeclass.h
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ namespace snmalloc
constexpr static size_t
sizeclass_to_inverse_cache_friendly_mask(sizeclass_t sc);
constexpr static uint16_t medium_slab_free(sizeclass_t sizeclass);
template<bool ASSUME_SMALL = false>
static sizeclass_t size_to_sizeclass(size_t size);

constexpr static inline sizeclass_t size_to_sizeclass_const(size_t size)
Expand Down
3 changes: 2 additions & 1 deletion src/mem/sizeclasstable.h
Original file line number Diff line number Diff line change
Expand Up @@ -110,9 +110,10 @@ namespace snmalloc
return sizeclass_metadata.inverse_cache_friendly_mask[sizeclass];
}

template<bool ASSUME_SMALL>
static inline sizeclass_t size_to_sizeclass(size_t size)
{
if ((size - 1) <= (SLAB_SIZE - 1))
if ((size - 1) <= (SLAB_SIZE - 1) || ASSUME_SMALL)
{
auto index = sizeclass_lookup_index(size);
SNMALLOC_ASSUME(index <= sizeclass_lookup_index(SLAB_SIZE));
Expand Down
27 changes: 27 additions & 0 deletions src/override/malloc.cc
Original file line number Diff line number Diff line change
Expand Up @@ -69,6 +69,33 @@ extern "C"
return ThreadAlloc::get_noncachable()->alloc<ZeroMem::YesZero>(sz);
}

SNMALLOC_EXPORT
void SNMALLOC_NAME_MANGLE(free_local_small)(void* ptr)
{
if (Alloc::small_local_dealloc(ptr))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd feel slightly better if this if(!fast) slow dance were inside Alloc rather than having it export the fast and slow paths separately? Unless there's some reason to want to expose this for inlining?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has to at least be in ThreadAlloc, as we don't want to take the TLS lookup on the fast path. So we need that in scope. I think we probably need a better refactor if we actually want to support this design. This is more a hack to get some codegen, and see if the use case makes sense.

return;
ThreadAlloc::get_noncachable()->small_local_dealloc_slow(ptr);
}

SNMALLOC_EXPORT
void* SNMALLOC_NAME_MANGLE(malloc_small)(size_t size)
{
return ThreadAlloc::get_noncachable()->small_alloc<NoZero, YesReserve>(
size);
}

SNMALLOC_EXPORT
void* SNMALLOC_NAME_MANGLE(malloc_small_64)()
{
return ThreadAlloc::get_noncachable()->small_alloc<NoZero, YesReserve>(64);
}

SNMALLOC_EXPORT
void* SNMALLOC_NAME_MANGLE(malloc_small_63)()
{
return ThreadAlloc::get_noncachable()->small_alloc<NoZero, YesReserve>(63);
}

SNMALLOC_EXPORT
size_t SNMALLOC_NAME_MANGLE(malloc_usable_size)(
MALLOC_USABLE_SIZE_QUALIFIER void* ptr)
Expand Down
41 changes: 41 additions & 0 deletions src/redirect/generate.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
#include "snmalloc.h"

#include <fstream>
#include <iostream>

int main(int argc, char* argv[])
{
if (argc != 2)
{
std::cerr << "Call with output file name" << std::endl;
return 1;
}

// open a file in write mode.
ofstream outfile;
outfile.open(argv[1]);

for (size_t align = 0; align < 10; align++)
{
for (size_t size = 1024; size > 0; size -= 16)
{
auto asize = snmalloc::aligned_size(1ULL << align, size);
auto sizeclass = snmalloc::size_to_sizeclass(asize);
auto rsize = snmalloc::sizeclass_to_size(sizeclass);
if (rsize == size && align == 0)
{
outfile << "DEFINE_MALLOC_SIZE(__stack_alloc_small_" << size << "_" << align << ", " << size
<< ");" << std::endl;
}
else
{
outfile << "REDIRECT_MALLOC_SIZE(__stack_alloc_small_" << size << "_" << align << ", __stack_alloc_small_"
<< rsize << "_" << 0 << ");" << std::endl;
}
outfile << "GENERATE_FREE_SIZE(__stack_free_small_" << size << "_" << align << ");" << std::endl;

}
}

outfile.close();
}
51 changes: 51 additions & 0 deletions src/redirect/redirect.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@

#include "snmalloc.h"

#define NAME(a) malloc_size_##a
#define STRINGIFY(a) a
#define NAME_STRING(a) NAME(a)

#ifdef WIN32
# define REDIRECT_MALLOC_SIZE(a, b) \
extern "C" void* NAME(a)(); \
__pragma(comment(linker, "/alternatename:##a=##b"))
#else
# define REDIRECT_MALLOC_SIZE(a, b) \
__attribute__((alias(#b))) extern "C" void* a()
#endif

#define DEFINE_MALLOC_SIZE(name, s) \
extern "C" void* name() \
{ \
return snmalloc::ThreadAlloc::get_noncachable()->template alloc<s>(); \
}

extern "C" void free_local_small(void* ptr)
{
if (snmalloc::Alloc::small_local_dealloc(ptr))
return;
snmalloc::ThreadAlloc::get_noncachable()->small_local_dealloc_slow(ptr);
}

#ifdef WIN32
# define GENERATE_FREE_SIZE(a) \
extern "C" void* NAME(a)(); \
__pragma(comment(linker, "/alternatename:##a=free_local_small"))
#else
# define GENERATE_FREE_SIZE(a) \
__attribute__((alias("free_local_small"))) extern "C" void* a()
#endif

void* __stack_alloc_large(size_t size, size_t align)
{
size_t asize = snmalloc::aligned_size(align, size);
return snmalloc::ThreadAlloc::get_noncachable()->alloc(asize);
}

void __stack_free_large(void* ptr, size_t size, size_t align)
{
size_t asize = snmalloc::aligned_size(align, size);
snmalloc::ThreadAlloc::get_noncachable()->dealloc(ptr, asize);
}

#include "generated.cc"
11 changes: 11 additions & 0 deletions src/test/func/malloc/malloc.cc
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,12 @@ void test_memalign(size_t size, size_t align, int err, bool null)
check_result(size, align, p, err, null);
}

void test_local(size_t size)
{
for (int n = 0; n < 1000; n++)
our_free_local_small(our_malloc_small(size));
}

int main(int argc, char** argv)
{
UNUSED(argc);
Expand All @@ -125,6 +131,11 @@ int main(int argc, char** argv)

constexpr int SUCCESS = 0;

for (size_t i = 1; i < SLAB_SIZE; i += 16)
{
test_local(i);
}

test_realloc(our_malloc(64), 4194304, SUCCESS, false);

for (sizeclass_t sc = 0; sc < (SUPERSLAB_BITS + 4); sc++)
Expand Down