Skip to content

add mem alloc and derive macro #2299

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 20, 2021

Conversation

Veeupup
Copy link
Contributor

@Veeupup Veeupup commented Oct 18, 2021

I hereby agree to the terms of the CLA available at: https://databend.rs/policies/cla/

Summary

Summary about this PR:

  • Add session memory usage track

Changelog

  • New Feature

Related Issues

Fixes #1148

Test Plan

Unit Tests

Stateless Tests

Cargo.toml Outdated
@@ -16,6 +16,8 @@ members = [
"common/infallible",
"common/io",
"common/management",
"common/mem",
"common/mem-derive",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge the "common/mem-derive crate into common/mem? So it looks more cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will make it later. this pr has not finished yet but started integrating.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this is just a pre-suggestion when a PR is in draft state :)

@databend-bot
Copy link
Member

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@databend-bot databend-bot added the pr-feature this PR introduces a new feature to the codebase label Oct 18, 2021
@BohuTANG
Copy link
Member

The text add session memory usage track should move to the Summary section.
The Changelog section only for the datafuse-bot, it hints the bot to add what a label is, related codes:
https://github.com/datafuselabs/fusebots/blob/main/.github/labeler.yml#L23

I updated them.

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
jemalloc-sys = { version = "0.3.2" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jemalloc-sys is unmaintained. change to tikv-jemalloc-sys

@PsiACE
Copy link
Member

PsiACE commented Oct 18, 2021

For the License Checker, you need to add the correct License Header and, if necessary, confirm in advance that you can license the code to datafuselabs/databend.

For the Lint, please run make lint local first.

@Veeupup
Copy link
Contributor Author

Veeupup commented Oct 18, 2021

For the License Checker, you need to add the correct License Header and, if necessary, confirm in advance that you can license the code to datafuselabs/databend.

For the Lint, please run make lint local first.

thanks for guidiing, I will check liscense and code lint in next commit.

@Veeupup
Copy link
Contributor Author

Veeupup commented Oct 18, 2021

@BohuTANG Excuse me, I have a question when debugging test.
I want to print some messages to help me debug. As usual, I executed cargo test -- --show-output in query directory, but it gives me error msg

error: Found argument '--show-output' which wasn't expected, or isn't valid in this context
USAGE:
    databend_query-0b2d9701ac78efd2 [OPTIONS]

For more information try --help
error: test failed, to rerun pass '--lib'

how should I get output when debugging in databend unit test?

@BohuTANG
Copy link
Member

BohuTANG commented Oct 18, 2021

@BohuTANG Excuse me, I have a question when debugging test. I want to print some messages to help me debug. As usual, I executed cargo test -- --show-output in query directory, but it gives me error msg

error: Found argument '--show-output' which wasn't expected, or isn't valid in this context
USAGE:
    databend_query-0b2d9701ac78efd2 [OPTIONS]

For more information try --help
error: test failed, to rerun pass '--lib'

how should I get output when debugging in databend unit test?

I guess we should use log::info/debug!("xx"), it's more powerful than print.


let session_size = malloc_size(&session);

assert!(session_size > 3000);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@codecov-commenter
Copy link

codecov-commenter commented Oct 18, 2021

Codecov Report

Merging #2299 (d5dac7c) into main (f100287) will increase coverage by 1%.
The diff coverage is 66%.

Impacted file tree graph

@@           Coverage Diff           @@
##            main   #2299     +/-   ##
=======================================
+ Coverage     69%     70%     +1%     
=======================================
  Files        647     624     -23     
  Lines      36262   34301   -1961     
=======================================
- Hits       25041   24152    -889     
+ Misses     11221   10149   -1072     
Impacted Files Coverage Δ
common/datavalues/src/types/data_type.rs 63% <20%> (-5%) ⬇️
common/datavalues/src/data_field.rs 54% <55%> (-1%) ⬇️
common/mem/mem-allocator/src/sizeof.rs 66% <66%> (ø)
common/mem/mem-allocator/src/malloc_size.rs 74% <74%> (ø)
common/datavalues/src/data_value.rs 42% <100%> (ø)
common/mem/mem-allocator/src/allocators.rs 100% <100%> (ø)
common/meta/flight/src/flight_client_conf.rs 0% <0%> (-100%) ⬇️
query/src/common/meta/meta_client.rs 50% <0%> (-10%) ⬇️
common/meta/types/src/database_info.rs 75% <0%> (-5%) ⬇️
query/src/datasources/database/tests.rs 88% <0%> (-5%) ⬇️
... and 64 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f100287...d5dac7c. Read the comment docs.

@Veeupup Veeupup marked this pull request as ready for review October 19, 2021 09:26
@@ -185,4 +193,8 @@ impl Session {
pub fn get_user_manager(self: &Arc<Self>) -> UserManagerRef {
self.sessions.get_user_manager()
}

pub fn get_memory_usage(self: &Arc<Self>) -> usize {
Copy link
Member

@zhang2014 zhang2014 Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job!!!

How do we get the memory usage for in session query?

For example:

SELECT number FROM numbers(10) GROUP BY number
-- In this query, we will create a custom HashMap to complete data aggregate(not session fields), which will use some megabytes.

Copy link
Contributor Author

@Veeupup Veeupup Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now we will track Session object itself. #1148 (comment)
For detail, any member variable in Session without macro ignore_malloc_size_of will be computed. If the custom Hashmap to store within the session object, it can be computed.
Any object that derive the trait MallocSizeOf can get their memory usage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, current implementation could be the basic util tools to count the malloc size inside Session.

Copy link
Member

@zhang2014 zhang2014 Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current implementation is great. an idea (welcome any opinion):

In databend query, we use tokio runtime to complete the interaction with user sessions(MySQL, ClickHouse). maybe we can monitor the memory usage of Tokio runtime by some way?

For example:

struct RuntimeTracker {
    parent: RuntimeTracker,
    memory_usage: AtomicUsize,
}

struct WorkerTracker {
    runtime_tracker: Arc<RuntimeTracker>
}

thread_local!{
    woker_tracker: Option<WorkerTracker> = None;
}

impl WorkerTracker {
    pub fn instance() -> WorkerTracker {
        worker_trakcer.unwrap()
    }
}

struct MyAllocator(DefaultAllocator)

impl Allocator for MyAllocator {
    #[inline]
   fn alloc(&self, layout: Layout) -> *mut u8 {
        WorkerTracker::instance().runtime_tracker.memory_usage.fetch_and_add(layout.size());
        self.0.alloc(layout)
    }
    ...
}

// in common_base/runtime.rs:

impl Runtime {
    pub fn new() -> Result<Runtime> {
       // StdThread call Runtime::new()
       // RuntimeThread call Runtime::new()
      //  Chain for runtime_tracker if need
        let outer_tacker = WorkerTacker::instance();
        let runtime_builder = ...;
        runtime_builder.on_thread_start(move || { woker_tracker.with(|mut tracker| tracker.runtime_tracker.parent = Some(outer_tracker.runtime_tracker);)})
    }

    pub fn get_runtime_tracker(&self) -> RuntimeTracker {
        self.tracker
    }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it addresses lots of code from https://crates.io/crates/parity-util-mem, is there any possibility to directly use the crate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now we will track Session object itself. #1148 (comment)
For detail, any member variable in Session without macro ignore_malloc_size_of will be computed. If the custom Hashmap to store within the session object, it can be computed.
Any object that derive the trait MallocSizeOf can get their memory usage.

Sorry, I missed this comment. only tracking session object in this PR is OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhang2014 I understand what you are meaning, but I am not clear about why RuntimeTracker has a parent field ?

this idea is great! we can track all memory interested, maybe we can finish it in another pr?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sundy-li maybe not to directly use this crate, here are the reasons:

  1. for common/mem/mem-derive, we may not be ok to use it directly because we need to modify the rule that derive macro MallocOfSize with databend private crate name, code in here, the macro will depend on which crate name is.

  2. for common/mem/mem-allocator, for the similarly reason, if we directly use crate from https://crates.io/crates/parity-util-mem, we can not impl trait MallocOfSize with our own code as mod alloc_size is private in crate parity-util-mem here. because databend has a Mutex impl here.
    another reason that is that this crate defines a global mem allocator here, and we maybe not ok to define databend own allocator if we want to expend functions with databend global allocators.

if anything worry, please inform me of that.

Copy link
Member

@zhang2014 zhang2014 Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zhang2014 I understand what you are meaning, but I am not clear about why RuntimeTracker has a parent field ?

We need recursive tracing if create a runtime again in other runtime worker.

let runtime_1 = Runtime:create()?;
runtime_1.spawn(async {
    let runtime_2 = Runtime::create()?;
    runtime_2.spawn(async {
        // alloc memory
    });
});

@databend-bot
Copy link
Member

CI Passed
Reviewer Approved
Let's Merge

@BohuTANG BohuTANG merged commit 691f70d into databendlabs:main Oct 20, 2021
@BohuTANG
Copy link
Member

Thank you @Veeupup
Excellent work!!!

@Veeupup Veeupup deleted the session_mem_alloctor branch December 3, 2021 07:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[memory] session memory usage
7 participants