Skip to content

Support cs_disasm_iter #115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jul 21, 2025
Merged

Support cs_disasm_iter #115

merged 15 commits into from
Jul 21, 2025

Conversation

dnsserver
Copy link
Contributor

What do you think about this API?

@tmfink
Copy link
Member

tmfink commented Sep 21, 2021

This is a big improvement over the last PR. 😄

Overall, I like the approach of having separate type to "own" the cs_insn allocation. I also like how DisasmIter tracks the "updates" from repeated calls to cs_disasm_iter().

I have some comments that I will make in-line.

tmfink
tmfink previously requested changes Sep 21, 2021
@tmfink
Copy link
Member

tmfink commented Sep 22, 2021

I just pushed a fix (#116) for the GitHub action failure; please rebase to the current master branch to get the fix.

@codecov
Copy link

codecov bot commented Sep 27, 2021

Codecov Report

Merging #115 (1e57323) into master (854de72) will not change coverage.
The diff coverage is n/a.

❗ Current head 1e57323 differs from pull request most recent head 06dbfb6. Consider uploading reports for the commit 06dbfb6 to get more accurate results
Impacted file tree graph

@@           Coverage Diff           @@
##           master     #115   +/-   ##
=======================================
  Coverage   94.90%   94.90%           
=======================================
  Files          22       22           
  Lines        2670     2670           
=======================================
  Hits         2534     2534           
  Misses        136      136           
Impacted Files Coverage Δ
capstone-rs/src/capstone.rs 93.52% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 854de72...06dbfb6. Read the comment docs.

@jiegec
Copy link
Contributor

jiegec commented Jun 4, 2025

I would prefer to create a rust-style Iterator that wraps the cs_disasm_iter logic so that the user can iterate instructions quickly, and provide a wrapper for the raw cs_disasm_iter function if the user wants to customize the usage.

@jiegec jiegec linked an issue Jul 21, 2025 that may be closed by this pull request
@jiegec jiegec merged commit aca4722 into capstone-rust:master Jul 21, 2025
16 checks passed
jiegec pushed a commit that referenced this pull request Jul 21, 2025
See issue #1 and pr #115.

Co-authored-by: denis <[email protected]>
Co-authored-by: @HanabishiRecca
@jiegec
Copy link
Contributor

jiegec commented Jul 21, 2025

Benchmarking result on i9-14900K:

disasm_x86              time:   [2.7127 ms 2.7144 ms 2.7167 ms]
disasm_x86_iter         time:   [2.6826 ms 2.6839 ms 2.6855 ms]
disasm_x86_detail       time:   [10.215 ms 10.228 ms 10.241 ms]
disasm_x86_detail_iter  time:   [3.8006 ms 3.8027 ms 3.8052 ms]

///
/// If `cs_malloc` failed due to OOM, [`Err(Error::OutOfMemory)`](Error::OutOfMemory) is returned.
pub fn disasm_iter<'a, 'b>(
&'a self,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use lifetime names cs/buf like below to make this more clear

code: &'b [u8],
addr: u64,
) -> CsResult<DisasmIter<'a, 'b>> {
let insn = unsafe { cs_malloc(self.csh()) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid reallocating on each call to disasm_iter() we could have a new type (e.g. "DisasBuffer") that contains the underlying buffer. Callers would pass in the "buffer" which would let them amortize the cost of the allocations by only doing it once.

Since this would be more complicated, we should create a new method and also have the simpler interface.

/// # Errors
///
/// If `cs_malloc` failed due to OOM, [`Err(Error::OutOfMemory)`](Error::OutOfMemory) is returned.
pub fn disasm_iter<'a, 'b>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that this disasm_iter() is more efficient than disasm_all() and has the same interface, we should delete disasm_all()/disasm_count()/disasm() and only keep disasm_iter() as written.

We can simplify the naming by renaming disasm_iter() to just disasm().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice idea!

@jiegec
Copy link
Contributor

jiegec commented Jul 25, 2025

There is a problem, though: for disasm_all users, the details buffer for each cs_insn is different, so they can use Insn directly; if disasm_iter is used instead, then the details buffer should be cloned via OwnedInsn for the users to store the result in a vector: disasm_iter().unwrap().map(|i| (&i).into()).collect().

Therefore, we must either:

  1. ensure there are no two Insns pointing to the same *cs_insn at the same time, e.g. LendingIterator
  2. return OwnedInsn directly at a cost of more memory allocation

Performance of the OwnedInsn way:

disasm_x86              time:   [2.6413 ms 2.6428 ms 2.6446 ms]
disasm_x86_iter         time:   [3.2921 ms 3.2945 ms 3.2974 ms]
disasm_x86_detail       time:   [10.315 ms 10.327 ms 10.340 ms]
disasm_x86_detail_iter  time:   [5.2468 ms 5.2501 ms 5.2545 ms]

In summary, the current disasm_iter is unsafe: you can have two Insn pointing to the same cs_detail. We need to either use lending iterator (see #187) or clone the cs_detail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support cs_disasm_iter
3 participants