Support cs_disasm_iter #115

Merged
merged 15 commits into from
Jul 21, 2025
137 changes: 137 additions & 0 deletions capstone-rs/src/capstone.rs
@@ -205,6 +205,43 @@ impl Capstone {
        }
    }

    /// Disassemble and iterate over instructions from the user-provided buffer `code` using `cs_disasm_iter`.
    /// The first byte of `code` is assumed to be located at address `addr`.
    /// Compared with `disasm_all()`, it uses less memory and makes fewer allocations,
    /// because a single `cs_insn` is allocated up front and reused for every instruction.
    ///
    /// # Examples
    ///
    /// ```
    /// # use capstone::prelude::*;
    /// # let cs = Capstone::new().x86().mode(arch::x86::ArchMode::Mode32).build().unwrap();
    /// let mut iter = cs.disasm_iter(b"\x90", 0x1000).unwrap();
    /// assert_eq!(iter.next().unwrap().mnemonic(), Some("nop"));
    /// assert!(iter.next().is_none());
    /// ```
    ///
    /// # Errors
    ///
    /// If `cs_malloc` fails due to OOM, [`Err(Error::OutOfMemory)`](Error::OutOfMemory) is returned.
    pub fn disasm_iter<'a, 'b>(
Member

Given that this disasm_iter() is more efficient than disasm_all() and has the same interface, we should delete disasm_all()/disasm_count()/disasm() and keep only disasm_iter() as written.

We can simplify the naming by renaming disasm_iter() to just disasm().

Contributor

Nice idea!

        &'a self,
Member

Use the lifetime names `'cs`/`'buf`, as in the `DisasmIter` definition below, to make this clearer.
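
For illustration, a minimal self-contained sketch of what descriptive lifetime names buy here. `Engine`, `Cursor`, `cursor()`, and `remaining()` are hypothetical stand-ins invented for this example, not capstone-rs items; the only point is that `'cs` names the borrow of the engine and `'buf` names the borrow of the code slice, with `PhantomData` carrying both because raw pointers have no lifetime of their own.

```rust
use std::marker::PhantomData;

// Hypothetical stand-in for `Capstone`; the real type wraps a `csh` handle.
struct Engine {
    handle: usize,
}

// Hypothetical stand-in for `DisasmIter`. Raw pointers carry no lifetime,
// so PhantomData records which borrows the cursor must not outlive.
struct Cursor<'cs, 'buf> {
    handle: usize,
    code: *const u8,
    size: usize,
    _cs: PhantomData<&'cs Engine>,
    _buf: PhantomData<&'buf [u8]>,
}

impl Engine {
    // `'cs` ties the cursor to `&self`; `'buf` ties it to the code slice.
    fn cursor<'cs, 'buf>(&'cs self, code: &'buf [u8]) -> Cursor<'cs, 'buf> {
        Cursor {
            handle: self.handle,
            code: code.as_ptr(),
            size: code.len(),
            _cs: PhantomData,
            _buf: PhantomData,
        }
    }
}

impl<'cs, 'buf> Cursor<'cs, 'buf> {
    // Rebuild the slice from the raw parts. This is sound only because
    // `'buf` bounds the slice that `code` and `size` were taken from.
    fn remaining(&self) -> &'buf [u8] {
        unsafe { std::slice::from_raw_parts(self.code, self.size) }
    }
}
```

With names like these, a signature reads as "the iterator may not outlive the engine or the buffer" without the reader having to map `'a` and `'b` back to parameters.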

        code: &'b [u8],
        addr: u64,
    ) -> CsResult<DisasmIter<'a, 'b>> {
        let insn = unsafe { cs_malloc(self.csh()) };
Member

To avoid reallocating on each call to disasm_iter() we could have a new type (e.g. "DisasBuffer") that contains the underlying buffer. Callers would pass in the "buffer" which would let them amortize the cost of the allocations by only doing it once.

Since this would be more complicated, we should create a new method and also have the simpler interface.
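
A rough sketch of the shape such an API could take, written against toy types so it compiles on its own. `InsnBuffer`, `BufferedIter`, and `iter_with_buffer()` are hypothetical names for this example only, and the one-byte "instructions" stand in for real decoding; nothing here is the capstone-rs API.

```rust
// Hypothetical stand-in for the allocation made by `cs_malloc`.
struct InsnBuffer {
    // Scratch space reused for every decoded instruction.
    slot: Box<[u8; 16]>,
}

impl InsnBuffer {
    // The only allocation in this sketch happens here, once per buffer.
    fn new() -> Self {
        InsnBuffer { slot: Box::new([0; 16]) }
    }
}

// Iterator that borrows the caller's buffer instead of allocating its own.
struct BufferedIter<'b, 'buf> {
    buffer: &'b mut InsnBuffer,
    code: &'buf [u8],
}

// Hypothetical counterpart of a `disasm_iter` variant that takes a buffer.
fn iter_with_buffer<'b, 'buf>(
    buffer: &'b mut InsnBuffer,
    code: &'buf [u8],
) -> BufferedIter<'b, 'buf> {
    BufferedIter { buffer, code }
}

impl<'b, 'buf> Iterator for BufferedIter<'b, 'buf> {
    // Toy "instruction": a single byte.
    type Item = u8;

    fn next(&mut self) -> Option<u8> {
        let (&first, rest) = self.code.split_first()?;
        // Overwrite the shared slot rather than allocating a new one.
        self.buffer.slot[0] = first;
        self.code = rest;
        Some(first)
    }
}
```

A caller would create one `InsnBuffer` and pass `&mut` to it for each code region, so the allocation cost is paid once no matter how many iterators are created. The exclusive borrow also prevents two iterators from sharing the same slot at the same time.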

        if insn.is_null() {
            return Err(Error::OutOfMemory);
        }
        Ok(DisasmIter {
            insn,
            csh: self.csh,
            code: code.as_ptr(),
            size: code.len(),
            addr,
            _data1: PhantomData,
            _data2: PhantomData,
        })
    }

    /// Disassemble all instructions in buffer
    ///
    /// ```
@@ -590,3 +627,103 @@ impl Drop for Capstone {
        unsafe { cs_close(&mut self.csh()) };
    }
}

/// Structure to handle iterative disassembly.
///
/// Create with a [`Capstone`](Capstone) instance: [`Capstone::disasm_iter()`](Capstone::disasm_iter).
///
/// # Lifetimes
///
/// `'cs` is the lifetime of the [`Capstone`](Capstone) instance.
/// `'buf` is the lifetime of the user-provided code buffer in [`Capstone::disasm_iter()`](Capstone::disasm_iter).
///
pub struct DisasmIter<'cs, 'buf> {
    insn: *mut cs_insn,            // space for the current instruction being processed
    csh: *mut c_void,              // the Capstone handle required by cs_disasm_iter
    code: *const u8,               // pointer to the code yet to be disassembled
    size: usize,                   // number of bytes yet to be disassembled
    addr: u64,                     // address of the next instruction
    _data1: PhantomData<&'cs ()>,  // ensures DisasmIter does not outlive the Capstone instance
    _data2: PhantomData<&'buf ()>, // ensures DisasmIter does not outlive the user-provided code buffer
}

impl<'cs, 'buf> Drop for DisasmIter<'cs, 'buf> {
    fn drop(&mut self) {
        unsafe { cs_free(self.insn, 1) };
    }
}

impl<'cs, 'buf> Iterator for DisasmIter<'cs, 'buf> {
    type Item = Insn<'cs>;

    fn next(&mut self) -> Option<Self::Item> {
        unsafe {
            if cs_disasm_iter(
                self.csh as csh,
                &mut self.code,
                &mut self.size,
                &mut self.addr,
                self.insn,
            ) {
                return Some(Insn::from_raw(self.insn));
            }
        }

        None
    }
}

impl<'cs, 'buf> DisasmIter<'cs, 'buf> {
    /// Get the slice of the code yet to be disassembled
    ///
    /// ```
    /// # use capstone::prelude::*;
    /// # let cs = Capstone::new().x86().mode(arch::x86::ArchMode::Mode32).build().unwrap();
    /// let code = b"\x90";
    /// let mut iter = cs.disasm_iter(code, 0x1000).unwrap();
    /// assert_eq!(iter.code(), code);
    /// iter.next();
    /// assert_eq!(iter.code(), b"");
    /// ```
    pub fn code(&self) -> &[u8] {
        unsafe { core::slice::from_raw_parts(self.code, self.size) }
    }

    /// Get the address of the next instruction to be disassembled
    ///
    /// ```
    /// # use capstone::prelude::*;
    /// # let cs = Capstone::new().x86().mode(arch::x86::ArchMode::Mode32).build().unwrap();
    /// let code = b"\x90";
    /// let mut iter = cs.disasm_iter(code, 0x1000).unwrap();
    /// assert_eq!(iter.addr(), 0x1000);
    /// iter.next();
    /// assert_eq!(iter.addr(), 0x1001);
    /// ```
    pub fn addr(&self) -> u64 {
        self.addr
    }

    /// Reset the iterator to disassemble in the specified code buffer
    ///
    /// ```
    /// # use capstone::prelude::*;
    /// # let cs = Capstone::new().x86().mode(arch::x86::ArchMode::Mode32).build().unwrap();
    /// let code = b"\x90";
    /// let mut iter = cs.disasm_iter(code, 0x1000).unwrap();
    /// assert_eq!(iter.addr(), 0x1000);
    /// assert_eq!(iter.code(), code);
    /// iter.next();
    /// assert_eq!(iter.addr(), 0x1001);
    /// assert_eq!(iter.code(), b"");
    /// let new_code = b"\xc3";
    /// iter.reset(new_code, 0x2000);
    /// assert_eq!(iter.addr(), 0x2000);
    /// assert_eq!(iter.code(), new_code);
    /// ```
    pub fn reset(&mut self, code: &'buf [u8], addr: u64) {
        self.code = code.as_ptr();
        self.size = code.len();
        self.addr = addr;
    }
}
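
The cursor behaviour that `next()`, `code()`, `addr()`, and `reset()` expose can be pictured with a safe, self-contained model. The sketch below is an assumption-laden toy, not the PR's implementation: `ToyCursor` is a hypothetical type, and its decoding rule (byte `0xb8` takes 5 bytes, every other byte takes 1) exists only to give instructions different lengths. It mirrors the in/out update of the code pointer, remaining size, and address on each successful step; leaving the cursor unchanged on failure is a choice made for this toy.

```rust
// Hypothetical safe model of the iterator's cursor state.
struct ToyCursor<'buf> {
    code: &'buf [u8], // bytes yet to be decoded
    addr: u64,        // address of the next instruction
}

impl<'buf> ToyCursor<'buf> {
    fn new(code: &'buf [u8], addr: u64) -> Self {
        ToyCursor { code, addr }
    }

    // Remaining bytes, like `DisasmIter::code()`.
    fn code(&self) -> &'buf [u8] {
        self.code
    }

    // Next address, like `DisasmIter::addr()`.
    fn addr(&self) -> u64 {
        self.addr
    }

    // Point the cursor at a new buffer and address, like `DisasmIter::reset()`.
    fn reset(&mut self, code: &'buf [u8], addr: u64) {
        self.code = code;
        self.addr = addr;
    }
}

impl<'buf> Iterator for ToyCursor<'buf> {
    // Toy "instruction": (address, length in bytes).
    type Item = (u64, usize);

    fn next(&mut self) -> Option<Self::Item> {
        // Toy rule: 0xb8 is 5 bytes long, anything else is 1 byte.
        let len: usize = match *self.code.first()? {
            0xb8 => 5,
            _ => 1,
        };
        if self.code.len() < len {
            // Truncated instruction: stop without moving the cursor.
            return None;
        }
        let at = self.addr;
        // Advance all cursor state together.
        self.code = &self.code[len..];
        self.addr += len as u64;
        Some((at, len))
    }
}
```

Because all state lives in the cursor, a caller can stop early, inspect `code()` and `addr()` to see where decoding halted, and call `reset()` to continue elsewhere without building a new iterator.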