Commit

BOM
ODAncona committed Feb 10, 2025
1 parent c5375be commit 7c96e24
Showing 5 changed files with 82 additions and 4 deletions.
16 changes: 13 additions & 3 deletions README.md
@@ -44,9 +44,10 @@ You can customize the prompt template to achieve any of the desired use cases. I

### Binary releases

Download the latest binary for your OS from [Releases](https://github.com/mufeedvh/code2prompt/releases).
Download the latest binary for your OS from [Releases](https://github.com/mufeedvh/code2prompt/releases).

### Source build

Requires:

- [Git](https://git-scm.org/downloads), [Rust](https://rust-lang.org/tools/install) and Cargo.
@@ -58,7 +59,8 @@ cargo build --release
```

## cargo
installs from the [`crates.io`](https://crates.io) registry.

installs from the [`crates.io`](https://crates.io) registry.

```sh
cargo install code2prompt
@@ -71,14 +73,16 @@ cargo install --git https://github.com/mufeedvh/code2prompt --force
```

### AUR

`code2prompt` is available in the [`AUR`](https://aur.archlinux.org/packages?O=0&K=code2prompt). Install it via any AUR helpers.

```sh
paru/yay -S code2prompt
```

### Nix
If you are on nix, You can use `nix-env` or `profile` to install.

If you are on nix, You can use `nix-env` or `profile` to install.

```sh
# without flakes:
@@ -140,6 +144,7 @@ Save the generated prompt to an output file:
```sh
code2prompt path/to/codebase --output=output.txt
```

Print output as JSON:

```sh
@@ -274,6 +279,7 @@ code2prompt also provides Python bindings for seamless integration into Python a
See [python-sdk/README.md](python-sdk/README.md) for detailed documentation and usage examples.

Example usage:

```python
from code2prompt import CodePrompt

@@ -282,6 +288,10 @@ result = prompt.generate(encoding="cl100k")
print(result["prompt"])
```

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=mufeedvh/code2prompt&type=Date)](https://star-history.com/#mufeedvh/code2prompt&Date)

## How is it useful?

`code2prompt` makes it easy to generate prompts for LLMs from your codebase. It traverses the directory, builds a tree structure, and collects information about each file. You can customize the prompt generation using Handlebars templates. The generated prompt is automatically copied to your clipboard and can also be saved to an output file. `code2prompt` helps streamline the process of creating LLM prompts for code analysis, generation, and other tasks.
2 changes: 2 additions & 0 deletions src/lib.rs
@@ -4,6 +4,7 @@ pub mod path;
pub mod python;
pub mod template;
pub mod token;
pub mod util;

pub use filter::should_include_file;
pub use git::{get_git_diff, get_git_diff_between_branches, get_git_log};
@@ -12,3 +13,4 @@ pub use template::{
copy_to_clipboard, handle_undefined_variables, handlebars_setup, render_template, write_to_file,
};
pub use token::{count_tokens, get_model_info, get_tokenizer};
pub use util::strip_utf8_bom;
5 changes: 4 additions & 1 deletion src/path.rs
@@ -1,6 +1,7 @@
//! This module contains the functions for traversing the directory and processing the files.
use crate::filter::should_include_file;
use crate::util::strip_utf8_bom;
use anyhow::Result;
use ignore::WalkBuilder;
use log::debug;
@@ -9,6 +10,7 @@ use std::fs;
use std::path::Path;
use termtree::Tree;


/// Traverses the directory and returns the string representation of the tree and the vector of JSON file representations.
///
/// # Arguments
@@ -78,7 +80,8 @@ pub fn traverse_directory(
// ~~~ Process the file ~~~
if path.is_file() {
    if let Ok(code_bytes) = fs::read(path) {
        let code = String::from_utf8_lossy(&code_bytes);
        let clean_bytes = strip_utf8_bom(&code_bytes);
        let code = String::from_utf8_lossy(&clean_bytes);

        let code_block = wrap_code_block(&code, path.extension().and_then(|ext| ext.to_str()).unwrap_or(""), line_number, no_codeblock);

15 changes: 15 additions & 0 deletions src/util.rs
@@ -0,0 +1,15 @@
//! This module contains util functions
/// Removes a UTF‑8 Byte Order Mark (BOM) from the beginning of a byte slice if present.
///
/// The UTF‑8 BOM is the byte sequence `[0xEF, 0xBB, 0xBF]`. This function checks whether
/// the provided slice starts with these bytes and, if so, returns a subslice without them.
/// Otherwise, it returns the original slice.
pub fn strip_utf8_bom(data: &[u8]) -> &[u8] {
    const BOM: &[u8] = &[0xEF, 0xBB, 0xBF];
    if data.starts_with(BOM) {
        &data[BOM.len()..]
    } else {
        data
    }
}
48 changes: 48 additions & 0 deletions tests/util_test.rs
@@ -0,0 +1,48 @@
use code2prompt::util::strip_utf8_bom;

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_strip_utf8_bom_when_present() {
        let input = b"\xEF\xBB\xBFHello, world!";
        let expected = b"Hello, world!";
        let output = strip_utf8_bom(input);
        assert_eq!(
            output, expected,
            "BOM should be stripped from the beginning of the input."
        );
    }

    #[test]
    fn test_strip_utf8_bom_when_not_present() {
        let input = b"Hello, world!";
        let output = strip_utf8_bom(input);
        assert_eq!(
            output, input,
            "Input without a BOM should remain unchanged."
        );
    }

    #[test]
    fn test_strip_utf8_bom_empty_input() {
        let input = b"";
        let output = strip_utf8_bom(input);
        assert_eq!(
            output, input,
            "An empty input should return an empty output."
        );
    }

    #[test]
    fn test_strip_utf8_bom_only_bom() {
        let input = b"\xEF\xBB\xBF";
        let expected = b"";
        let output = strip_utf8_bom(input);
        assert_eq!(
            output, expected,
            "Input that is only a BOM should return an empty slice."
        );
    }
}
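
Note on why the strip happens before decoding: the bytes `EF BB BF` are themselves valid UTF-8 (they encode U+FEFF), so `String::from_utf8_lossy` keeps them as a leading invisible character rather than replacing or dropping them. The sketch below illustrates this and shows an equivalent formulation based on `slice::strip_prefix`. The names `strip_utf8_bom_prefix` and `decode_without_bom` are hypothetical and are not part of this commit; this is only one possible way to write it, not the project's implementation.

```rust
use std::borrow::Cow;

/// Hypothetical variant of the commit's `strip_utf8_bom`, written with
/// `slice::strip_prefix`. Returns the input unchanged when no BOM is present.
fn strip_utf8_bom_prefix(data: &[u8]) -> &[u8] {
    const BOM: &[u8] = &[0xEF, 0xBB, 0xBF];
    data.strip_prefix(BOM).unwrap_or(data)
}

/// Hypothetical helper mirroring the order used in `traverse_directory`:
/// strip the BOM from the raw bytes first, then decode lossily.
fn decode_without_bom(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(strip_utf8_bom_prefix(bytes))
}
```

Decoding the raw bytes directly would leave `'\u{FEFF}'` at the start of the string, which would then end up inside the rendered code block; stripping at the byte level avoids that without allocating a new buffer.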
