Run GGUF models in your Android app with ease!
This is an Android binding for llama.cpp written in Kotlin, designed for native Android applications. This project is inspired by (forked from) cui-llama.rn and llama.cpp (inference of LLaMA models in pure C/C++), but is specifically tailored for Android development in Kotlin.
This is a very early alpha version and the API may change in the future.
- Helper class for initialization and context management
- Native Kotlin bindings for llama.cpp
- Support for stopping prompt processing between batches
- Vocabulary-only mode for tokenizer functionality
- Synchronous tokenizer functions
- Context Shift support (from kobold.cpp)
- XTC sampling implementation (sketched after this list)
- Progress callback support
- CPU feature detection (i8mm and dotprod flags)
- Seamless integration with Android development workflow
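For reference, XTC ("Exclude Top Choices") sampling removes the most probable candidates whenever several of them cross a probability threshold, steering generation toward less predictable wording. Below is a minimal sketch of the idea only; it is not this library's internal code, and the parameter names are assumptions:

```kotlin
import kotlin.random.Random

// Illustration of the XTC idea: when at least two candidates exceed the
// threshold, drop every such candidate except the least probable of them.
// `probs` is assumed sorted in descending order of probability.
fun xtcFilter(
    probs: List<Double>,
    threshold: Double = 0.1,     // hypothetical default
    probability: Double = 0.5,   // chance of applying XTC at all
    rng: Random = Random.Default,
): List<Double> {
    if (rng.nextDouble() >= probability) return probs
    val above = probs.count { it >= threshold }
    // keep the last (least probable) above-threshold token and everything below
    return if (above >= 2) probs.drop(above - 1) else probs
}
```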
Add the following to your project's build.gradle:

```groovy
dependencies {
    implementation 'io.github.ljcamargo:llamacpp-kotlin:0.1.0'
}
```
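If your module uses the Kotlin Gradle DSL (build.gradle.kts) instead, the equivalent declaration is:

```kotlin
dependencies {
    implementation("io.github.ljcamargo:llamacpp-kotlin:0.1.0")
}
```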
You'll need a GGUF model file to use this library. You can:
- Download pre-converted GGUF models from HuggingFace
- Convert your own models following the llama.cpp quantization guide
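If you prefer to ship a model inside your APK, a minimal sketch (assuming a hypothetical model.gguf placed in your app's assets) copies it to internal storage so its path can be passed to the loader:

```kotlin
import android.content.Context
import java.io.File

// Copies a GGUF model bundled in assets to internal storage and returns its path.
// "model.gguf" is a placeholder asset name; adjust it to your own file.
fun copyModelFromAssets(context: Context, assetName: String = "model.gguf"): String {
    val outFile = File(context.filesDir, assetName)
    if (!outFile.exists()) {
        context.assets.open(assetName).use { input ->
            outFile.outputStream().use { output -> input.copyTo(output) }
        }
    }
    return outFile.absolutePath
}
```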
Check this example ViewModel, which uses the LlamaHelper class, for basic usage:
```kotlin
import android.util.Log
import androidx.lifecycle.ViewModel
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.onCompletion
import kotlinx.coroutines.flow.onStart
import kotlinx.coroutines.launch
// plus the LlamaHelper import from this library

class MainViewModel : ViewModel() {
    private val viewModelJob = SupervisorJob()
    private val scope = CoroutineScope(Dispatchers.IO + viewModelJob)
    private val llamaHelper by lazy { LlamaHelper(scope) }

    val text = MutableStateFlow("")

    // load the model into memory
    suspend fun loadModel() {
        llamaHelper.load(
            path = "/sdcard/Download/llama.ggmlv3.q4_0.bin",
            contextLength = 2048,
        )
    }

    // the model must be loaded before submitting or an exception will be thrown
    suspend fun submit(prompt: String) {
        // the collector must be set before calling predict; collection is
        // launched concurrently because collect suspends until the flow
        // completes and would otherwise block the predict call below
        scope.launch {
            llamaHelper.setCollector()
                .onStart {
                    Log.i("MainViewModel", "prediction started")
                    // prediction started, prepare your UI
                    // the first token will arrive after a few seconds of warmup
                    text.emit("")
                }
                .onCompletion {
                    Log.i("MainViewModel", "prediction ended")
                    // onCompletion is triggered when prediction finishes or is aborted
                    llamaHelper.unsetCollector() // unset the collector
                }
                .collect { chunk ->
                    Log.i("MainViewModel", "prediction $chunk")
                    // collect chunks of text as they arrive; for example,
                    // emit them to a StateFlow observed by your UI
                    text.value += chunk
                }
        }
        llamaHelper.predict(
            prompt = prompt,
            partialCompletion = true,
        )
    }

    // you can abort a model load or a prediction in progress
    fun abort() {
        Log.i("MainViewModel", "aborting")
        llamaHelper.abort()
    }

    // don't forget to release resources when the ViewModel is destroyed
    override fun onCleared() {
        super.onCleared()
        llamaHelper.abort()
        llamaHelper.release()
    }
}
```
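To observe the streamed text from your UI, collect the text StateFlow. Here is a minimal sketch from an Activity, assuming the ViewModel above:

```kotlin
import android.os.Bundle
import android.widget.TextView
import androidx.activity.viewModels
import androidx.appcompat.app.AppCompatActivity
import androidx.lifecycle.Lifecycle
import androidx.lifecycle.lifecycleScope
import androidx.lifecycle.repeatOnLifecycle
import kotlinx.coroutines.launch

class MainActivity : AppCompatActivity() {
    private val viewModel: MainViewModel by viewModels()

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        val output = TextView(this)
        setContentView(output)

        // load the model once, then render streamed chunks as they arrive
        lifecycleScope.launch {
            viewModel.loadModel()
            repeatOnLifecycle(Lifecycle.State.STARTED) {
                viewModel.text.collect { output.text = it }
            }
        }
    }
}
```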
You can also use LlamaContext.kt directly to manage multiple contexts or to access other advanced features.
- The library currently supports the arm64-v8a and x86_64 ABIs (see the runtime check sketched after this list)
- 64-bit platforms are recommended for better memory allocation
- CPU feature detection helps optimize performance based on device capabilities
- Batch processing can be interrupted, which is crucial for mobile devices with limited processing power
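As a minimal sketch, you can check the device's ABI from app code before loading a model; the CPU-flag helper below is only an illustration of how such detection can be done (the library performs its own detection natively):

```kotlin
import android.os.Build
import java.io.File

// Returns true if the device reports one of the ABIs this library ships natives for.
fun isSupportedAbi(): Boolean =
    Build.SUPPORTED_ABIS.any { it == "arm64-v8a" || it == "x86_64" }

// Rough illustration of CPU feature detection: on arm64, /proc/cpuinfo lists
// "asimddp" for the dot-product extension and "i8mm" for the int8
// matrix-multiply extension.
fun cpuFlags(): Set<String> =
    File("/proc/cpuinfo").readLines()
        .firstOrNull { it.startsWith("Features") }
        ?.substringAfter(":")
        ?.trim()
        ?.split(" ")
        ?.toSet()
        ?: emptySet()
```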
Contributions are welcome! Please feel free to submit a Pull Request.
MIT
This project builds upon the work of several excellent projects:
- llama.cpp by Georgi Gerganov
- cui-llama.rn
- llama.rn