llmedge is a lightweight Android library for running GGUF language models fully on-device, powered by llama.cpp.
See the examples repository for sample usage.
Acknowledgments to Shubham Panchal and upstream projects are listed in CREDITS.md.
Note
This library is in early development and may change significantly.
Important
API maturity is uneven by feature area. LLMEdge, text inference, speech inference, and model management are the most stable entry points today. OCR via edge.vision.extractText(...) is also reliable. Vision/VLM analysis, RAG, and some image/video-generation flows are available and tested, but should still be treated as evolving APIs.
- LLM Inference: Run GGUF models directly on Android using llama.cpp (JNI)
- Model Downloads: Download and cache models from Hugging Face Hub
- Optimized Inference: Native KV cache reuse for compact chats, batched blocking and streaming text generation by default, separate prompt vs. generation thread tuning, and Kotlin-managed `ChatSession` replay for reasoning-heavy models
- Speech-to-Text (STT): Whisper.cpp integration with timestamp support, language detection, streaming transcription, and SRT generation
- Text-to-Speech (TTS): Bark.cpp integration with ARM optimizations
- Image Generation: Stable Diffusion with EasyCache and LoRA support
- Video Generation: Wan 2.1 models (4-64 frames) with sequential loading
- On-device RAG: PDF indexing, embeddings, vector search, Q&A
- OCR: Google ML Kit text extraction
- Memory Metrics: Built-in RAM usage monitoring
- Vision Models: Architecture prepared for LLaVA-style models (requires specific model formats)
- GPU Acceleration: Optional Android GPU backends for text, Whisper, and image/video with experimental OpenCL preferred first, Vulkan fallback second, and CPU fallback last
Warning
For development, Linux is strongly recommended for GPU-enabled builds. The Vulkan shader-generation path used by Stable Diffusion is still unreliable on Windows cross-builds.
Clone the repository along with the llama.cpp and stable-diffusion.cpp submodules:
```shell
git clone --depth=1 https://github.com/Aatricks/llmedge
cd llmedge
git submodule update --init --recursive
```

Open the project in Android Studio. If it does not build automatically, use Build > Rebuild Project.
For Maven Central:
```kotlin
repositories {
    google()
    mavenCentral()
}

dependencies {
    implementation("io.github.aatricks:llmedge:0.3.9")
}
```

For GitHub Packages:
```kotlin
repositories {
    google()
    mavenCentral()
    maven {
        url = uri("https://maven.pkg.github.com/Aatricks/llmedge")
        credentials {
            username = providers.gradleProperty("gpr.user").orNull ?: System.getenv("GITHUB_ACTOR")
            password = providers.gradleProperty("gpr.key").orNull ?: System.getenv("GITHUB_TOKEN")
        }
    }
}

dependencies {
    implementation("io.github.aatricks:llmedge:0.3.9")
}
```

The recommended entry point is the instance-based `LLMEdge` facade. It exposes domain clients for text, speech, image generation, vision, and RAG while keeping model resolution and resource ownership explicit.
```kotlin
val edge = LLMEdge.create(
    context = context,
    scope = viewModelScope,
)

viewModelScope.launch {
    val reply = edge.text.generate(
        prompt = "Summarize on-device LLMs in one sentence.",
    )
    outputView.text = reply
}
```

Low-level wrappers like `SmolLM`, `StableDiffusion`, `Whisper`, and `BarkTTS` remain available for expert workflows, but new code should prefer `LLMEdge`.
The intended acquisition path for application code is:

- `edge.models.prefetch(...)` when you want explicit downloads
- feature clients like `edge.text`, `edge.speech`, `edge.image`, and `edge.vision` when you want inference
Direct HuggingFaceHub calls and expert runtime loadFromHuggingFace(...) helpers are still supported, but they are advanced APIs for callers that need artifact-level control.
By default, edge.text.generate(...) uses batched native decoding for lower JNI overhead, while
edge.text.stream(...) uses smaller batched chunks so UI updates stay responsive without paying a
JNI crossing per token.
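As a minimal sketch of the streaming path — assuming `edge.text.stream(...)` emits the same `TextStreamEvent` values that the session API in this document uses:

```kotlin
val edge = LLMEdge.create(context, viewModelScope)

viewModelScope.launch {
    // Collect small batched chunks as they arrive instead of waiting
    // for the full completion to come back in one piece.
    edge.text.stream(prompt = "List three uses of on-device LLMs.").collect { event ->
        when (event) {
            is TextStreamEvent.Chunk -> outputView.append(event.value)
            is TextStreamEvent.Completed -> Log.d("llmedge", "done: ${event.fullText.length} chars")
            else -> Unit
        }
    }
}
```

Because chunks arrive in batches of `streamBatchSize` tokens, the UI updates frequently without paying a JNI crossing per token.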
llmedge can resolve and cache model weights independently of inference:
```kotlin
val edge = LLMEdge.create(context, viewModelScope)
val modelFile = edge.models.prefetch(
    ModelSpec.huggingFace(
        repoId = "unsloth/Qwen3-0.6B-GGUF",
        filename = "Qwen3-0.6B-Q4_K_M.gguf",
    ),
)
Log.d("llmedge", "Cached ${modelFile.name} at ${modelFile.parent}")
```

- `edge.models.prefetch(...)` and `BoundModelRepository.resolve(...)` keep model acquisition separate from any one inference client.
- Supports progress callbacks and private repositories via token through `ModelSpec.huggingFace(...)`.
- Requests to old mirrors automatically resolve to up-to-date Hugging Face repos.
- Automatically uses the model's declared context window (minimum 1K tokens) and caps it to a heap-aware limit (2K–8K). Override with `InferenceParams(contextSize = …)` if needed.
- Large downloads use Android's DownloadManager when `preferSystemDownloader = true` to keep transfers out of the Dalvik heap.
- Direct `HuggingFaceHub` downloads remain available for expert workflows, but most app code should stay on the facade/model-repository path.
Reasoning-aware models can be controlled from the facade through TextModelOptions. The default configuration keeps thinking enabled (ThinkingMode.DEFAULT, reasoning budget -1). To disable thinking for a request or session, pass the options explicitly:
```kotlin
val edge = LLMEdge.create(context, viewModelScope)
val reply = edge.text.generate(
    prompt = "Solve this step by step, then give only the final answer.",
    options = TextModelOptions(
        thinkingMode = SmolLM.ThinkingMode.DISABLED,
        reasoningBudget = 0,
    ),
)
```

The same options work with `edge.text.session(...)` and `edge.text.toolAgent(...)`.
Setting the budget to 0 always disables thinking, while -1 leaves it unrestricted. If you omit reasoningBudget, the library chooses 0 when the mode is DISABLED and -1 otherwise. The API also injects the /no_think tag automatically when thinking is disabled, so you do not need to modify prompts manually. If you need to flip reasoning state on a live expert runtime without reloading, see Expert APIs.
Use edge.text.session(...) when you want bounded multi-turn chat without exposing native storeChats state to application code.
```kotlin
val edge = LLMEdge.create(context, viewModelScope)
val session = edge.text.session(
    memory = ConversationWindow(
        maxTurns = 6,
        maxTokens = 4096,
        stripThinkTags = true,
    ),
    systemPrompt = "You are a concise assistant.",
)

viewModelScope.launch {
    session.prepare()
    val reply = session.reply("Explain why context windows fill up.")

    session.stream("Now summarize that in 3 bullets.").collect { event ->
        when (event) {
            is TextStreamEvent.Chunk -> print(event.value)
            is TextStreamEvent.Completed -> println(event.fullText)
            else -> Unit
        }
    }
}
```

The session API keeps transcript state in Kotlin, applies sliding-window trimming, and strips replayed `<think>...</think>` blocks by default so reasoning-heavy models do not exhaust the context window as quickly.
Use edge.text.toolAgent(...) when you want the model to call app-defined tools. Read-only tools execute automatically; action tools require an explicit policy decision.
```kotlin
val edge = LLMEdge.create(context, viewModelScope)
val factory = DeviceToolFactory(context)
val agent = edge.text.toolAgent(
    tools = factory.createDefaultTools(),
    systemPrompt = "Be concise and only use tools when needed.",
    policy = ToolPolicies.ALLOW_ALL, // or keep the default to deny action tools
)

viewModelScope.launch {
    val result = agent.reply("What time is it and how much battery is left?")
    println(result.text)

    agent.stream("Open https://example.com").collect { event ->
        when (event) {
            is ToolAgentEvent.ToolCallRequested -> println("Tool: ${event.call.tool}")
            is ToolAgentEvent.TextChunk -> print(event.value)
            is ToolAgentEvent.Completed -> println("\nDone: ${event.result.finishReason}")
            else -> Unit
        }
    }
}
```

Tool calls use a structured JSON envelope internally: `{"tool":"name","arguments":{...}}`. The parser also accepts the legacy `tool_name` field for robustness, but new prompts only emit the `tool` shape.
Speech APIs now support request-first calls in addition to the existing convenience overloads:
```kotlin
val result = edge.speech.transcribe(
    SpeechToTextRequest(
        audioSamples = samples,
        model = edge.config.models.speechToText,
        params = Whisper.TranscribeParams(language = "en"),
        runtime = WhisperRuntimeRequest(gpuEnabled = false, flashAttention = true),
    ),
)
```

This keeps new speech entrypoints aligned with the request-first style already used by text and image generation, while preserving the older parameter-list overloads for compatibility.
The text stack now separates prompt/batch processing from single-token generation so you can tune the two phases independently:
```kotlin
val edge = LLMEdge.create(
    context = context,
    scope = viewModelScope,
    config = LLMEdgeConfig(
        text = TextRuntimeConfig(
            promptThreads = 6,      // prompt/batch phase
            generationThreads = 2,  // token-by-token phase
            batchSize = 8,
            streamBatchSize = 4,
            cache = RuntimeCacheConfig(maxEntries = 2, maxMemoryMb = 1536),
        ),
    ),
)

val reply = edge.text.generate(
    prompt = "Explain speculative decoding.",
    options = TextModelOptions(numThreads = 8, generationThreads = 3),
    batchSize = 12,
)
```

Practical defaults:
- `text.promptThreads`: prompt/batch decode threads
- `text.generationThreads`: single-token generation threads
- `text.batchSize`: blocking text batch size (default `8`)
- `text.streamBatchSize`: streaming batch size (default `4`)
- `text.cache.maxMemoryMb`: upper bound for text-model cache accounting; the cache now refreshes against native model/state footprint instead of only the GGUF file size
Batch-size guidance:
- `1`: lowest latency per chunk, highest JNI overhead
- `4`: good default for streaming UI updates
- `8`: good default for blocking text responses
- `12+`: better throughput for longer offline generations, but can delay intermediate updates
llmedge uses Google ML Kit Text Recognition for extracting text from images.
```kotlin
val edge = LLMEdge.create(context, viewModelScope)
val text = edge.vision.extractText(bitmap)
println("Extracted text: $text")
```

Google ML Kit Text Recognition
- Fast and lightweight
- No additional data files needed
- Good for Latin scripts
- Add dependency: `implementation("com.google.mlkit:text-recognition:16.0.0")`
OCR is exposed directly through edge.vision.extractText(...). The older VisionMode convenience
wrapper is gone; callers now choose explicitly between OCR and VLM analysis instead of routing both
through a second abstraction layer.
Analyze images using Vision Language Models (like LLaVA or Phi-3 Vision) via edge.vision.
Warning
The VLM path is experimental. It requires a vision-capable GGUF and a matching mmproj/projector file. When those components are unavailable or incompatible, edge.vision.analyze(...) now fails fast with a clear error instead of silently falling back to text-only prompting. OCR remains available through edge.vision.extractText(...).
```kotlin
val edge = LLMEdge.create(context, viewModelScope)
val description = edge.vision.analyze(
    image = bitmap,
    prompt = "Describe this image in detail.",
    numThreads = 4,
    generationThreads = 2,
) { status ->
    Log.d("Vision", "Status: $status")
}
```

The current high-level vision path creates a fresh SmolLM runtime per request, so it favors isolation and predictable cleanup over pooled high-throughput reuse.
The manager handles the complex pipeline of:
- Preprocessing the image
- Loading the vision projector and model
- Encoding the image to embeddings
- Generating the textual response
Vision model support is currently experimental and requires specific model architectures (like LLaVA-Phi-3).
Transcribe audio using the new edge.speech client:
```kotlin
val edge = LLMEdge.create(context, viewModelScope)

val text = edge.speech.transcribeToText(audioSamples)

val segments = edge.speech.transcribe(
    audioSamples = audioSamples,
    params = Whisper.TranscribeParams(language = "en"),
)
segments.forEach { segment ->
    println("[${segment.startTimeMs}ms] ${segment.text}")
}

val lang = edge.speech.detectLanguage(audioSamples)
```

For live captioning, use the streaming transcription API with a sliding window approach:
```kotlin
val edge = LLMEdge.create(context, viewModelScope)
val session = edge.speech.createStreamingSession(
    params = Whisper.StreamingParams(
        stepMs = 3000,
        lengthMs = 10000,
        keepMs = 200,
        language = "en",
        useVad = true,
    ),
)

viewModelScope.launch {
    session.events().collect { segment ->
        updateCaptions(segment.text)
    }
}

audioRecorder.onAudioChunk { samples ->
    viewModelScope.launch { session.feedAudio(samples) }
}

session.stop()
```

Streaming parameters:
- `stepMs`: how often transcription runs (default: 3000 ms). Lower = faster updates, higher CPU usage.
- `lengthMs`: audio window size (default: 10000 ms). Longer windows improve accuracy.
- `keepMs`: overlap with the previous window (default: 200 ms). Helps maintain context.
- `useVad`: Voice Activity Detection; skips silent audio (default: true).
Direct Whisper access remains available for expert workflows, but the namespaced speech client is the standard integration path.
Recommended models:
- `ggml-tiny.bin` (~75 MB): fast, lower accuracy
- `ggml-base.bin` (~142 MB): good balance
- `ggml-small.bin` (~466 MB): higher accuracy
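These checkpoints can be cached up front through the same model-management path used for GGUF models. This is a sketch under the assumption that Whisper files resolve through `ModelSpec.huggingFace(...)` like other artifacts; `ggerganov/whisper.cpp` is the upstream Hugging Face repository that hosts the `ggml-*.bin` files:

```kotlin
// Assumes `edge` was created via LLMEdge.create(...) as in the examples above.
val whisperModel = edge.models.prefetch(
    ModelSpec.huggingFace(
        repoId = "ggerganov/whisper.cpp",
        filename = "ggml-base.bin",
    ),
)
Log.d("llmedge", "Whisper model cached at ${whisperModel.absolutePath}")
```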
Generate speech using edge.speech:
```kotlin
val edge = LLMEdge.create(context, viewModelScope)
val audio = edge.speech.synthesize("Hello, world!")

viewModelScope.launch {
    edge.speech.synthesizeStream("Hello, world!").collect { event ->
        when (event) {
            is AudioStreamEvent.Progress -> Log.d("Bark", "${event.step.name}: ${event.percent}%")
            is AudioStreamEvent.Result -> saveAudio(event.audio)
            else -> Unit
        }
    }
}
```

Direct BarkTTS access remains available for expert workflows, but the namespaced speech client is the standard integration path.
Generate images on-device using the namespaced edge.image client:
```kotlin
val edge = LLMEdge.create(context, viewModelScope)
val bitmap = edge.image.generate(
    ImageGenerationRequest(
        prompt = "a cute pastel anime cat, soft colors, high quality <lora:detail_tweaker:1.0>",
        width = 512,
        height = 512,
        steps = 20,
        loraModelDir = "/path/to/loras",
        loraApplyMode = StableDiffusion.LoraApplyMode.AUTO,
    ),
)
imageView.setImageBitmap(bitmap)
```

Key Optimizations:
- EasyCache: `edge.image` automatically enables EasyCache for supported Diffusion Transformer (DiT) models such as Flux, SD3, Wan, Qwen Image, and Z-Image; it stays disabled for classic UNet pipelines.
- Flash Attention: Automatically enabled for compatible image dimensions.
- LoRA: Apply fine-tuned weights on the fly without merging models.
For explicit runtime ownership or custom native-load experiments, the StableDiffusion class remains available in the expert API layer.
Generate short video clips using edge.image.generateVideo(...). The namespaced client surfaces progress as a Flow while reusing the existing Wan loading logic internally.
Hardware Requirements:
- 12GB+ RAM recommended for standard loading.
- 8GB+ RAM supported via `forceSequentialLoad = true` (slower but memory-safe).
```kotlin
val edge = LLMEdge.create(context, viewModelScope)
val params = VideoGenerationRequest(
    prompt = "a cat walking in a garden, high quality",
    videoFrames = 8,
    width = 512,
    height = 512,
    steps = 20,
    cfgScale = 7.0f,
    flowShift = 3.0f,
    forceSequentialLoad = true,
)

viewModelScope.launch {
    edge.image.generateVideo(params).collect { event ->
        when (event) {
            is GenerationStreamEvent.Progress -> Log.d("VideoGen", event.update.message)
            is GenerationStreamEvent.Completed -> previewImageView.setImageBitmap(event.frames.first())
        }
    }
}
```

`edge.image` automatically:
- Downloads the necessary Wan 2.1 model files (Diffusion, VAE, T5).
- Sequentially loads components to minimize peak memory usage (if requested).
- Manages the generation loop and frame conversion.
See llmedge-examples for a complete UI implementation.
Running the example app:
- Build the library (from the repo root):

```shell
./gradlew :llmedge:assembleRelease
```

- Build and install the example app:

```shell
cd llmedge-examples
../gradlew :app:assembleDebug
../gradlew :app:installDebug
```

- Open the app on device and pick the "Stable Diffusion" demo from the launcher. The demo downloads any missing files from Hugging Face and runs a quick txt2img generation.
Notes:
- The example explicitly downloads a VAE safetensors file for the `Meina/MeinaMix` demo; many repos include VAE files, but some GGUF model repos bundle everything you need. If the repo lacks a GGUF model file you'll get an obvious IllegalArgumentException; provide a `filename` or choose a different repo in that case.
- Use the system downloader for large safetensors/gguf files to avoid heap pressure on Android.
The library includes a minimal on-device RAG pipeline, similar to Android-Doc-QA, built with:
- Sentence embeddings (ONNX)
- Whitespace `TextSplitter`
- In-memory cosine `VectorStore` with JSON persistence
- `SmolLM` for context-aware responses through the facade-managed RAG session
- Download embeddings: from the Hugging Face repository `sentence-transformers/all-MiniLM-L6-v2`, place:

```
llmedge/src/main/assets/embeddings/all-minilm-l6-v2/model.onnx
llmedge/src/main/assets/embeddings/all-minilm-l6-v2/tokenizer.json
```

- Build the library:

```shell
./gradlew :llmedge:assembleRelease
```

- Use in your application:
```kotlin
val edge = LLMEdge.create(this, lifecycleScope)
val rag = edge.rag.createSession()

lifecycleScope.launch {
    rag.init()
    val count = rag.indexPdf(pdfUri)
    val answer = rag.ask("What are the key points?")
    // render answer
}
```

Direct `RAGEngine` construction remains available for expert workflows, but new app code should prefer `edge.rag.createSession()` so runtime ownership and teardown stay aligned with the rest of the library.
SmolLM, StableDiffusion, Whisper, BarkTTS, RAGEngine, and direct HuggingFaceHub access are still available when you need to hold a native runtime directly or override low-level loading behavior. They are intentionally secondary to the facade APIs.
Examples:
```kotlin
// Direct model download when you need full control over artifact selection.
val download = HuggingFaceHub.ensureModelOnDisk(
    context = context,
    modelId = "unsloth/Qwen3-0.6B-GGUF",
    filename = "Qwen3-0.6B-Q4_K_M.gguf",
)

// Expert text runtime with live reasoning-state control.
val smol = SmolLM()
smol.load(download.file.absolutePath)
smol.setThinkingEnabled(false)

// Expert RAG wiring when you want to own both the runtime and the pipeline yourself.
val ragEngine = RAGEngine(context = context, smolLM = smol)
```

If you want GPU acceleration for the native inference backends, follow these notes and requirements. On Android, llmedge now prefers OPENCL -> VULKAN -> CPU when GPU use is allowed for text, Whisper, and image/video requests. OpenCL support is experimental, Android-only, and currently limited to arm64-v8a. Bark remains CPU-only.
Prerequisites
- Android NDK r27 or newer (NDK r27 used in development; the NDK provides the Vulkan C headers). Ensure your NDK matches the version used by your build environment.
- CMake 3.22+ and Ninja (the Android Gradle plugin will pick up CMake when configured).
- Gradle (use the wrapper: `./gradlew`).
- Android API (minSdk) 30 or higher. `llmedge` targets Android 11+ today, and Vulkan support still requires Vulkan 1.2.
- (Optional) `VULKAN_SDK` set in the environment if you build shaders or use Vulkan SDK tools on the host. The build fetches a matching `vulkan.hpp` header when needed.
To build the library with Vulkan support on a Linux host or WSL2, you must install the Vulkan shader compiler and development headers:
- Install dependencies:

```shell
sudo apt-get update
sudo apt-get install -y glslc libvulkan-dev
```

- Verify glslc: ensure `glslc` is in your PATH:

```shell
glslc --version
```

- Android NDK: ensure you have Android NDK r27 (specifically `27.2.12479018`) installed via Android Studio or the SDK manager.
Build flags
- On Linux/macOS hosts, the Gradle build enables Vulkan by default. On Windows hosts, it defaults to `OFF` because the upstream shader-generator step is still fragile under the Android cross-build toolchain. Re-enable it explicitly only when your environment supports that path.
- Experimental Android OpenCL is disabled by default. Enable it with `-PllmedgeAndroidOpencl=ON` or the environment variable `LLMEDGE_ANDROID_OPENCL=ON`.
- If you want both OpenCL and Vulkan compiled in explicitly, use:
```shell
./gradlew :llmedge:assembleRelease \
  -PllmedgeAndroidOpencl=ON \
  -Pandroid.injected.build.api=30 \
  -Pandroid.jniCmakeArgs="-DSD_VULKAN=ON -DGGML_VULKAN=ON"
```

Alternatively, set the same flags in your Android Studio CMake configuration. `LLMEDGE_ANDROID_OPENCL` is the library's experimental OpenCL toggle, while `-DSD_VULKAN=ON` and `-DGGML_VULKAN=ON` force Vulkan support for Stable Diffusion and ggml.
Notes about headers and toolchain
- The build fetches Vulkan-Hpp (`vulkan.hpp`) and pins it to the NDK's Vulkan headers to avoid API mismatch. If you have a local `VULKAN_SDK` you can point to it; otherwise the project will use the fetched headers.
- When OpenCL is enabled, the build uses repo-managed OpenCL headers and a link-time loader shim. The packaged app still resolves the device's OpenCL implementation at runtime rather than shipping its own platform ICD.
- The repository also builds a small host toolchain to generate SPIR-V shaders at build time; ensure your build host has a working C++ toolchain (clang/gcc) and CMake configured.
Runtime verification
- To verify GPU capability at runtime:
- Run the app on an Android 11+ device.
- Use the per-subsystem capability APIs to inspect the engines you care about, for example `LLMEdge.getTextBackendAvailability()`, `LLMEdge.getSpeechBackendAvailability()`, `LLMEdge.getImageBackendAvailability()`, and `LLMEdge.getVisionBackendAvailability()`.
- Inspect runtime logs for the selected backend and any fallback reason. Example:

```shell
adb logcat -s SmolSD:* | sed -n '1,200p'
```

Look for messages indicating OpenCL or Vulkan initialization. `LLMEdgeConfig(text = TextRuntimeConfig(useVulkan = true))` means "allow a supported GPU backend", not "force Vulkan".
Troubleshooting
- If you see "Vulkan 1.2 required" or linker errors for Vulkan symbols, confirm `minSdk` is set to 30 or higher in `llmedge/build.gradle.kts` and that your NDK provides the expected Vulkan headers.
- If experimental OpenCL is not available, or if a GPU backend fails to initialize or execute, llmedge falls back to Vulkan or CPU automatically. For text, Whisper, and image/video, a failing backend is blacklisted per subsystem for the rest of the process and the next backend is retried once.
- If your device lacks both usable OpenCL and Vulkan support, the native code falls back to the CPU backend.
- Uses `com.tom-roush:pdfbox-android` for PDF parsing.
- Embeddings library: `io.gitlab.shubham0204:sentence-embeddings:v6`.
- Scanned PDFs require OCR (e.g., ML Kit or Tesseract) before indexing.
- ONNX `token_type_ids` errors are automatically handled; override via `EmbeddingConfig` if required.
The Kotlin side is now organized around a few explicit layers instead of one eager facade:
- `LLMEdge` is a thin convenience shell that lazy-creates domain clients (`text`, `speech`, `image`, `vision`, `rag`) on first access.
- `ModelRepository` owns model acquisition and validation for local files and Hugging Face downloads.
- `RuntimePool` and `RuntimeCoordinator` provide shared runtime caching, backend selection, and failure blacklisting.
- `RuntimePoolProfile` lets each domain describe cache sizing, keying, loading, and backend policy without duplicating pool boilerplate.
- `TextClient`, `SpeechClient`, `ImageClient`, `VisionClient`, and `RAGClient` remain independently constructible for advanced use, but `LLMEdge` is the canonical public entrypoint.
- `ConversationSessionSupport` centralizes transcript state and runtime access for chat sessions and tool agents.
- `VisionInputPreparer` and `VisionRuntimeExecutor` split image preprocessing/embedding from generation execution.
- `RAGIndexer`, `RAGRetriever`, and `RAGAnswerer` separate document ingestion, retrieval, and answer generation.
- Native libraries remain in the same Android module, but native loading is now explicit and overridable for JVM tests instead of relying on static side effects.
On the native side, the project still builds llama.cpp, stable-diffusion.cpp, whisper.cpp, bark.cpp, and the JNI bridge sources through the Android NDK.
- llama.cpp — Core LLM backend
- stable-diffusion.cpp — Image/video generation backend
- whisper.cpp — Speech-to-text backend
- bark.cpp — Text-to-speech backend
- GGUF / GGML — Model formats
- Android NDK / JNI — Native bindings
- ONNX Runtime — Sentence embeddings
- Android DownloadManager — Large file downloads
You can measure RAM usage at runtime:
```kotlin
val snapshot = MemoryMetrics.snapshot(context)
Log.d("Memory", snapshot.toPretty(context))
```

Typical measurement points:
- Before model load
- After model load
- After blocking prompt
- After streaming prompt
- `totalPssKb`: total proportional RAM usage. Best for overall tracking.
- `dalvikPssKb`: JVM-managed heap and runtime.
- `nativePssKb`: native heap (llama.cpp, ONNX, tensors, KV cache).
- `otherPssKb`: miscellaneous memory.
Monitor `nativePssKb` closely during model loading and inference to understand LLM memory footprint.
Expert runtimes such as SmolLM also expose native/state-specific memory estimates when you need lower-level instrumentation.
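A load's native footprint can be estimated by bracketing it with snapshots. This is an illustrative sketch: `loadModel()` stands in for whatever operation you are measuring, and the delta arithmetic below is plain Kotlin, not a library API.

```kotlin
val before = MemoryMetrics.snapshot(context)
loadModel() // hypothetical placeholder: e.g., a first edge.text.generate(...) call
val after = MemoryMetrics.snapshot(context)

// nativePssKb captures llama.cpp weights, the KV cache, and other native allocations.
val nativeDeltaMb = (after.nativePssKb - before.nativePssKb) / 1024
Log.d("Memory", "Native footprint grew by ~$nativeDeltaMb MB")
```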
- `VULKAN_SDK` may still be required when you are building the Vulkan path on the host.
- Check Android GPU capability with the explicit per-subsystem helpers such as `LLMEdge.getTextBackendAvailability()` and `LLMEdge.getImageBackendAvailability()`.
The library includes consumer ProGuard rules. If you need to add custom rules:
# Keep OCR engines
-keep class io.aatricks.llmedge.vision.** { *; }
-keep class org.bytedeco.** { *; }
-keep class com.google.mlkit.** { *; }
# Suppress warnings for optional dependencies
-dontwarn org.bytedeco.**
-dontwarn com.google.mlkit.**
- llmedge: Apache 2.0
- llama.cpp: MIT
- stable-diffusion.cpp: MIT
- whisper.cpp: MIT
- bark.cpp: MIT
- Leptonica: Custom (BSD-like)
- Google ML Kit: Proprietary (see ML Kit terms)
- JavaCPP: Apache 2.0
This project builds upon work by Shubham Panchal, ggerganov, and PABannier. See CREDITS.md for full details.
Looking to run unit and instrumentation tests locally, including optional native txt2img E2E checks? See the step-by-step guide in docs/testing.md.