What can we do for the default backend?

- Model: start with Llama 3.2 1B, preferably a quantized variant.
- Runtime:
  - [MediaPipe](https://github.com/google-ai-edge/mediapipe-samples/tree/main/examples/llm_inference/android) and [AI-Edge-Torch](https://github.com/google-ai-edge/ai-edge-torch): our current default backend is TFLite-based, so staying with a TFLite-based solution would be preferable (a minimal usage sketch follows this list).
    - The new API in [LiteRT](https://ai.google.dev/edge/litert).
  - ExecuTorch: https://github.com/pytorch/executorch, with a Llama example at https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md
  - llama.cpp: https://github.com/ggerganov/llama.cpp
  - ONNX Runtime: https://github.com/microsoft/onnxruntime; we should verify that it works on Android.
  - ONNX Runtime GenAI: https://github.com/microsoft/onnxruntime-genai
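Since MediaPipe's LLM Inference API is the closest fit to the current TFLite-based backend, here is a minimal sketch of what driving it from Kotlin could look like, based on the linked Android sample. The model path and the exact set of builder options are assumptions taken from that sample and may differ across MediaPipe releases.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Minimal sketch following the linked MediaPipe LLM Inference Android sample.
// Builder options and the on-device model path are assumptions and may differ
// between MediaPipe releases.
fun runLlamaPrompt(context: Context, prompt: String): String {
    val options = LlmInference.LlmInferenceOptions.builder()
        // Path to a converted (e.g. quantized Llama 3.2 1B) model bundle
        // pushed to the device; this location is hypothetical.
        .setModelPath("/data/local/tmp/llm/llama_3_2_1b.bin")
        .setMaxTokens(512) // cap on combined input + output tokens
        .build()

    val llmInference = LlmInference.createFromOptions(context, options)
    return llmInference.generateResponse(prompt) // blocking, single response
}
```

If we go this route, the main open question is the conversion path: AI-Edge-Torch would be the tool for getting a Llama 3.2 1B checkpoint into a TFLite/LiteRT-compatible bundle that this API can load.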