Some of the MediaPipe models are nice, but MediaPipe has been around forever (well, since 2019). It has always been about running AI on the edge, back when the exciting frontier of AI was visual tasks.
For stuff like face tracking it's still useful, but for other tasks like image recognition the world has changed drastically.
I would say the target audience is anyone deploying ML models cross-platform, specifically models that would require supporting code beyond the TFLite runtime to make them work.
LLMs and computer vision tasks are good examples of this.
For example, a hand-gesture recognizer might require:
- Pre-processing of the input image to a certain color space + image size
- Copy of image to GPU memory
- Run of object detection TFLite model to detect hand
- Resize of output image
- Run of gesture recognition TFLite model to detect gesture
- Post-processing of the gesture output into something useful
Shipping this to iOS+Android requires a lot of code beyond executing TFLite models.
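To get a feel for how much of that is plain glue code, here is a rough sketch of wiring the pipeline up by hand with just the TFLite interpreter in Python. The model file names, tensor shapes, box format, and label set are all placeholders rather than real MediaPipe assets, and the GPU upload step is omitted entirely:

```python
# Hypothetical two-stage hand-gesture pipeline built directly on the TFLite
# interpreter. Model files, shapes, and output formats are placeholders.
import numpy as np
from PIL import Image
import tflite_runtime.interpreter as tflite

def run_model(interpreter, input_tensor):
    """Feed one input tensor through a TFLite interpreter and return all outputs."""
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    interpreter.set_tensor(input_details[0]['index'], input_tensor)
    interpreter.invoke()
    return [interpreter.get_tensor(d['index']) for d in output_details]

# 1. Pre-processing: load, convert to RGB, resize, normalize to [0, 1].
image = Image.open('frame.jpg').convert('RGB').resize((224, 224))
pixels = np.expand_dims(np.asarray(image, dtype=np.float32) / 255.0, axis=0)

# 2. Hand detection model -> bounding boxes + scores (format is model-specific).
detector = tflite.Interpreter(model_path='hand_detector.tflite')
detector.allocate_tensors()
boxes, scores = run_model(detector, pixels)

# 3. Crop and resize the best detection for the second model.
x0, y0, x1, y1 = (boxes[0][np.argmax(scores[0])] * 224).astype(int)
hand_crop = image.crop((x0, y0, x1, y1)).resize((192, 192))
hand_pixels = np.expand_dims(np.asarray(hand_crop, dtype=np.float32) / 255.0, axis=0)

# 4. Gesture recognition model -> class scores.
recognizer = tflite.Interpreter(model_path='gesture_recognizer.tflite')
recognizer.allocate_tensors()
(gesture_scores,) = run_model(recognizer, hand_pixels)

# 5. Post-processing: map the top score to a human-readable label.
labels = ['open_palm', 'fist', 'thumbs_up']  # placeholder label set
print(labels[int(np.argmax(gesture_scores))])
```

And that's the easy version: doing the same on iOS and Android means rewriting the pre/post-processing and image handling per platform.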
The Google MediaPipe approach is to package this graph pipeline and the shared processing "nodes" into a single C++ library where you can pick and choose what you need and re-use operations across tasks. The library also compiles cross-platform, and the supporting tasks can offer GPU acceleration options.
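That packaging is what the higher-level Tasks APIs expose. As a rough sketch using MediaPipe's Python bindings (API names as in recent releases, and the bundled `.task` file containing both models plus the pipeline config is assumed to exist), the whole thing collapses to a few lines:

```python
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# The .task bundle packages the detector, the gesture classifier, and the
# pre/post-processing graph; the path here is a placeholder.
options = vision.GestureRecognizerOptions(
    base_options=python.BaseOptions(model_asset_path='gesture_recognizer.task'))
recognizer = vision.GestureRecognizer.create_from_options(options)

image = mp.Image.create_from_file('frame.jpg')
result = recognizer.recognize(image)
if result.gestures:
    print(result.gestures[0][0].category_name)
```

The equivalent Swift and Kotlin wrappers sit on top of the same C++ graph, which is the point: the glue code is written once.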
One internal debate Google likely had was whether it was best to extend the TFLite runtime with these features, or to build a separate library (MediaPipe). TFLite already supports custom compile options with additional operations.
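For reference, that existing extension point looks roughly like this today: the converter can pass through ops TFLite has no built-in kernel for, and the app then links an interpreter build that registers those kernels (usually in C++). A minimal sketch, with the saved-model path as a placeholder:

```python
import tensorflow as tf

# Convert a model that contains ops without built-in TFLite kernels.
# The resulting .tflite file only runs on an interpreter build that
# registers matching custom kernels.
converter = tf.lite.TFLiteConverter.from_saved_model('my_saved_model')
converter.allow_custom_ops = True
tflite_model = converter.convert()

with open('model_with_custom_ops.tflite', 'wb') as f:
    f.write(tflite_model)
```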
My guess is they thought it was best to keep TFLite focused on "tensor-based computation" tasks and offload broader operations like LLM and image processing into a separate library.