Generate per-session LoRA adapters in <1s for agentic inference efficiency

2 points, posted 11 hours ago by Facingsouth


Facingsouth, 11 hours ago

Quick Start

Generate LoRA Adapters

From metadata (JSON string or file):

tessera generate \
  --from-metadata '{"task": "classification", "domain": "medical"}' \
  --base-model mistralai/Mistral-7B-Instruct-v0.2 \
  --rank 16 \
  --save ./adapter.safetensors
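Since `--from-metadata` also accepts a file, one way to get a separate adapter per agent session is to write each session's metadata to its own JSON file and point both `--from-metadata` and `--save` into a per-session directory. This is a minimal bash sketch: the function name `gen_session_adapter`, the `./sessions/<id>/` layout and the `metadata.json` filename are hypothetical, and it uses only the `tessera generate` flags shown above.

```shell
# Bash. Hypothetical helper: one adapter per session, built from a metadata file.
# Assumes TASK and DOMAIN contain no double quotes or backslashes,
# because the JSON is assembled with printf and is NOT escaped.
gen_session_adapter() {
  local session_id="$1" task="$2" domain="$3"
  local dir="./sessions/${session_id}"   # hypothetical per-session layout
  mkdir -p "$dir" || return 1

  # Write the metadata that would otherwise be passed inline as a JSON string.
  printf '{"task": "%s", "domain": "%s"}\n' "$task" "$domain" > "$dir/metadata.json"

  tessera generate \
    --from-metadata "$dir/metadata.json" \
    --base-model mistralai/Mistral-7B-Instruct-v0.2 \
    --rank 16 \
    --save "$dir/adapter.safetensors"
}
```

For example, `gen_session_adapter sess-42 classification medical` would leave `./sessions/sess-42/metadata.json` and `./sessions/sess-42/adapter.safetensors` side by side, which keeps each session's inputs next to its generated adapter.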

From a text description:

tessera generate \
  --from-text "Medical diagnosis assistant" \
  --base-model mistralai/Mistral-7B-Instruct-v0.2 \
  --rank 16 \
  --save ./adapter.safetensors

From a document:

tessera generate \
  --from-doc ./document.txt \
  --base-model mistralai/Mistral-7B-Instruct-v0.2 \
  --rank 16 \
  --save ./adapter.safetensors
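For a rough sense of scale of `--rank 16`: a LoRA update on a linear layer with input size d_in and output size d_out adds two low-rank factors, A (r × d_in) and B (d_out × r), i.e. r·(d_in + d_out) parameters. The bash sketch below assumes the adapter targets only the attention projections (q, k, v, o) of Mistral-7B, which has hidden size 4096, 32 layers and 8 key/value heads of dimension 128 (so k and v project to 1024). Which modules Tessera actually targets is not stated above, so treat the total as illustrative; the function names are hypothetical.

```shell
# Bash. LoRA parameter count for one linear layer: r * (d_in + d_out).
lora_params() {
  local d_in="$1" d_out="$2" rank="$3"
  echo $(( rank * (d_in + d_out) ))
}

# ASSUMPTION: adapter covers q/k/v/o attention projections of Mistral-7B only.
mistral_attn_lora_params() {
  local rank="$1" hidden=4096 kv=1024 layers=32
  local qo kvp
  qo=$(lora_params "$hidden" "$hidden" "$rank")   # q_proj and o_proj: 4096 -> 4096
  kvp=$(lora_params "$hidden" "$kv" "$rank")      # k_proj and v_proj: 4096 -> 1024 (GQA)
  echo $(( (2 * qo + 2 * kvp) * layers ))
}

total=$(mistral_attn_lora_params 16)
echo "rank-16 LoRA params: $total"                            # → 13631488
echo "size at 2 bytes/param: $(( total * 2 / 1024 / 1024 )) MiB"  # → 26
```

Under these assumptions a rank-16 adapter is about 13.6M parameters (26 MiB in fp16/bf16), well under 1% of the 7B base model, which is what makes producing and swapping one adapter per session plausible.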

Base Model Management

Download a base model from the Hugging Face Hub:

tessera model pull mistralai/Mistral-7B-Instruct-v0.2
tessera model pull meta-llama/Llama-3.1-8B-Instruct
tessera model pull deepseek-ai/DeepSeek-R1-Distill-Qwen-7B

Serve a base model with vLLM (optionally setting the port, GPU memory utilization, or quantization):

tessera model serve-model mistralai/Mistral-7B-Instruct-v0.2 --port 8000
tessera model serve-model mistralai/Mistral-7B-Instruct-v0.2 --gpu-memory-utilization 0.9
tessera model serve-model mistralai/Mistral-7B-Instruct-v0.2 --quantization awq

List cached base models:

tessera model list-models

Remove a cached model:

tessera model remove mistralai/Mistral-7B-Instruct-v0.2

Start Tessera Server

Start the hypernetwork server and automatically launch vLLM for the given base model:

tessera serve --port 8080 --base-model mistralai/Mistral-7B-Instruct-v0.2

Start the hypernetwork server standalone (without launching vLLM):

tessera serve --port 8080 --host 0.0.0.0

Check Server Health

tessera health --url http://localhost:8080
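When the server is started from a script it may take a while to become ready, especially if vLLM is loading a 7B base model, so a small retry loop around `tessera health` is one way to wait for it. This bash sketch assumes `tessera health` exits with a non-zero status when the server is unreachable or unhealthy, which is not stated above; `retry` is a hypothetical helper, not part of Tessera.

```shell
# Bash. Hypothetical helper: run a command until it succeeds,
# up to ATTEMPTS times, sleeping DELAY seconds between failed tries.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, after starting `tessera serve --port 8080 --host 0.0.0.0 &` in the background, `retry 30 2 tessera health --url http://localhost:8080` would keep checking for roughly a minute and return non-zero if the server never reports healthy.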

List Available Models

tessera list