One of the core design goals Georgi Gerganov had with GGUF was that a model shouldn't need any other files. It's literally bullet point #1 in the spec:
>Single-file deployment
>Full information: all information needed to load a model is contained in the model file, and no additional information needs to be provided by the user.
https://github.com/ggml-org/ggml/blob/master/docs/gguf.md
We literally just got rid of that multi-file chaos only for ollama to add it back :/
Most of the parameters you would include in ollama's Modelfile are things you would pass to llama.cpp as command-line flags:
https://github.com/ggml-org/llama.cpp/blob/master/examples/m...
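Roughly (flag spellings per llama-cli's help; double-check against your build), a Modelfile like

    PARAMETER temperature 0.7
    PARAMETER top_p 0.9
    PARAMETER num_ctx 8192

corresponds to something like

    ./llama-cli -m model.gguf --temp 0.7 --top-p 0.9 -c 8192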
If you only ever have one set of configuration parameters per model (same temp, top_p, system prompt...), then I guess you can put them in a gguf file (as the format is extensible).
But what if you want two different sets? You still need to keep them somewhere. That could be a shell script for llama.cpp, or a Modelfile for ollama.
(Assuming you don't want to create a new (massive) gguf file for each permutation of parameters.)
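FWIW, you can point two Modelfiles at the same local gguf; as far as I can tell ollama stores blobs by sha256 digest, so the weights are shared and you only pay for the metadata twice. A sketch (file and model names made up):

    # Modelfile.creative
    FROM ./qwen3-30b.gguf
    PARAMETER temperature 1.0

    # Modelfile.precise
    FROM ./qwen3-30b.gguf
    PARAMETER temperature 0.2

    ollama create qwen3-creative -f Modelfile.creative
    ollama create qwen3-precise -f Modelfile.precise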
This is why we use xdelta3, rdiff, and git
If you ollama pull <model>, the modelfile will be downloaded along with the blob. To modify the model permanently, you can copypasta the modelfile into a text editor and then create a new model from the edited modelfile with the changes you need.
Here is my workflow when using Open WebUI:
1. ollama show qwen3:30b-a3b-q8_0 --modelfile
2. Paste the contents of the modelfile into Open WebUI -> Admin -> Models and rename it qwen3:30b-a3b-q8_0-monkversion-1
3. Change parameters, e.g. num_gpu 90 to change how many layers are offloaded... etc.
4. Keep or delete the old model
Pay attention to the modelfile; it will show you something like this:

    # To build a new Modelfile based on this, replace FROM with:
    # FROM qwen3:30b-a3b-q8_0

and you need to make sure the paths are correct. As an example of why that matters: I store my models on a large NVMe drive that isn't ollama's default location.
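If you'd rather do steps 2-4 from the CLI instead of Open WebUI, the edited file plus ollama create gets you the same result (a sketch; keep the TEMPLATE and other lines from the dump):

    FROM qwen3:30b-a3b-q8_0
    PARAMETER num_gpu 90
    # ...rest of the dumped modelfile...

    ollama create qwen3:30b-a3b-q8_0-monkversion-1 -f Modelfile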
EDIT TO ADD:
The 'modelfile' workflow is a pain in the booty. It's a dogwater pattern and I hate it. Some of these models are 30 to 60GB and copying the entire thing to change one parameter is just dumb.
However, ollama does a lot of things right, and it makes it easy to get up and running. vLLM, SGLang, mistral.rs, and even llama.cpp require a lot more work to set up.
Sorry, I should have been clearer.
I meant when you download a gguf file from huggingface, instead of using a model from ollama's library.
ollama pull hf.co/unsloth/Qwen3-30B-A3B-GGUF:Q4_K_M and the modelfile comes with it, though it may have errors in the template or parameters. The model has to already be converted to GGUF/GGML for this to work. You can, of course, convert from bf16 safetensors and create the specific ollama model yourself as well.
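If you're starting from bf16 safetensors, the usual route (a sketch, assuming a llama.cpp checkout; paths made up) is llama.cpp's converter plus a one-line Modelfile:

    python convert_hf_to_gguf.py ./my-hf-model --outfile my-model.gguf --outtype q8_0

    # Modelfile
    FROM ./my-model.gguf

    ollama create my-model -f Modelfile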
Yeah, when I do this, the modelfile has only FROM and TEMPLATE. No PARAMETERs:
ollama pull hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
ollama show --modelfile hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
Pretty sure the whole reason Ollama uses raw hashes everywhere is to avoid copying the whole NN gigabytes every time.
Maybe I am doing something wrong! When I change parameters in the modelfile, the whole thing is copied. As far as I know you can't just edit the file in place; you have to create another 38GB monster to change num_ctx to a reasonable number.
The parameters (prompt, etc.) should be set only in the new Modelfile (passed to `ollama create`), using a FROM that references the previous ollama model. Parameters in a Modelfile override the hard-coded parameters from the GGUF itself, which are sometimes buggy; in fact, from elsewhere in the thread it sounds like MiMo is missing proper stop tokens, or maybe templates in general (I'm not an expert).
This will show up as a separate entry in `ollama list` but only copies the Modelfile, not the GGUF.
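Something like this (the stop string is a placeholder; check the model's actual chat format):

    # Modelfile
    FROM hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M
    PARAMETER num_ctx 8192
    # placeholder stop token, verify for MiMo
    PARAMETER stop "<|im_end|>"

    ollama create mimo-fixed -f Modelfile

The big blob stays where it is; only the new metadata layers get written.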
Alternatively, if you use the API, you can override parameters "temporarily". Some UIs let you do this easily, at least for common parameters.
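For example, against the local API (the option names match the Modelfile PARAMETERs):

    curl http://localhost:11434/api/generate -d '{
      "model": "hf.co/jedisct1/MiMo-7B-RL-GGUF:Q4_K_M",
      "prompt": "hello",
      "options": {"temperature": 0.6, "num_ctx": 8192}
    }'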
I’ll typically use the defaults initially and then write a Modelfile if it’s something I plan to keep using. I think you can dump the modelfile ollama uses to have a template to work with.