Running Mistral Vibe with local models

December 15, 2025

I experimented with Mistral Vibe and locally hosted models; here are some practical notes from that setup.

Mistral AI recently released new Devstral models aimed at software development, along with a new product called Vibe, designed for working with code in the context of an entire project.

After learning about this, I decided to try Vibe, despite my rather skeptical attitude toward this style of development. My machine has 36 GB of RAM, which made it interesting to see what could realistically be done locally.

I downloaded several model variants from Hugging Face:

mistralai_devstral-small-2-24b-instruct-2512@q8_0 (≈25 GB)
mistralai_devstral-small-2-24b-instruct-2512@q5_k_l (≈17 GB)
mistralai_devstral-small-2-24b-instruct-2512@q5_k_m (≈16 GB)
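
For reference, fetching a single quantization from Hugging Face can be done with the huggingface-cli tool. The repository name below is only illustrative; take the real one from the model card:

# hypothetical repo name; substitute the actual GGUF repository
huggingface-cli download \
  publisher/Devstral-Small-2-24B-Instruct-GGUF \
  mistralai_devstral-small-2-24b-instruct-2512@q5_k_m.gguf \
  --local-dir ./Models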

The downloaded GGUF files were placed in a local Models directory.
To run them in LM Studio, I had to recreate the folder structure it expects:
developer name / model name / GGUF file.
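
For example, the resulting layout looked roughly like this (whether the root is ~/.lmstudio/models or another directory depends on your LM Studio settings):

models/
  mistralai/
    devstral-small-2-24b-instruct-2512/
      mistralai_devstral-small-2-24b-instruct-2512@q5_k_m.gguf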

The next step was configuring ~/.vibe/config.toml, where the provider and model need to be specified:

[[providers]]
name = "llamacpp"
api_base = "http://127.0.0.1:1234/v1"
api_key_env_var = ""
api_style = "openai"
backend = "generic"

[[models]]
name = "mistralai_devstral-small-2-24b-instruct-2512@q5_k_m"
provider = "llamacpp"
alias = "local"
temperature = 0.2
input_price = 0.0
output_price = 0.0

And at the top of the file, the active model must be set:

active_model = "local"

With this configuration, Vibe talks to the local model, which in theory means the whole workflow runs on your own hardware.
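
Before launching Vibe, it's worth verifying that the LM Studio server is actually listening on the configured address. LM Studio exposes an OpenAI-compatible API, so a plain HTTP request is enough:

# should return a JSON list that includes the loaded Devstral model
curl http://127.0.0.1:1234/v1/models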

In practice, Vibe successfully connected to the model running in LM Studio and started responding to prompts. However, generation speed was relatively low, especially with the heaviest q8_0 variant. The smaller q5_k_l and q5_k_m models were noticeably faster, but still not as smooth as the cloud-hosted Devstral.

It’s possible that further tuning of LM Studio could improve performance. Having a fully local assistant, independent of internet access, would be extremely valuable.
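
One option I haven't explored yet is skipping LM Studio entirely and serving the same GGUF file with llama.cpp's llama-server, which speaks the same OpenAI-style API that the provider block above already expects. A rough sketch, with flag values that would need tuning for specific hardware:

# -ngl: layers to offload to the GPU (99 effectively means "all")
# -c: context window size; smaller windows use less memory
llama-server \
  -m Models/mistralai_devstral-small-2-24b-instruct-2512@q5_k_m.gguf \
  -ngl 99 -c 8192 --port 1234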

If anyone has experience optimizing Devstral with LM Studio or Ollama, I’d be happy to hear your recommendations.