What is the purpose of the MiniMax-M2.1.imatrix.gguf file?

#1
by tarruda - opened

Sometimes I see this file in GGUF repos. Can you explain what is the purpose?

@tarruda This file contains the importance matrix we computed for this specific model; it is what we used to generate the weighted/imatrix quants. We computed it by running a dataset covering almost all common LLM use cases through the model and measuring which parts of the model matter most for the output, so those parts can be quantized at higher precision than less important ones. That way our weighted/imatrix quants exceed our static quants in quality at the same size, delivering higher quality at the same performance and hardware requirements to our users.

Computing such an importance matrix is quite resource intensive: it took around half a day and 512 GiB of RAM for MiniMax-M2.1. We provide the importance matrix file so users can reuse it when creating their own MiniMax-M2.1 quants. While we cover all popular precision mixtures, some advanced users go one step further and design their own precision mixtures to better fit their hardware, use case, or a specific model architecture, and they can use our importance matrix for that. In short: unless you plan on creating your own MiniMax-M2.1 quants, you don't need this file.
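For anyone who does want to reuse the file, here is a minimal sketch of how that typically looks with llama.cpp's tools; the filenames and the calibration text are placeholders, and exact flags can differ between llama.cpp versions:

```sh
# Compute an importance matrix from a full-precision GGUF by running
# a calibration text file through the model (the resource-intensive step).
./llama-imatrix -m MiniMax-M2.1-F16.gguf -f calibration.txt -o MiniMax-M2.1.imatrix.gguf

# Quantize the model, letting the imatrix guide which tensors
# get higher precision at the chosen target type.
./llama-quantize --imatrix MiniMax-M2.1.imatrix.gguf \
    MiniMax-M2.1-F16.gguf MiniMax-M2.1-i1-Q4_K_S.gguf Q4_K_S
```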

imatrix files essentially assign weights to the different parts of the LLM's graph. These weights are typically based on how important each part was in determining the output for a set of test prompts.

The imatrix file is then used during quantization to decide which parts to quantize at which bit width, with nodes that were more important to the output of the test prompts being given more bits than those deemed less important.
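As a concrete example of the custom precision mixtures mentioned above, llama-quantize also accepts per-tensor override flags on top of the imatrix. This is a hedged sketch (flag availability depends on your llama.cpp version) that keeps the output tensor and token embeddings at higher precision while the rest follows the Q4_K_S recipe:

```sh
# Custom mixture: imatrix-guided Q4_K_S body, but keep the output
# tensor and token embeddings at Q6_K for extra quality.
./llama-quantize --imatrix MiniMax-M2.1.imatrix.gguf \
    --output-tensor-type Q6_K \
    --token-embedding-type Q6_K \
    MiniMax-M2.1-F16.gguf MiniMax-M2.1-custom.gguf Q4_K_S
```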

Thanks for the info!

Are you planning to release IQ4 quants? I'm able to fit Q4_K_S on my 128GB Mac Studio, but IQ4 should leave more room for context.

Yes, we will provide all our usual quants, including i1-IQ4_XS. It just takes longer than usual for all of them to be computed and uploaded due to the size of the model. You can always check the current status of our workers at https://hf.tst.eu/status.html

Looking forward to it. Thanks for your awesome quants!

tarruda changed discussion status to closed
