LoRA (Low-Rank Adaptation) freezes the pre-trained model weights and injects trainable low-rank decomposition matrices into each layer: a weight matrix W is left untouched, and a product BA of two much smaller matrices is learned alongside it. This dramatically reduces the number of trainable parameters (by up to 10,000x in the original paper's GPT-3 experiments) and the memory required for optimizer state. QLoRA combines LoRA with 4-bit quantization of the frozen base weights for even greater efficiency, enabling fine-tuning of large models on consumer hardware.
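
The low-rank update can be sketched in a few lines. This is a minimal illustration in plain NumPy, not any library's actual API; the class name `LoRALinear` and the hyperparameter defaults (r=4, alpha=16) are assumptions chosen for the example:

```python
import numpy as np

class LoRALinear:
    """A frozen dense layer augmented with a trainable low-rank update.

    Effective weight: W + (alpha / r) * B @ A, where only A and B train.
    """
    def __init__(self, W, r=4, alpha=16, seed=0):
        d_out, d_in = W.shape
        rng = np.random.default_rng(seed)
        self.W = W                  # frozen pre-trained weight (d_out x d_in)
        self.r, self.alpha = r, alpha
        # A starts random, B starts at zero, so the initial update BA is zero
        # and the adapted layer matches the pre-trained layer exactly.
        self.A = rng.normal(0.0, 0.02, size=(r, d_in))
        self.B = np.zeros((d_out, r))

    def __call__(self, x):
        delta = (self.alpha / self.r) * (self.B @ self.A)
        return x @ (self.W + delta).T

W = np.random.default_rng(1).normal(size=(64, 64))
layer = LoRALinear(W, r=4)
frozen = W.size                            # 64 * 64 = 4096 frozen parameters
trainable = layer.A.size + layer.B.size    # 2 * 4 * 64 = 512 trainable parameters
```

Here a rank-4 adapter trains 512 parameters against 4,096 frozen ones; for large square weight matrices the ratio scales as 2r/d, which is where the dramatic parameter savings come from.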








