A curated collection of papers, data, and repositories related to LLM quantization.
- [ICML] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [code]
- [ICML] QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks [code]
- [ICML] FrameQuant: Flexible Low-Bit Quantization for Transformers
- [ICML] SqueezeLLM: Dense-and-Sparse Quantization [code]
- [ICML] Extreme Compression of Large Language Models via Additive Quantization [code]
- [ICML] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache [code]
- [ICML] BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization
- [ICML] Compressing Large Language Models by Joint Sparsification and Quantization [code]
- [ICML] LQER: Low-Rank Quantization Error Reconstruction for LLMs [code]
- [ICML] Accurate LoRA-Finetuning Quantization of LLMs via Information Retention [code]
- [ACL] DB-LLM: Accurate Dual-Binarization for Efficient LLMs
- [ACL] Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models [code]
- [ACL] LRQuant: Learnable and Robust Post-Training Quantization for Large Language Models [code]
- [ACL] Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression [code]
- [ACL] BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation [code]
- [ACL] Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment
- [NeurIPS] LLM-MQ: Mixed-precision Quantization for Efficient LLM Deployment
- [NeurIPS] FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs
- [ICLR] OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models [code]
- [ICLR] LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models [code]
- [ICLR] Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models [code]
- [ICLR] AffineQuant: Affine Transformation Quantization for Large Language Models [code]
- [ICLR] SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression [code]
- [ICLR] QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models [code]
- [ICLR] Compressing LLMs: The Truth is Rarely Pure and Never Simple [code]
- [ICLR] LQ-LoRA: Low-Rank plus Quantized Matrix Decomposition for Efficient Language Model Finetuning [code]
- [ICLR] QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models [code]
- [NAACL] ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models
- [NAACL] Divergent Token Metrics: Measuring degradation to prune away LLM components – and optimize quantization
- [NAACL] Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other
- [ICML] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models [code]
- [EMNLP] Outlier Suppression+: Accurate quantization of large language models by equivalent and effective shifting and scaling [code]
- [EMNLP] EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs
- [EMNLP] Zero-shot Sharpness-Aware Quantization for Pre-trained Language Models
- [EMNLP] A Frustratingly Easy Post-Training Quantization Scheme for LLMs [code]
- [EMNLP] Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization
- [ACL] Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models
- [NeurIPS] Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
- [NeurIPS] QuIP: 2-Bit Quantization of Large Language Models With Guarantees [code]
- [NeurIPS] Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing [code]
- [NeurIPS] QLoRA: Efficient Finetuning of Quantized LLMs [code]
- [NeurIPS] Training Transformers with 4-bit Integers
- [NeurIPS] TexQ: Zero-shot Network Quantization with Texture Feature Distribution Calibration
- [ACL] PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models
- [ICLR] OPTQ: Accurate Quantization for Generative Pre-trained Transformers [code]
- [ICLR] FIT: A Metric for Model Sensitivity
- [ICLR] PowerQuant: Automorphism Search for Non-Uniform Quantization
- [ICLR] Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
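Many of the post-training quantization papers above improve on the same round-to-nearest baseline: quantize each weight row to a low-bit integer grid with a per-channel scale. Below is a minimal sketch of that baseline (symmetric, per-output-channel, 4-bit); the function names are illustrative, not from any specific paper's codebase.

```python
import numpy as np

def quantize_per_channel(w: np.ndarray, bits: int = 4):
    """Symmetric per-output-channel round-to-nearest (RTN) quantization.

    w: weight matrix of shape (out_features, in_features).
    Returns integer codes and per-channel scales such that w ~= codes * scales.
    """
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for 4-bit signed
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)  # guard all-zero channels
    codes = np.clip(np.round(w / scales), -qmax - 1, qmax).astype(np.int8)
    return codes, scales

def dequantize(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Map integer codes back to approximate float weights."""
    return codes.astype(np.float32) * scales

# Toy usage: quantize a random weight matrix and check the error bound.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)).astype(np.float32)
codes, scales = quantize_per_channel(w, bits=4)
w_hat = dequantize(codes, scales)
max_err = np.abs(w - w_hat).max()  # RTN error is at most 0.5 * scale per channel
```

Methods such as OPTQ/GPTQ, QuIP, and SpQR keep this storage format but choose the codes (or handle outlier weights) more carefully than plain rounding, which is what recovers accuracy at 2-4 bits.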