awesome-LLM-Quantization

A curated collection of papers, data, and repositories on LLM quantization.

Papers

2024

  • [ICML] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [code]
  • [ICML] QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks [code]
  • [ICML] FrameQuant: Flexible Low-Bit Quantization for Transformers
  • [ICML] SqueezeLLM: Dense-and-Sparse Quantization [code]
  • [ICML] Extreme Compression of Large Language Models via Additive Quantization [code]
  • [ICML] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache [code]
  • [ICML] BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization
  • [ICML] Compressing Large Language Models by Joint Sparsification and Quantization [code]
  • [ICML] LQER: Low-Rank Quantization Error Reconstruction for LLMs [code]
  • [ICML] Accurate LoRA-Finetuning Quantization of LLMs via Information Retention [code]
  • [ACL] DB-LLM: Accurate Dual-Binarization for Efficient LLMs
  • [ACL] Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models [code]
  • [ACL] LRQuant: Learnable and Robust Post-Training Quantization for Large Language Models [code]
  • [ACL] Unlocking Data-free Low-bit Quantization with Matrix Decomposition for KV Cache Compression [code]
  • [ACL] BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation [code]
  • [ACL] Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment
  • [NeurIPS] LLM-MQ: Mixed-precision Quantization for Efficient LLM Deployment
  • [NeurIPS] FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs
  • [ICLR] OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models [code]
  • [ICLR] LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models [code]
  • [ICLR] Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models [code]
  • [ICLR] AffineQuant: Affine Transformation Quantization for Large Language Models [code]
  • [ICLR] SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression [code]
  • [ICLR] QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models [code]
  • [ICLR] Compressing LLMs: The Truth is Rarely Pure and Never Simple [code]
  • [ICLR] LQ-LoRA: Low-Rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning [code]
  • [ICLR] QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models [code]
  • [NAACL] ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models
  • [NAACL] Divergent Token Metrics: Measuring degradation to prune away LLM components – and optimize quantization
  • [NAACL] Compensate Quantization Errors: Make Weights Hierarchical to Compensate Each Other
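
Most of the 2024 entries above target weight-only post-training quantization at 4 bits and below. For orientation only, the sketch below shows a minimal, generic round-to-nearest per-group weight quantizer; it is not the algorithm of any paper in this list (those add calibration data, outlier handling, or learned transforms on top of this baseline), and the bit width and group size are just illustrative defaults.

```python
# Minimal, generic round-to-nearest per-group weight quantizer.
# Illustration only -- NOT the method of any specific paper above.
import numpy as np

def quantize_weights_per_group(w, bits=4, group_size=128):
    """Symmetric round-to-nearest quantization of a 2-D weight matrix.

    Each row is split into groups of `group_size` columns; every group gets
    its own scale so that a large value only affects its local group.
    Returns the integer codes, per-group scales, and a dequantized copy.
    """
    out_features, in_features = w.shape
    assert in_features % group_size == 0, "pad in_features to a multiple of group_size"
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for signed 4-bit

    groups = w.reshape(out_features, in_features // group_size, group_size)
    scales = np.abs(groups).max(axis=-1, keepdims=True) / qmax
    scales = np.maximum(scales, 1e-8)               # avoid division by zero
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax).astype(np.int8)
    w_hat = (q * scales).reshape(out_features, in_features)
    return q, scales, w_hat

# Toy usage: quantize a random weight matrix and report the reconstruction error.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 1024)).astype(np.float32)
q, scales, w_hat = quantize_weights_per_group(w, bits=4, group_size=128)
print("mean abs error:", np.abs(w - w_hat).mean())
```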

2023

  • [ICML] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models [code]
  • [EMNLP] Outlier Suppression+: Accurate quantization of large language models by equivalent and effective shifting and scaling [code]
  • [EMNLP] EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs
  • [EMNLP] Zero-shot Sharpness-Aware Quantization for Pre-trained Language Models
  • [EMNLP] A Frustratingly Easy Post-Training Quantization Scheme for LLMs [code]
  • [EMNLP] Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization
  • [ACL] Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models
  • [NeurIPS] Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
  • [NeurIPS] QuIP: 2-Bit Quantization of Large Language Models With Guarantees [code]
  • [NeurIPS] Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing [code]
  • [NeurIPS] QLoRA: Efficient Finetuning of Quantized LLMs [code]
  • [NeurIPS] Training Transformers with 4-bit Integers
  • [NeurIPS] TexQ: Zero-shot Network Quantization with Texture Feature Distribution Calibration
  • [ACL] PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models
  • [ICLR] OPTQ: Accurate Quantization for Generative Pre-trained Transformers [code]
  • [ICLR] FIT: A Metric for Model Sensitivity
  • [ICLR] PowerQuant: Automorphism Search for Non-Uniform Quantization
  • [ICLR] Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats

2022

  • [NeurIPS] ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers [code]
  • [NeurIPS] LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale [code]
  • [NeurIPS] Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models [code]
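
The 2022 entries above largely concern 8-bit quantization of both weights and activations, where per-token activation outliers are the main difficulty. As a generic illustration only (not the actual method of LLM.int8(), ZeroQuant, or Outlier Suppression, which add outlier decomposition, scale migration, and fused kernels), here is a minimal per-token asymmetric INT8 activation quantizer.

```python
# Minimal, generic per-token asymmetric INT8 activation quantizer.
# Illustration only -- NOT the method of any specific paper above.
import numpy as np

def quantize_activations_per_token(x, bits=8):
    """Asymmetric round-to-nearest quantization along the last dimension.

    `x` has shape (tokens, hidden); each token row gets its own scale and
    zero-point, which keeps large per-token dynamic ranges from clipping.
    """
    qmin, qmax = 0, 2 ** bits - 1
    x_min = x.min(axis=-1, keepdims=True)
    x_max = x.max(axis=-1, keepdims=True)
    scale = np.maximum((x_max - x_min) / (qmax - qmin), 1e-8)
    zero_point = np.round(-x_min / scale)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    x_hat = (q.astype(np.float32) - zero_point) * scale
    return q, scale, zero_point, x_hat

# Toy usage on a batch of token activations.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 4096)).astype(np.float32)
q, scale, zp, x_hat = quantize_activations_per_token(x)
print("max abs error:", np.abs(x - x_hat).max())
```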
