Large Language Model Evaluation via Matrix Entropy

Lai Wei*, Zhiquan Tan*, Chenghai Li, Jindong Wang, Weiran Huang (*Equal Contribution).

Shanghai Jiao Tong University & Tsinghua University & Microsoft Research Asia

Introduction

We introduce matrix entropy, a novel metric rooted in information theory and geometric principles that quantifies how proficiently LLMs compress data. It reflects a model's ability to extract relevant information and discard unnecessary elements, thereby providing insight into the language model's intrinsic capability. We demonstrate its applicability in both single-modal (language) and multi-modal settings. For language models, our findings reveal that the matrix entropy of representations decreases according to a scaling law as models scale up, complementing the traditional loss scaling law. For multi-modal models, we also propose a matrix-entropy-based evaluation method for assessing alignment quality, and we find that modern multi-modal large language models exhibit good alignment performance.

Calculation of Matrix Entropy

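In terms of what the snippet below computes: for the trace-normalized covariance matrix of the token representations, with eigenvalues λ_1, …, λ_d, the reported quantity is the entropy of the eigenvalue spectrum, divided by log d so that it lies in [0, 1]. A sketch of the definition, written to match the calculation code rather than the paper's exact notation:

$$\mathrm{H}(A) \;=\; -\frac{1}{\log d}\sum_{i=1}^{d} \lambda_i \log \lambda_i, \qquad \lambda_i \text{ the eigenvalues of } \frac{A}{\operatorname{tr}(A)}.$$
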
from transformers import AutoTokenizer, AutoModel
import torch
import math

# R: N x d matrix of token representations (one row per token).
def normalize(R):
    with torch.no_grad():
        mean = R.mean(dim=0)
        R = R - mean                                     # center each feature dimension
        norms = torch.norm(R, p=2, dim=1, keepdim=True)
        R = R / norms                                    # scale each row to unit length
    return R

def cal_cov(R):
    with torch.no_grad():
        Z = torch.nn.functional.normalize(R, dim=1)
        A = torch.matmul(Z.T, Z) / Z.shape[0]            # d x d covariance of the normalized rows
    return A

def cal_entropy(A):
    with torch.no_grad():
        # A is symmetric PSD, so its singular values coincide with its eigenvalues.
        eig_val = torch.svd(A / torch.trace(A))[1]
        entropy = - (eig_val * torch.log(eig_val)).nansum().item()  # nansum treats 0 * log 0 as 0
        normalized_entropy = entropy / math.log(A.shape[0])         # divide by log(d) so the value lies in [0, 1]
    return normalized_entropy

model_path = "cerebras/Cerebras-GPT-1.3B" # for example
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path, device_map="auto")  # device_map="auto" already places the model on GPU

text = "I love Generative AI very much." # for example
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    R = model(inputs.input_ids)[0][0, :, :]   # last hidden states: (num_tokens, hidden_dim)
    R = normalize(R)
    A = cal_cov(R)
    Entropy = cal_entropy(A)
print(Entropy)
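
Because of the log(d) normalization above, the printed value lies in [0, 1]; lower values indicate that the representations are concentrated in fewer effective directions, i.e. more strongly compressed.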

Matrix Entropy of Single Sentence

cd utils

python entropy_single_sentence.py

Matrix Entropy of Dataset

Please download the wiki-en, dolly-15k, openwebtext2, and hh-rlhf datasets from Hugging Face and edit the data paths in your scripts.

cd utils

python entropy_dataset.py
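
For reference, a minimal sketch of averaging the per-sentence matrix entropy over a dataset, reusing normalize, cal_cov, and cal_entropy from the snippet above. The data file path and the "text" field name are placeholders; adapt them to your local copy of the downloaded data.

from datasets import load_dataset

# Hypothetical local path and field name; adjust to the dataset you downloaded.
dataset = load_dataset("json", data_files="path/to/your/dataset.jsonl")["train"]

entropies = []
with torch.no_grad():
    for example in dataset:
        text = example["text"]                                   # field name depends on the dataset
        inputs = tokenizer(text, return_tensors="pt", truncation=True).to(model.device)
        R = model(inputs.input_ids)[0][0, :, :]                  # (num_tokens, hidden_dim)
        A = cal_cov(normalize(R))
        entropies.append(cal_entropy(A))

print(sum(entropies) / len(entropies))                           # mean matrix entropy over the dataset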

Citation

If you use Matrix Entropy in your research or applications, please cite it using this BibTeX:

@article{wei2024large,
  title={Large Language Model Evaluation via Matrix Entropy},
  author={Wei, Lai and Tan, Zhiquan and Li, Chenghai and Wang, Jindong and Huang, Weiran},
  journal={arXiv preprint arXiv:2401.17139},
  year={2024}
}