
Commit

Update index.html
NamrataRShivagunde committed May 8, 2024
1 parent dec54de commit d3e8426
Showing 1 changed file with 1 addition and 2 deletions.
docs/2024/pept_relora_n_galore/index.html (1 addition & 2 deletions)
@@ -147,8 +147,7 @@ <h1>Parameter Efficient Pre-Training: Comparing ReLoRA and GaLore</h1>


<h2 id="intro">Parameter Efficient Pre-training (PEPT)</h2>
-<p>As the size and complexity of large language models (LLMs) continue to grow, so does the demand for computational resources to train them. With billions of parameters, training these models becomes increasingly challenging due to the high cost and resource constraints involved.</p>
-<p>aParameter-efficient fine-tuning (PEFT) methods have addressed these problems by reducing the resources needed to fine-tune LLMs for specific tasks. This raises the question: can we use parameter-efficient training methods and achieve similar efficiency gains during the pre-training stage too?</p>
+<p>As the size and complexity of large language models (LLMs) continue to grow, so does the demand for computational resources to train them. With billions of parameters, training these models becomes increasingly challenging due to the high cost and resource constraints. In response to these challenges, parameter-efficient fine-tuning (PEFT) methods have emerged to fine-tune billion-scale LLMs, for specific tasks, on a single GPU. This raises the question: can we use parameter-efficient training methods and achieve similar efficiency gains during the pre-training stage too?</p>
<p>Parameter-efficient pre-training (PEPT) is an emerging area of research that explores techniques for pre-training LLMs with fewer parameters. PEPT has the potential to significantly reduce the computational cost of pre-training large language models. Multiple studies suggest that neural network training is either low-rank or proceeds in multiple phases, with initially high-rank and subsequently low-rank training (Aghajanyan et al., 2021; Arora et al., 2019; Frankle et al., 2019).</p>
<p>ReLoRA (Lialin et al., 2023) is the first parameter-efficient training method used to pre-train large language models. ReLoRA uses LoRA decomposition and merges and resets the LoRA matrices multiple times during training, increasing the total rank of the update. Another recent advance in PEPT is GaLore (Zhao et al., 2024). In GaLore, the gradient is projected into a lower-rank form, updated using an optimizer, and projected back to its original shape, reducing the memory required to pre-train LLMs by a large margin.</p>
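<p>To make the two mechanisms concrete, below is a minimal PyTorch-style sketch of the ideas described above: a ReLoRA-style merge-and-reset of the LoRA factors and a GaLore-style low-rank gradient projection. The function names, toy shapes, and plain SGD-like update are illustrative assumptions for this sketch, not the authors' implementations.</p>
<pre><code>
# Conceptual sketches only -- names, shapes, and the plain SGD-style update
# are illustrative assumptions, not the reference implementations.
import torch

# --- ReLoRA-style merge-and-reset -------------------------------------------
# Periodically fold the low-rank LoRA factors (B @ A) into the frozen weight
# and re-initialize them, so a sequence of rank-r updates can accumulate into
# a higher-rank total update over the course of pre-training.
def relora_merge_and_reset(W, A, B, scaling=1.0):
    with torch.no_grad():
        W += scaling * (B @ A)              # merge the current low-rank update
        torch.nn.init.kaiming_uniform_(A)   # restart A randomly
        torch.nn.init.zeros_(B)             # zero B so outputs are unchanged
    return W, A, B

# --- GaLore-style low-rank gradient projection -------------------------------
# Project the full gradient into a rank-r subspace, apply the optimizer update
# there (so optimizer state lives in the small space), then project back.
def galore_update(W, grad, P, lr=1e-3):
    with torch.no_grad():
        low_rank_grad = P.T @ grad          # (r x m) @ (m x n) -> r x n
        update = lr * low_rank_grad         # stand-in for an Adam step in low rank
        W -= P @ update                     # project back to m x n and apply
    return W

# Illustrative usage with toy shapes; the projection P comes from an SVD of a
# recent gradient and is refreshed periodically during training.
m, n, r = 256, 128, 8
W = torch.randn(m, n)
A = torch.randn(r, n)
B = torch.randn(m, r) * 0.01                # pretend B was trained for a while
W, A, B = relora_merge_and_reset(W, A, B)

G = torch.randn(m, n)                       # stand-in for a computed gradient
U, _, _ = torch.linalg.svd(G, full_matrices=False)
W = galore_update(W, G, U[:, :r])
</code></pre>
<p>In both sketches the saving comes from keeping the extra state small: ReLoRA trains only the rank-r factors between resets, while GaLore keeps its optimizer state in the rank-r projected space instead of at the full weight dimensions.</p>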
<p>This blog discusses ReLoRA and GaLore, explaining their core concepts and key differences.