dpo

Here are 50 public repositories matching this topic...

shibing624 / MedicalGPT

MedicalGPT: Training Your Own Medical GPT Model with ChatGPT Training Pipeline. 训练医疗大模型，实现了包括增量预训练(PT)、有监督微调(SFT)、RLHF、DPO、ORPO。

medical llama gpt dpo llm chatgpt medicalgpt

Updated Jun 28, 2024
Python

modelscope / swift

Star

ms-swift: Use PEFT or Full-parameter to finetune 250+ LLMs or 35+ MLLMs. (Qwen2, GLM4, Internlm2, Yi, Llama3, Llava, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

agent deploy llama lora gemma peft multimodal sft dpo pre-training awq llm modelscope llava qwen qwen2 unsloth llama3 glm4 internvl

Updated Jun 28, 2024
Python

ContextualAI / HALOs

Star

A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).

alignment ppo halos dpo kto rlhf

Updated May 30, 2024
Python

jianzhnie / LLamaTuner

Star

Easy and Efficient Finetuning LLMs. (Supported LLama, LLama2, LLama3, Qwen, Baichuan, GLM , Falcon) 大模型高效量化训练+部署.

llama ppo dpo chatgpt rlhf qlora qwen mixtral llama3

Updated Jun 14, 2024
Python

ukairia777 / tensorflow-nlp-tutorial

Star

tensorflow를 사용하여 텍스트 전처리부터, Topic Models, BERT, GPT, LLM과 같은 최신 모델의 다운스트림 태스크들을 정리한 Deep Learning NLP 저장소입니다.

nlp natural-language-processing tensorflow transformers named-entity-recognition question-answering llama lora trainer bert keras-tutorial sft dpo nlp-tutorial huggingface bert-ner llm

Updated Feb 22, 2024
Jupyter Notebook

TUDB-Labs / mLoRA

Star

An Efficient "Factory" to Build Multiple LoRA Adapters

gpu llama lora finetune peft dpo baichuan llm rlhf chatglm llama2 mlora

Updated Jun 28, 2024
Python

armbues / SiLLM

Star

SiLLM simplifies the process of training and running Large Language Models (LLMs) on Apple Silicon by leveraging the MLX framework.

lora mlx dpo apple-silicon large-language-models llm llm-training llm-inference

Updated Jun 28, 2024
Python

argilla-io / notus

Star

Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach

zephyr fine-tuning dpo trl lm-alignment preference-data alignment-handbook

Updated Jan 15, 2024
Python

anilca / NetTrader.Indicator

Star

Technical anaysis library for .NET

cmf momentum atr roc envelope adx cci sar cmo ema rsi adl obv pvt trix macd bollinger-bands ichimoku-cloud dpo dema

Updated Mar 13, 2023
C#

RockeyCoss / SPO

Star

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

text-to-image dpo diffusion-models text-to-image-generation sdxl

Updated Jun 21, 2024
Python

TideDra / VL-RLHF

Star

A RLHF Infrastructure for Vision-Language Models

vlm lmm dpo llm rlhf mllm

Updated Jun 12, 2024
Python

martin-wey / CodeUltraFeedback

Star

CodeUltraFeedback: aligning large language models to coding preferences

alignment code-generation dpo large-language-models llm-as-a-judge codeultrafeedback codal-bench

Updated Jun 25, 2024
Python

dvlab-research / Step-DPO

Star

Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"

math reasoning dpo llm

Updated Jun 28, 2024
Python

RobinSmits / Dutch-LLMs

Star

Various training, inference and validation code and results related to Open LLM's that were pretrained (full or partially) on the Dutch language.

transformers pytorch alpaca peft dpo trl large-language-models open-llama polylm qwen2

Updated Apr 9, 2024
Jupyter Notebook

sugarandgugu / Simple-Trl-Training

Star

基于DPO算法微调语言大模型，简单好上手。

simple dpo trl llm rlhf

Updated Apr 16, 2024
Python

Zepson-Tech / dpo-laravel

Star

A Laravel package to simplify using DPO Payment API in your application. https://dpogroup.com

php laravel hacktoberfest dpo dpogroup directpay

Updated Sep 8, 2023
PHP

karhel / glpi-dporegister

Star

Processings Register for DPO (GDPR) - GLPI Plugin

plugin pdf register glpi gdpr rgpd personal-data-protection dpo

Updated Jul 17, 2023
PHP

armbues / SiLLM-examples

Star

Examples for using the SiLLM framework for training and running Large Language Models (LLMs) on Apple Silicon

lora mlx dpo apple-silicon large-language-models llm llm-training llm-inference

Updated May 17, 2024
Python

vicgalle / configurable-safety-tuning

Sponsor

Star

Data and models for the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data"

alignment safety preference-learning dpo llm

Updated Apr 23, 2024
Python

CyberAgentAILab / filtered-dpo

Star

Introducing Filtered Direct Preference Optimization (fDPO) that enhances language model alignment with human preferences by discarding lower-quality samples compared to those generated by the learning model

alignment dpo rlhf

Updated Apr 25, 2024
Python

Improve this page

Add a description, image, and links to the dpo topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the dpo topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dpo

Here are 50 public repositories matching this topic...

shibing624 / MedicalGPT

modelscope / swift

ContextualAI / HALOs

jianzhnie / LLamaTuner

ukairia777 / tensorflow-nlp-tutorial

TUDB-Labs / mLoRA

armbues / SiLLM

argilla-io / notus

anilca / NetTrader.Indicator

RockeyCoss / SPO

TideDra / VL-RLHF

martin-wey / CodeUltraFeedback

dvlab-research / Step-DPO

RobinSmits / Dutch-LLMs

sugarandgugu / Simple-Trl-Training

Zepson-Tech / dpo-laravel

karhel / glpi-dporegister

armbues / SiLLM-examples

vicgalle / configurable-safety-tuning

CyberAgentAILab / filtered-dpo

Improve this page

Add this topic to your repo