-
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
Paper review • #RL #alignment #reward weight
-
Reinforcement Pre-Training
Paper review • #RL #pre-training
-
SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization
Paper review • #OOD #multimodal learning
-
ACL conference note
ACL'24 paper list and repo