Microsoft’s AI team has developed a new method called RAFT (Reinforcement learning with Augmented Feedback for Training) to improve the performance of Language Models (LMs). Rather than the traditional binary right-or-wrong signal, RAFT trains LMs with more nuanced, graded feedback, helping them gauge the degree of correctness of their responses and produce more accurate, contextually relevant answers.
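To make the distinction concrete, here is a minimal, purely illustrative sketch of binary versus graded feedback. The overlap-based scorer is an assumption for illustration only, a crude stand-in for the learned reward model that would estimate degree of correctness; it is not the article’s method.

```python
def binary_feedback(response: str, reference: str) -> int:
    """Traditional signal: 1 if the response matches exactly, 0 otherwise."""
    return int(response.strip().lower() == reference.strip().lower())

def graded_feedback(response: str, reference: str) -> float:
    """Graded signal: partial credit via token overlap, a crude proxy for the
    'degree of correctness' a learned reward model would estimate."""
    resp_tokens = set(response.lower().split())
    ref_tokens = set(reference.lower().split())
    return len(resp_tokens & ref_tokens) / len(ref_tokens) if ref_tokens else 0.0

reference = "the Eiffel Tower is in Paris"
for candidate in ["The Eiffel Tower is in Paris",
                  "the tower is in Paris",
                  "it is in London"]:
    print(f"{candidate!r}: binary={binary_feedback(candidate, reference)}, "
          f"graded={graded_feedback(candidate, reference):.2f}")
```

The point of the graded score is simply that a partially correct answer earns partial credit, giving the training signal more information than a flat 0 or 1.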
RAFT utilises a reward model trained to predict human ratings of the correctness of LM responses, and it optimises the LM with Proximal Policy Optimisation (PPO), a policy-gradient algorithm that limits how far each update can move the model away from the policy that generated the sampled responses. This approach allows LMs to learn from their mistakes and adapt their responses accordingly.
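The sketch below shows the PPO-style update in plain PyTorch: the clipped surrogate objective, with graded reward-model scores standing in for advantages. The function name, tensor values, and the simplified advantage treatment are assumptions made for illustration, not the implementation described in the article.

```python
import torch

def ppo_clipped_loss(new_logprobs: torch.Tensor,
                     old_logprobs: torch.Tensor,
                     advantages: torch.Tensor,
                     clip_eps: float = 0.2) -> torch.Tensor:
    """PPO clipped surrogate loss: penalises updates that move the new policy
    too far from the policy that generated the sampled responses."""
    ratio = torch.exp(new_logprobs - old_logprobs)                 # pi_new / pi_old per response
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                   # minimise the negative surrogate

# Toy usage: graded reward-model scores stand in for advantages (a real setup
# would subtract a learned value baseline and usually add a KL penalty).
old_logprobs = torch.tensor([-1.2, -0.8, -2.0])   # log-probs under the sampling policy
new_logprobs = torch.tensor([-1.0, -0.9, -1.5])   # log-probs under the current policy
rewards = torch.tensor([0.7, 0.2, 0.9])           # reward-model scores in [0, 1]
advantages = rewards - rewards.mean()              # crude baseline subtraction
print(float(ppo_clipped_loss(new_logprobs, old_logprobs, advantages)))
```

The clipping is what keeps the balance between exploring better responses and not straying too far from behaviour that already earns reasonable rewards.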
The AI team tested RAFT on GPT-3, one of the largest LMs, and found that the RAFT-trained model gave noticeably more accurate and contextually appropriate responses than its conventionally trained counterpart.
Microsoft’s AI team believes that RAFT is a step towards creating more sophisticated AI, capable of understanding and responding to complex tasks. They are optimistic about the potential of this method to improve the performance of future LMs. Despite the promising results, they acknowledge the challenges and complexities involved in fine-tuning the reward model and remain committed to addressing these issues.
Go to source article: https://techcommunity.microsoft.com/t5/ai-ai-platform-blog/raft-a-new-way-to-teach-llms-to-be-better-at-rag/ba-p/4084674