Introducing Zephyr 7B, a new large language model fine-tuned on Mistral

Author: Samuel (AI4Chat)
Category: Blog Content
Updated on: 2023-11-16

Introduction

We are excited to announce the release of Zephyr 7B Beta, a new language model built by fine-tuning Mistral 7B. Despite its modest size, Zephyr 7B Beta outperforms comparable models on several chat benchmarks and remains one of the strongest models in its weight class. Its performance comes from two ingredients: a large-scale AI feedback dataset (UltraFeedback) and the use of Direct Preference Optimization (DPO) in place of reinforcement learning from human feedback.
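
For readers who want to try the model right away, here is a minimal sketch of chatting with the public HuggingFaceH4/zephyr-7b-beta checkpoint through the Hugging Face transformers pipeline. The sampling parameters (temperature, top_p, max_new_tokens) are illustrative choices, not part of the release.

```python
# Minimal sketch: chatting with Zephyr 7B Beta via the transformers pipeline.
# Assumes the public HuggingFaceH4/zephyr-7b-beta checkpoint and enough GPU
# memory to hold a 7B model in bfloat16.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="HuggingFaceH4/zephyr-7b-beta",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a friendly, concise assistant."},
    {"role": "user", "content": "Explain what makes Zephyr 7B different from Mistral 7B."},
]

# Render the conversation with the model's chat template, then generate.
prompt = pipe.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95)
print(outputs[0]["generated_text"])
```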

A New Rival Emerges

In the crowded field of large language models, Zephyr 7B Beta scores higher on MT-Bench than Llama 2 Chat 70B, a model roughly ten times its size. On AlpacaEval the gap is narrower, but Zephyr remains competitive with much larger chat models.

Training Methodology

Zephyr's value lies not only in its benchmark numbers but in how it was trained. The model starts from Mistral 7B, a strong pretrained base, is fine-tuned on a large preference dataset, and replaces reinforcement learning with Direct Preference Optimization (DPO). One surprising finding is that the model produces better chat results when it is allowed to overfit on the preference dataset.

The Three Training Stages

Training proceeds in three stages:

  1. Distilled Supervised Fine-Tuning (dSFT): A large self-instruct-style dataset of dialogues is generated with a stronger teacher model, and the base model is supervised fine-tuned on it.
  2. AI Feedback (AIF): Four different large language models (LLMs) generate a diverse set of completions for each prompt, and GPT-4 ranks the responses to produce preference data.
  3. Distilled Direct Preference Optimization (dDPO): DPO is applied directly to the dSFT model using the AI feedback data, eliminating the need for a separate reward model (see the sketch after this list). Interestingly, running more DPO epochs than the alpha variant received yields better chat results.
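
To make the dDPO step concrete, below is a minimal sketch of the DPO loss in PyTorch. The function name and the beta value are illustrative; the inputs are per-example sums of token log-probabilities for the preferred and dispreferred responses under the policy being trained and under the frozen dSFT reference model.

```python
# Minimal sketch of the Direct Preference Optimization (DPO) loss.
# Inputs are per-example sums of token log-probabilities for the preferred
# ("chosen") and dispreferred ("rejected") responses, under both the policy
# being trained and the frozen dSFT reference model. Names are illustrative.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,
    policy_rejected_logps: torch.Tensor,
    ref_chosen_logps: torch.Tensor,
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    # Log-ratio of chosen vs. rejected under the policy and the reference.
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps

    # DPO pushes the policy to prefer the chosen response more strongly
    # than the reference model does; beta controls the strength of the push.
    logits = policy_logratios - ref_logratios
    return -F.logsigmoid(beta * logits).mean()
```

Because the preference signal enters directly through this loss, no separate reward model has to be trained or queried, which is the practical payoff of the shift from reinforcement learning to DPO.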

Breaking Down the Insights

Contrary to conventional wisdom, overfitting during DPO improves the chat model's performance across all benchmarks. Ablation experiments were run to test whether both SFT and DPO matter: models trained with DPO alone failed to learn the chat template, while SFT followed by DPO produced the best results. Additional filtering was applied to fix issues such as irregular formatting and incorrect casing.
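
To make the chat-template point concrete, the sketch below (assuming the tokenizer bundled with the HuggingFaceH4/zephyr-7b-beta checkpoint) renders a short conversation into the role-tagged format the model is trained on; the DPO-only ablation models failed to pick up this structure.

```python
# Minimal sketch: inspecting the chat template Zephyr is trained to follow.
# Assumes the tokenizer bundled with the HuggingFaceH4/zephyr-7b-beta checkpoint.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What does dDPO stand for?"},
]

# Renders the conversation with <|system|>, <|user|> and <|assistant|> role
# markers; a model that skipped SFT struggles to reproduce this format.
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```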

Conclusion

Zephyr 7B Beta shows that a small, carefully fine-tuned model can exceed much larger ones on chat benchmarks, and that unconventional choices such as overfitting with DPO can pay off. If you are interested in learning more about Zephyr 7B Beta, reach out or visit the links provided. Stay tuned for more advancements in language models!
