Stability AI swiftly releases FreeWilly, a Llama 2 fine-tuned model with performance comparable to ChatGPT! Netizens exclaim that the rules of the game have changed

Source: Xinzhiyuan

Less than two days after the release of Llama 2, AI unicorn Stability AI has already fine-tuned the FreeWilly models, which are said to be comparable in performance to ChatGPT.

As soon as Meta's Llama 2 was released, it set the entire open-source community ablaze.

As OpenAI scientist Andrej Karpathy put it, this is an extremely important day for the entire field of large language models: among all models with open weights, Llama 2 is the most powerful.

From now on, the gap between open-source and closed-source large models will narrow further, and the opportunity to build large models will be open to all developers.

Just now, Stability AI and CarperAI Labs jointly released FreeWilly2, a fine-tuned model based on Llama 2 70B.

Alongside it comes FreeWilly1, fine-tuned from the original LLaMA 65B model.

It is worth noting that both models were trained on a new synthetic dataset in the standard Alpaca format and underwent supervised fine-tuning (SFT).
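To make the "standard Alpaca format" concrete, here is a minimal sketch of one such training example and the commonly used Alpaca prompt template. The field names follow the public Alpaca schema; the example content itself is illustrative and not taken from the FreeWilly dataset.

```python
# One Alpaca-format example: an instruction, optional input context, and the
# target output used as the supervision signal during SFT.
example = {
    "instruction": "Summarize the following sentence in five words or fewer.",
    "input": "Stability AI released two fine-tuned language models called FreeWilly.",
    "output": "Stability AI released FreeWilly models.",
}

# The prompt template commonly used for Alpaca-style supervised fine-tuning.
PROMPT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

prompt = PROMPT.format(**example)
print(prompt)
```

During SFT, the model is trained to continue this prompt with the `output` field.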

In various benchmark tests, FreeWilly2 has demonstrated excellent reasoning ability, even surpassing GPT-3.5 on some tasks.

Model address:

Both models are research experiments and released under a non-commercial license.

Data generation and collection

Stability AI said that the training of the FreeWilly models was directly inspired by Microsoft's paper "Orca: Progressive Learning from Complex Explanation Traces of GPT-4".

However, while the data generation process is similar, the sources are different.

Paper link:

FreeWilly's dataset variant contains 600,000 data points (roughly 10% of the dataset used in the original Orca paper), and the models were bootstrapped using high-quality instruction datasets created by Enrico Shippole:

  • COT Submix Original

  • NIV2 Submix Original

  • FLAN 2021 Submix Original

  • T0 Submix Original

With this approach, Stability AI generated 500,000 examples using a simpler LLM, plus an additional 100,000 examples using a more sophisticated LLM.
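The two-tier split described above can be sketched in a few lines. Everything here is a stand-in: `complete_simple` and `complete_complex` are placeholder stubs, not real API calls, and the quotas are scaled down from the actual 500,000 / 100,000 split while keeping the 5:1 ratio.

```python
def complete_simple(instruction: str) -> str:
    # Placeholder for the cheaper LLM used for the bulk of the data.
    return f"[simple answer to: {instruction}]"

def complete_complex(instruction: str) -> str:
    # Placeholder for the stronger LLM used for the smaller, richer slice.
    return f"[step-by-step answer to: {instruction}]"

def generate_dataset(instructions, simple_quota, complex_quota):
    """Fill the simple quota first, then route the rest to the complex model."""
    dataset = []
    for i, instr in enumerate(instructions[: simple_quota + complex_quota]):
        model = complete_simple if i < simple_quota else complete_complex
        dataset.append({"instruction": instr, "response": model(instr)})
    return dataset

# Scaled-down illustration of the 500,000 / 100,000 split (ratio 5:1).
instructions = [f"task {i}" for i in range(12)]
data = generate_dataset(instructions, simple_quota=10, complex_quota=2)
print(len(data))  # 12 examples: 10 "simple" + 2 "complex"
```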

For a fair comparison, Stability AI carefully filtered these datasets and removed examples derived from the evaluation benchmarks.
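The exact decontamination rule Stability AI used is not stated; one simple, common choice is to drop any training example whose instruction exactly matches a benchmark prompt after normalization. A hedged sketch of that idea:

```python
def normalize(text: str) -> str:
    # Case-fold and collapse whitespace so trivial formatting differences
    # do not hide a match.
    return " ".join(text.lower().split())

def decontaminate(train_examples, benchmark_prompts):
    """Remove training examples whose instruction appears in the benchmarks."""
    bench = {normalize(p) for p in benchmark_prompts}
    return [ex for ex in train_examples
            if normalize(ex["instruction"]) not in bench]

# Toy illustration with made-up data.
train = [
    {"instruction": "What is 2 + 2?"},
    {"instruction": "Translate 'hello' to French."},
]
bench_prompts = ["What is  2 + 2?"]  # matches after normalization
clean = decontaminate(train, bench_prompts)
print(len(clean))  # 1 — the contaminated example was dropped
```

Real decontamination pipelines often use fuzzier matching (e.g. n-gram overlap), but the principle is the same.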

Although the training set is only one-tenth the size of that in the original Orca paper, the resulting FreeWilly models not only perform well on various benchmarks but also validate the feasibility of synthetically generated datasets.

Evaluation of model performance

For performance evaluation, Stability AI researchers used EleutherAI's lm-eval-harness, with the addition of AGIEval.

Judging from the results, FreeWilly excels in many areas, including complex reasoning, understanding the subtleties of language, and answering complex questions related to professional domains (such as legal and mathematical problem solving).

Basically, FreeWilly2 has reached a level comparable to ChatGPT, and even surpasses it in some evaluations.

GPT4ALL benchmark (0-shot):

AGIEval (0-shot):

In addition, the team from Hugging Face also independently reproduced the experiment on July 21.

As the Open LLM Leaderboard shows, FreeWilly2 ranks first by a clear margin, with an average score 4 percentage points higher than that of the original Llama 2.

For an open future

It can be said that FreeWilly1 and FreeWilly2 set a new standard for open source large language models.

The introduction of these two models has not only greatly advanced research in related fields and improved natural-language understanding, but also supported the completion of complex tasks.

Stability AI said that the team is very excited about the infinite possibilities that these models can bring to the AI community, and looks forward to the new applications that they will inspire.

In addition, a heartfelt thank-you goes to the passionate team of researchers, engineers, and partners whose extraordinary efforts and dedication enabled Stability AI to reach this important milestone.

An exciting time

As soon as the model was released, netizen "Phil Howes" got FreeWilly2 up and running in under a minute using Tuhin Srivastava's Llama v2 implementation.

After loading the 275 GB of weights, the model ran at 23 tokens/s out of the box.

In addition, some netizens exclaimed that the model jointly launched by Stability AI and CarperAI is a real game changer!

FreeWilly1 and FreeWilly2 are highly innovative in both openness and performance, and the AI community is entering an exciting moment.

References:
