A LLaMA that spits out posts: Our test of Meta's AI

LLaMA (Large Language Model Meta AI) is the artificial intelligence developed by Meta. In theory, to use it you have to fill out Meta's form and wait patiently for Zuckerberg's team to accept you into their club. But on March 11th, 2023, an unofficial webpage with download links appeared on the web. By combining these links with an open-source tool, it is possible to install this AI on your own computer.

For those expecting an experience similar to ChatGPT, let's be clear: you're far from it. The OpenAI team has put particular effort into the conversational side of their AI, so that exchanges with ChatGPT feel as smooth as possible, even if it means letting users believe they are talking to another human being.

Meta's AI, on the other hand, simply tries to predict the words that come after your prompt. Using it is therefore quite surprising. Reading its responses, one quickly gets the impression that the dataset this AI was trained on comes from Facebook and Instagram posts (and maybe even WhatsApp messages).

This artificial intelligence created by Meta raises many ethical questions:

  • Are the posts generated by the AI reproduced exactly as their authors published them? If so, were those posts public? It's difficult to tell at first glance: more testing is needed.
  • What rules has Meta put in place to control its AI? As we saw with ChatGPT and its DAN mode, AI users are very clever at dismantling the barriers erected by their creators. But unlike ChatGPT, Meta's AI cannot be forcibly updated: it escaped onto the web without any control (more information below) and was downloaded onto its users' personal machines.
  • Finally, is it a tool that could be used for spam at a lower cost?

Despite these issues, we hope that LLaMA will help many people understand how energy-intensive AIs are and how many hardware resources they require.

It all starts with...

It all started with a pull request proposing a change to a file in Meta's AI project.

This change, proposed by a contributor, contains torrent download links: a very effective way to share large files.

Recap on torrents

Torrents have long been used to download content (games, movies, etc.) illegally. The protocol creates peer-to-peer download networks: at first, one person shares a file; another person retrieves it and then shares it in turn. The more users have the file, the faster other internet users can download it, because the client software is able to fetch small pieces of the file from many sources at once. A command-line example is sketched below.
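To make this concrete, here is what fetching a torrent looks like with a command-line client such as aria2 (the magnet link below is a placeholder for illustration, not the actual link from the pull request):

aria2c --seed-time=0 "magnet:?xt=urn:btih:PLACEHOLDER_HASH"

The client discovers peers, downloads pieces from all of them in parallel, and reassembles the file; --seed-time=0 stops sharing as soon as the download completes.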

Torrent size / IPFS

For the more cautious, the contributor offers an alternative: downloading the files in Web3 fashion with IPFS (a prettier variant of torrents, from a marketing point of view). The contributor who published the download links proposes this option to speed up the model download. Indeed, the leaked Meta AI files weigh 219GB in total.

If you want to test this AI but don't have a fiber connection, you may have to be patient... because if the peers sharing the files have little upload bandwidth, or if the server is slow, the download can take a very long time.
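For reference, retrieving a file over IPFS looks like this with the official ipfs command-line client (the content identifier below is a placeholder; the real ones were listed in the pull request):

ipfs get QmPlaceholderCID -o consolidated.00.pth

ipfs get fetches the content addressed by the CID from whichever nodes hold it, much like a torrent client does with file pieces.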

Contents of the leak

There are 4 directories (7B, 13B, 30B, 65B) corresponding to 4 different AI models, ranked from least to most powerful: from 7 billion to 65 billion parameters.

You will notice that the size of the corresponding folders also increases: from 13GB up to 122GB for the largest model. These are raw files: additional processing is required before they can be used.

| Name | Weight | Required VRAM | Examples of graphics cards | RAM / swap to load |
| --- | --- | --- | --- | --- |
| LLaMA - 7B | 3.5GB | 6GB | RTX 1660, 2060, AMD 5700xt, RTX 3050, 3060 | 16GB |
| LLaMA - 13B | 6.5GB | 10GB | AMD 6900xt, RTX 2060 12GB, 3060 12GB, 3080, A2000 | 32GB |
| LLaMA - 30B | 15.8GB | 20GB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 | 64GB |
| LLaMA - 65B | 31.2GB | 40GB | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000, Titan Ada | 128GB |

This table, taken from a guide found on the web (a site which has since disappeared, but was archived), indicates the minimum amount of memory required, with examples of graphics cards powerful enough to handle Meta's models. Once processed, the models become lighter: the 7B model weighs 13GB when downloaded, but after processing its weight drops to 3.5GB.
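That drop is consistent with a back-of-the-envelope calculation (our own, not from the guide): the downloaded weights store each of the 7 billion parameters as a 16-bit float, and the processing step quantizes them down to 4 bits each (hence the q4_0 file that appears later in our logs):

echo "7000000000 * 2 / 1024^3" | bc -l    # 16 bits = 2 bytes per parameter: ~13.0 GiB downloaded
echo "7000000000 * 0.5 / 1000^3" | bc -l  # 4 bits = 0.5 byte per parameter: 3.5 GB once processed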

After the leak of Meta's models, web guides and GitHub projects appeared to make it easy for developers to test them.

AI Installation

Before installing anything, note that you can use a web version reworked by researchers, based on Meta's AI.

Otherwise, the easiest way is to use a project called dalai, started by cocktailpeanut (whom we thank for this practical, easy-to-use solution).

⚠️ Warning: the project runs a lot of raw command lines, and installs and executes many programs. We are not responsible for the use you make of it, nor for its impact on your machine.

For our part, we chose to use it to test the AI.

Prerequisite: NodeJS must be installed.
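You can check this from a terminal (we have not verified the minimum version dalai requires, so consult the project's README if in doubt):

node --version
npm --version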

Just type the command (no, it's not a joke):

npx dalai llama

The command downloads the lightest model (7B), then processes it with various programs so that your computer can run it on its own processor or graphics card. We performed this test on a MacBook Pro M1.
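For the curious, here is roughly what happens behind the scenes. dalai relies on the llama.cpp project; the sketch below reflects llama.cpp as it was around the time of our test, and the exact script names have changed since, so treat it as illustrative rather than definitive:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make
# convert the raw PyTorch weights to llama.cpp's ggml format (still 16-bit)
python3 convert-pth-to-ggml.py models/7B/ 1
# quantize to 4 bits: this is where 13GB becomes 3.5GB
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2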

After installation, simply launch the command:

npx dalai serve
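The terminal prints the local address to open in your browser (by default it was, to the best of our recollection, the following):

open http://localhost:3000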

A web interface launches and you're all set. The page looks like the test version of ChatGPT. Various parameters can be adjusted, and in the background a command is launched; the responses are displayed on screen as they arrive.

query: {
  seed: -1,
  threads: 4,
  n_predict: 1000,
  model: '7B',
  top_k: 40,
  top_p: 0.9,
  temp: 0.8,
  repeat_last_n: 64,
  repeat_penalty: 1.3,
  models: [ '13B', '7B' ],
  prompt: 'write a poem'
}
exec: ./main --seed -1 --threads 4 --n_predict 1000 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "write a poem" in /Users/jeremy.pastouret/llama.cpp
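Nothing stops you from bypassing the web interface and invoking the underlying binary yourself, with the same flags as in the exec line above (the path assumes dalai's llama.cpp checkout in your home directory):

cd ~/llama.cpp
./main --seed -1 --threads 4 --n_predict 1000 \
  --model models/7B/ggml-model-q4_0.bin \
  --top_k 40 --top_p 0.9 --temp 0.8 \
  --repeat_last_n 64 --repeat_penalty 1.3 \
  -p "write a poem"

top_k and top_p restrict sampling to the most likely tokens, temp controls randomness, and repeat_penalty discourages the model from looping on the same words.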

We were also able to use the 13B version and its 13 billion parameters. With this model, the fans got noticeably louder and the computer started to heat up.

With this AI, it becomes possible to get a more concrete sense of the energy and hardware cost of such a tool. In a future article, we will share our tests of its electrical, material, and environmental cost.
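If you want to get a feel for this yourself on an Apple Silicon Mac, macOS ships with a power sampler (output fields vary across macOS versions, so take the numbers as orders of magnitude):

sudo powermetrics --samplers cpu_power,gpu_power -i 1000

Run it in a second terminal while the model is generating, and compare the CPU/GPU power draw with the machine at rest.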


Jérémy PASTOURET
Journalist constantly searching for new tools that are lightweight, accessible to all, and respectful of users' privacy.

Comments

  • gue22

    Thanks for giving me an idea how large LLaMa model sizes are and what hw could work! Still wondering how long it would take to execute the just posted StableVicuna 13B finetuning on Intel 10th gen 64GB RAM and a 4090. Thanks for any insight!
    • Jérémy PASTOURET

      Thank you for your comment! Regarding the requirement to use StableVicuna 13B, I'm not sure about it at the moment. However, based on my experience, LLaMa 13B already requires a significant amount of resources to function properly.
