A LLaMA that spits out posts: Our test of Meta's AI
LLaMA (Large Language Model Meta AI) is the large language model developed by Meta. In theory, to use it you must fill out Meta's form and wait patiently for Zuckerberg's team to accept you into their club. But on March 11th, 2023, an unofficial webpage with download links appeared online. Combined with an open-source tool, these links make it possible to install this AI on your own computer.
For those expecting an experience similar to ChatGPT, let's be clear: you're far from it. The OpenAI team has worked hard on the conversational side of its AI so that exchanges with ChatGPT feel as smooth as possible, even if that means making users believe they are talking to another human being.
Meta's AI, on the other hand, simply tries to predict the words that come after your prompt, which makes it quite surprising to use. Reading its responses, one quickly gets the impression that the dataset this AI was trained on includes Facebook and Instagram posts (and perhaps even WhatsApp messages):
This artificial intelligence created by Meta raises many ethical questions:
- Are the posts generated by the AI identical to those published by their authors? If so, were they public posts? It's hard to tell at first glance: more testing is needed.
- What safeguards does Meta have in place to control its AI? As we saw with ChatGPT and its DAN mode, users are very clever at removing the barriers put up by their creators. And unlike ChatGPT, Meta's AI cannot be forcibly updated: it leaked onto the web without any oversight (more on this below) and was downloaded onto users' personal machines.
- Finally, could it become a low-cost spam tool?
Despite these issues, we hope LLaMA will help many people understand how energy-intensive AIs are and how much hardware they require.
It all starts with...
It all started with a pull request proposing a change to a file in Meta's AI project.
The change, submitted by a contributor, contains torrent download links: a very effective way to share large files.
Recap on torrents
This principle has long been used to download content (games, movies, etc.) illegally. Torrents create peer-to-peer download networks: at first, one person shares a file; another person downloads it and starts sharing it in turn. The more users hold the file, the faster other internet users can download it, because the client software retrieves small pieces of the file from many sources at once.
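The piece-by-piece principle described above can be sketched in a few lines of Python. This is a toy illustration, not a real BitTorrent client: the peers, piece size, and file contents are made up, but the mechanism (split, fetch each piece from whichever peer has it, verify against checksums, reassemble) is the same.

```python
# Toy illustration of the BitTorrent principle: a file is cut into pieces,
# each piece can come from a different peer, and checksums validate the result.
# Real clients store a SHA-1 hash per piece in the .torrent metadata.
import hashlib

original = b"The shared file, as one big byte string. " * 100
PIECE = 512
pieces = [original[i:i + PIECE] for i in range(0, len(original), PIECE)]
checksums = [hashlib.sha1(p).hexdigest() for p in pieces]  # the ".torrent" side

# Simulate three peers, each holding only some of the pieces.
peers = [
    {i: pieces[i] for i in range(0, len(pieces), 3)},
    {i: pieces[i] for i in range(1, len(pieces), 3)},
    {i: pieces[i] for i in range(2, len(pieces), 3)},
]

# Download: grab each piece from whichever peer has it, verify, reassemble.
downloaded = {}
for peer in peers:
    for index, piece in peer.items():
        if hashlib.sha1(piece).hexdigest() == checksums[index]:
            downloaded[index] = piece

rebuilt = b"".join(downloaded[i] for i in sorted(downloaded))
print(rebuilt == original)  # True
```

The more peers hold pieces, the more sources a client can pull from in parallel, which is why popular torrents download faster.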
Torrent size / IPFS
For the more cautious, the contributor offers an alternative: downloading the files over IPFS in Web3 mode (a variant of torrents that is prettier from a marketing point of view). The contributor who published the download links proposes this solution to speed up the download, because the (leaked) files of Meta's AI weigh 219GB.
If you want to test this AI but don't have fiber, be patient: if few peers have good bandwidth, or if the servers are slow, the download can take a very long time.
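To put "a very long time" into numbers, here is a back-of-the-envelope calculation for the full 219GB. The link speeds below are our own illustrative assumptions, not measurements, and they suppose the peers can saturate your line:

```python
# Rough download times for 219GB at a few (assumed) link speeds.
SIZE_GB = 219

def hours_to_download(size_gb: float, mbps: float) -> float:
    """Download time in hours for a given link speed in megabits per second."""
    bits = size_gb * 1e9 * 8           # decimal gigabytes to bits
    return bits / (mbps * 1e6) / 3600

for label, mbps in [("ADSL, 10 Mbit/s", 10),
                    ("VDSL, 50 Mbit/s", 50),
                    ("Fiber, 1 Gbit/s", 1000)]:
    print(f"{label}: ~{hours_to_download(SIZE_GB, mbps):.1f} h")
```

At 10 Mbit/s that is roughly two full days of downloading, versus about half an hour on gigabit fiber.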
There are 4 directories (7B, 13B, 30B, 65B) corresponding to 4 different AI models, ranked from least to most powerful: 7 billion to 65 billion parameters.
You will notice that the size of the corresponding folders increases accordingly, from 13GB to 122GB for the heaviest model. These are raw files: additional processing is required before you can use them.
| Name | Weight | Required VRAM | Example graphics cards | RAM / swap to load |
|---|---|---|---|---|
| LLaMA - 7B | 3.5GB | 6GB | GTX 1660, RTX 2060, AMD 5700 XT, RTX 3050, 3060 | 16GB |
| LLaMA - 13B | 6.5GB | 10GB | AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000 | 32GB |
| LLaMA - 30B | 15.8GB | 20GB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 | 64GB |
| LLaMA - 65B | 31.2GB | 40GB | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000, Titan Ada | 128GB |
This table, taken from a web guide (a site that has since disappeared, but was archived), indicates the minimum amount of memory required, with examples of graphics cards powerful enough to run Meta's models. Once processed, the models become much smaller: the 7B model weighs 13GB when downloaded, but after processing it drops to 3.5GB.
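The before/after figures are largely explained by bytes per parameter: the raw files store each weight in 16-bit floats (2 bytes), while the processing step quantizes them to roughly 4 bits (~0.5 byte) per parameter. The sketch below is an approximation; real files add headers and per-block scale factors, so actual sizes differ slightly:

```python
# Rough model sizes from parameter count and bytes per parameter.
def size_gib(params: float, bytes_per_param: float) -> float:
    """Approximate model size in GiB for a given precision."""
    return params * bytes_per_param / 2**30

print(f"7B  fp16 : ~{size_gib(7e9, 2):.1f} GiB")    # close to the 13GB download
print(f"7B  4-bit: ~{size_gib(7e9, 0.5):.1f} GiB")  # close to the 3.5GB after processing
print(f"65B fp16 : ~{size_gib(65e9, 2):.1f} GiB")   # close to the 122GB heaviest model
```

The same arithmetic shows why the 65B model needs multiple high-end GPUs: even quantized, tens of gigabytes of weights have to fit in memory.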
After the leak of Meta's models, web guides and GitHub projects appeared to make it easy for developers to test them.
Before installing, know that you can use a web version reworked by researchers and based on Meta's AI.
⚠️ Warning: the project runs a lot of raw command lines and installs and executes many programs. We are not responsible for how you use it, nor for any impact on your machines.
On our side, we chose to use it to test the AI.
Prerequisite: Node.js must be installed.
Just type the command (no, it's not a joke):
npx dalai llama
The command downloads the lightest model (7B), then processes it with various programs so that your computer can run it on its own processor or graphics card. We performed this test on a MacBook Pro M1.
After installation, simply launch the command:
npx dalai serve
A web interface launches, and you're done. The page looks like the test version of ChatGPT. Different parameters can be adjusted; behind the scenes, a command is run and the responses are displayed on screen as they arrive.
```
models: '13B', '7B',
prompt: 'write a poem'
exec: ./main --seed -1 --threads 4 --n_predict 1000 --model models/7B/ggml-model-q4_0.bin --top_k 40 --top_p 0.9 --temp 0.8 --repeat_last_n 64 --repeat_penalty 1.3 -p "ecrit un poeme en français" in /Users/jeremy.pastouret/llama.cpp
```
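To make the logged command above easier to read, here is a small sketch of how the web UI's settings could map to llama.cpp flags. The helper function is hypothetical (it is not part of dalai); only the flag names and default values mirror the command actually logged:

```python
# Hypothetical helper mapping UI parameters to llama.cpp's ./main flags.
# Flag names and defaults mirror the command logged by dalai above.
import shlex

def build_llama_command(model_path: str, prompt: str, **overrides) -> str:
    params = {"seed": -1,             # -1 = random seed on each run
              "threads": 4,           # CPU threads used for inference
              "n_predict": 1000,      # maximum number of tokens to generate
              "top_k": 40,            # sampling: keep the 40 most likely tokens
              "top_p": 0.9,           # sampling: nucleus probability mass
              "temp": 0.8,            # temperature: higher = more random
              "repeat_last_n": 64,    # window checked for repetitions
              "repeat_penalty": 1.3}  # penalty applied to repeated tokens
    params.update(overrides)
    flags = " ".join(f"--{k} {v}" for k, v in params.items())
    return f"./main {flags} --model {model_path} -p {shlex.quote(prompt)}"

cmd = build_llama_command("models/7B/ggml-model-q4_0.bin", "write a poem")
print(cmd)
```

Lowering `temp` makes the output more deterministic; raising `n_predict` lengthens responses (and the time your fans will spin).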
We were also able to use the 13B version, with its 13 billion parameters. With this model, we noticed increasing fan noise and a computer that started to heat up.
With this AI, the energy and hardware cost of such a tool becomes much more concrete. In a future article, we will share our tests of its electrical, material, and environmental cost.