AI News
A paper has been released by Microsoft Research outlining the Differential Transformer, a new method for reducing the noise found in transformer attention blocks. The method makes a simple change to the attention mechanism: it computes two softmax attention maps and subtracts one from the other. This, in effect, cancels out common noise in the attention scores, allowing for improved LLM performance.
The paper outlines how this method can improve contextual understanding in LLMs, and proposes that it will reduce hallucination and improve in-context learning. The Differential Transformer also requires less memory than a traditional transformer. There are currently no released models using this technique, but I am sure we will see some soon!
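A minimal, single-head sketch of the subtraction idea in PyTorch is shown below. This is my own simplification, not the paper's code: in the paper the scaling factor λ is a learnable, reparameterised scalar, whereas here `lam` is just a fixed argument.

```python
import torch
import torch.nn.functional as F

def differential_attention(x, W_q, W_k, W_v, lam=0.5):
    """Single-head differential attention: the difference of two
    softmax attention maps, applied to a shared set of values."""
    # W_q and W_k project to twice the head dimension, then split in two.
    q1, q2 = torch.chunk(x @ W_q, 2, dim=-1)
    k1, k2 = torch.chunk(x @ W_k, 2, dim=-1)
    v = x @ W_v
    d = q1.shape[-1]
    a1 = F.softmax(q1 @ k1.transpose(-1, -2) / d**0.5, dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-1, -2) / d**0.5, dim=-1)
    # Subtracting the second map cancels attention noise common to both.
    return (a1 - lam * a2) @ v

# Example shapes: batch of 2, sequence of 8, model width 16, head width 8.
x = torch.randn(2, 8, 16)
W_q = torch.randn(16, 16)  # projects to 2 * head_dim
W_k = torch.randn(16, 16)
W_v = torch.randn(16, 8)
out = differential_attention(x, W_q, W_k, W_v)
print(out.shape)  # torch.Size([2, 8, 8])
```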
Meta have released their Llama 3.2 series of models, which includes both small models and multi-modal models. The small models weigh in at 1B and 3B parameters and are text-only models designed for edge devices (phones, etc.). The larger models are 11B and 90B vision models capable of image reasoning.
It's currently possible to try the smaller models locally using Ollama, and the larger models will be available soon. To try the 3B model you can use:

```bash
ollama run llama3.2
```

An example of the 11B Vision model processing an image can be seen on my GitHub.
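You can also query the local model programmatically. Here is a short sketch using the official `ollama` Python package (installed with `pip install ollama`); the prompt is my own example, and the exact response shape may vary between versions of the library:

```python
import ollama

# Send a single chat message to the locally running Llama 3.2 3B model
# (assumes `ollama run llama3.2` has already pulled the model).
response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Summarise the Llama 3.2 release in one sentence."}],
)
print(response["message"]["content"])
```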
fVDB (Floating-Point Volume/Voxel Data Base) is an extension to the OpenVDB standard, which is used to store sparse volumetric data. This new framework allows large, high-resolution 3D scenes to be created and used for deep learning tasks on 3D data. Applications include self-driving cars, environment simulation, high-resolution 3D image generation and many more.
The blog post from NVIDIA contains videos showing realistic 3D city environments created from spatial data. NVIDIA have also released a paper detailing fVDB.
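To make the idea of sparse volumetric storage concrete, here is a toy sketch, my own illustration rather than the fVDB or OpenVDB API: only occupied voxels are stored, keyed by integer coordinates, so memory scales with scene content rather than the full dense volume.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int, int]

class SparseVoxelGrid:
    """Toy sparse voxel grid: stores only occupied voxels."""

    def __init__(self, background: float = 0.0):
        self.background = background
        self.voxels: Dict[Coord, float] = {}

    def set(self, x: int, y: int, z: int, value: float) -> None:
        self.voxels[(x, y, z)] = value

    def get(self, x: int, y: int, z: int) -> float:
        # Unoccupied voxels fall back to the background value,
        # so a mostly empty scene costs almost no memory.
        return self.voxels.get((x, y, z), self.background)

# A dense 1000^3 grid would need a billion entries; here we store only two.
grid = SparseVoxelGrid()
grid.set(10, 20, 30, 1.0)
grid.set(500, 500, 500, 0.5)
print(grid.get(10, 20, 30), grid.get(0, 0, 0))  # -> 1.0 0.0
```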
Meta have released their Llama 3.1 series (referred to as a "herd") of open-weight models, and their performance is nothing short of stunning. The largest model has 405B parameters and can outperform GPT-4o on many tasks. Also released were 8B and 70B parameter models, as well as "instruct" versions. The instruct versions are capable of tool usage and can be used to build AI agents.
Llama 3.1 models are licensed under a new agreement that allows "distillation", meaning you can use Llama 3.1 to train smaller models and to generate synthetic data. The licence also allows commercial use. The announcement was accompanied by a detailed paper which outlines the design and training of the models.
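As a rough sketch of what the distillation clause enables, here is one way you might generate synthetic training data with a Llama 3.1 model running locally via Ollama; the model tag, seed topics, prompt and output format are all my own illustrative choices, not anything prescribed by Meta.

```python
import json
import ollama

# Hypothetical seed topics; in practice these would come from your own domain.
topics = ["binary search", "hash tables", "dynamic programming"]

samples = []
for topic in topics:
    # Ask the large model to produce a question/answer pair we can later
    # use to fine-tune (distil into) a smaller model.
    response = ollama.chat(
        model="llama3.1",
        messages=[{
            "role": "user",
            "content": f"Write one exam question and a model answer about {topic}.",
        }],
    )
    samples.append({"topic": topic, "text": response["message"]["content"]})

# Save the synthetic dataset for a later fine-tuning run.
with open("synthetic_data.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```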