Research Papers
Reading research papers is a great way to learn AI: they give thorough explanations and are free to access.
Most AI research papers are published on arXiv (arxiv.org), an open-access platform.
I have listed some popular papers below, but the list is nowhere near exhaustive. There has been an explosion of new papers
recently and it can be difficult to keep up with the latest developments. It is still worth reading the older papers to
see where the latest AI developments have come from, since most new papers build upon previous work.
Large Language Models
Attention Is All You Need
Vaswani et al. (2017)
This groundbreaking 2017 paper introduced the Transformer architecture. It is what began the development of the powerful
GPT models that we see used in LLMs today.
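The core operation the paper introduces is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k)V. The formula is the paper's; the NumPy sketch below is only an illustration of it, not the authors' implementation.
```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # similarity of each query with each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # weighted sum of the values

# Toy example: 4 tokens with d_k = 8
Q = K = V = np.random.randn(4, 8)
out = scaled_dot_product_attention(Q, K, V)              # shape (4, 8)
```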
Scaling Laws For Neural Language Models
OpenAI (2020)
This empirical study maps how model size, dataset size and compute relate to the performance of large language models.
It reaches some very interesting conclusions, in particular that performance improves smoothly as a power law of model
and data size, with no observed ceiling. This is one of the pivotal arguments towards the possibility
of Artificial General Intelligence (AGI).
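To make the power-law idea concrete, here is a small sketch of the functional form the paper fits, L(N) ≈ (N_c / N)^α, where N is model size. The constants in the code are made-up placeholders rather than the paper's fitted values; the point is that the predicted loss keeps falling as N grows.
```python
# Illustrative scaling curve in the spirit of Kaplan et al. (2020).
# N_C and ALPHA_N are placeholder constants, NOT the paper's fitted values.
N_C = 1.0e14      # hypothetical reference parameter count
ALPHA_N = 0.08    # hypothetical power-law exponent

def predicted_loss(n_params: float) -> float:
    """L(N) ~ (N_c / N)^alpha: loss keeps improving as model size grows, with no ceiling."""
    return (N_C / n_params) ** ALPHA_N

for n in (1e8, 1e9, 1e10, 1e11, 1e12):
    print(f"N = {n:.0e}  ->  predicted loss = {predicted_loss(n):.3f}")
```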
LoRA: Low-Rank Adaptation of Large Language Models
Hu et al. (2021)
LoRA is a method of fine-tuning LLMs without retraining all of their parameters. It freezes the pretrained weights
and injects small low-rank matrices into each layer of the Transformer; only these matrices are trained on the desired
domain-specific content. The result is a significant reduction in the compute required for fine-tuning, with negligible loss
in model performance.
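A minimal PyTorch sketch of the idea (illustrative, not the reference implementation): the pretrained weight W is frozen and a trainable low-rank update B·A is added on top, so only rank × (in + out) values are fine-tuned per layer.
```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update (sketch of Hu et al., 2021)."""
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)            # freeze the pretrained weights
        self.base.bias.requires_grad_(False)
        # Low-rank factors: only these parameters are trained during fine-tuning.
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))  # zero init: no change at the start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768, rank=8)
y = layer(torch.randn(2, 768))   # only A and B receive gradients when training
```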
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Gu et al. (2024)
State Space Models (SSMs) are an alternative to Transformers that can perform just as well, with an advantage in
compute requirements and scaling. Transformer attention scales quadratically (O(N²)) with sequence length due to its
matrix multiplications, whereas Mamba scales linearly thanks to its RNN-based structure. Mamba adds a selection
mechanism on top of basic SSMs and is designed to process natural language.
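To see where the linear scaling comes from, here is a toy, non-selective state-space recurrence in NumPy: each token performs one fixed-cost update of a hidden state, h_t = A h_{t-1} + B x_t with output y_t = C h_t, so the total cost grows linearly with sequence length. This is a simplification; Mamba's selective mechanism additionally makes the SSM parameters input-dependent.
```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy linear state-space model: h_t = A h_{t-1} + B x_t,  y_t = C h_t.
    One fixed-cost update per token, so the scan is O(sequence length)."""
    seq_len = x.shape[0]
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(seq_len):
        h = A @ h + B @ x[t]     # update the hidden state with the current token
        ys.append(C @ h)         # read out an output for this token
    return np.stack(ys)

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 16))       # 128 tokens, 16 features each
A = np.eye(32) * 0.9                 # stable state transition
B = rng.normal(size=(32, 16)) * 0.1
C = rng.normal(size=(8, 32))
y = ssm_scan(x, A, B, C)             # shape (128, 8)
```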
Neural Radiance Fields and Gaussian Splatting
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Mildenhall et al. (2020)
The paper that introduced NeRFs in 2020. Neural Radiance Fields represent a 3D scene with a fully-connected (MLP) network. The model
is trained on a set of photographs of a scene and learns a 3D representation of it, including view-dependent colours and reflections. Once
trained, the model can render a 2D image of the scene from any desired camera location and angle. Because the scene is stored in the network weights rather than as explicit geometry, this is known as an "implicit" representation.
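As a rough sketch of that representation (not the paper's full architecture, which adds positional encoding and hierarchical sampling), the core component is an MLP mapping a 3D position and viewing direction to a colour and a volume density:
```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Toy radiance field: (x, y, z, view direction) -> (RGB colour, density sigma).
    The real NeRF uses positional encoding and a much deeper network."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),              # 3 colour channels + 1 density
        )

    def forward(self, xyz: torch.Tensor, view_dir: torch.Tensor):
        out = self.net(torch.cat([xyz, view_dir], dim=-1))
        rgb = torch.sigmoid(out[..., :3])      # view-dependent colour in [0, 1]
        sigma = torch.relu(out[..., 3:])       # non-negative volume density
        return rgb, sigma

model = TinyNeRF()
rgb, sigma = model(torch.rand(1024, 3), torch.rand(1024, 3))  # 1024 points sampled along camera rays
```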
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Kerbl et al. (2023)
NeRFs are effective at representing 3D scenes from images and videos, but they are slow to train and render. A more recent method named
"Gaussian Splatting" is much faster because it uses traditional 3D rasterisation techniques. The scene is represented by a set of 3D Gaussians whose parameters are optimised to
match the images provided to the model. Gaussian Splatting is an explicit representation, since the 3D primitives that make up the scene are stored directly.
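To give a flavour of that explicit representation (field names below are illustrative, not the authors' code), each primitive is a 3D Gaussian with a position, a covariance stored as scale plus rotation, a colour and an opacity; a real scene optimises millions of these and rasterises them directly.
```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    """One explicit scene primitive in the spirit of Kerbl et al. (2023)."""
    mean: np.ndarray       # (3,) centre of the Gaussian in world space
    scale: np.ndarray      # (3,) per-axis extent (covariance = R diag(scale^2) R^T)
    rotation: np.ndarray   # (4,) unit quaternion giving the orientation R
    colour: np.ndarray     # (3,) RGB (the paper uses spherical harmonics for view dependence)
    opacity: float         # alpha used when splats are blended front-to-back

# A toy "scene" of 1000 random splats; training would optimise all of these fields.
scene = [
    Gaussian3D(mean=np.random.randn(3),
               scale=np.full(3, 0.05),
               rotation=np.array([1.0, 0.0, 0.0, 0.0]),
               colour=np.random.rand(3),
               opacity=0.8)
    for _ in range(1000)
]
```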
Bilateral Guided Radiance Field Processing
Wang et al. (2024)
This paper introduces a new method of processing NeRFs that corrects for colour variations between the training images. The method can also
alter the 3D scene given just a single reference image: the entire NeRF is then modified to match the style
of that reference image.
Publications from Organisations
Meta AI Publications
Meta (the parent company of Facebook) has a large AI research department named Meta FAIR. They take an open-model approach and publish
many papers.