Research Papers
Reading research papers is a great way to learn AI: they give thorough explanations and are free to access.
Most AI research papers are published on arXiv (arxiv.org), an open-access platform.
I have listed some popular papers below, but the list is nowhere near exhaustive. There has been an explosion of new papers
recently and it can be difficult to keep up with the latest developments. It is still worth reading the older papers to
see where the latest AI developments have come from, since most new papers build upon previous work.
Large Language Models
Attention Is All You Need
Vaswani et al. (2017)
This groundbreaking 2017 paper introduced the Transformer architecture. It is what began the development of the powerful
GPT models that we see used in LLMs today.
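The core operation the paper introduces is scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ / √d_k)V. The formula is the paper's; the NumPy sketch below is only an illustration of it, not the authors' implementation.
```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # similarity of each query with each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    return weights @ V                                   # weighted sum of the values

# Toy example: 4 tokens with d_k = 8
Q = K = V = np.random.randn(4, 8)
out = scaled_dot_product_attention(Q, K, V)              # shape (4, 8)
```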
Scaling Laws For Neural Language Models
OpenAI (2020)
This empirical study maps how model size, dataset size and compute relate to the performance of large language models.
It reaches some very interesting conclusions, in particular that performance improves smoothly as a power law of model
and data size, with no observed ceiling. This is one of the pivotal arguments towards the possibility
of Artificial General Intelligence (AGI).
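To make the power-law idea concrete, here is a small sketch of the functional form the paper fits, L(N) ≈ (N_c / N)^α, where N is model size. The constants in the code are made-up placeholders rather than the paper's fitted values; the point is that the predicted loss keeps falling as N grows.
```python
# Illustrative scaling curve in the spirit of Kaplan et al. (2020).
# N_C and ALPHA_N are placeholder constants, NOT the paper's fitted values.
N_C = 1.0e14      # hypothetical reference parameter count
ALPHA_N = 0.08    # hypothetical power-law exponent

def predicted_loss(n_params: float) -> float:
    """L(N) ~ (N_c / N)^alpha: loss keeps improving as model size grows, with no ceiling."""
    return (N_C / n_params) ** ALPHA_N

for n in (1e8, 1e9, 1e10, 1e11, 1e12):
    print(f"N = {n:.0e}  ->  predicted loss = {predicted_loss(n):.3f}")
```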
LoRA: Low-Rank Adaptation of Large Language Models
Hu et al. (2021)
LoRA is a method of fine-tuning LLMs without retraining all of their parameters. It freezes the pretrained weights
and injects small low-rank matrices into each layer of the Transformer; only these matrices are trained on the desired
domain-specific content. The result is a significant reduction in the compute required for fine-tuning, with negligible loss
in model performance.
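A minimal PyTorch sketch of the idea (illustrative, not the reference implementation): the pretrained weight W is frozen and a trainable low-rank update B·A is added on top, so only rank × (in + out) values are fine-tuned per layer.
```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update (sketch of Hu et al., 2021)."""
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)            # freeze the pretrained weights
        self.base.bias.requires_grad_(False)
        # Low-rank factors: only these parameters are trained during fine-tuning.
        self.A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, rank))  # zero init: no change at the start
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768, rank=8)
y = layer(torch.randn(2, 768))   # only A and B receive gradients when training
```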
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Gu et al. (2024)
State Space Models (SSMs) are an alternative to Transformers that can perform just as well, with an advantage in
compute requirements and scaling. Transformer attention scales quadratically (O(N²)) with sequence length due to its
matrix multiplications, whereas Mamba scales linearly thanks to its RNN-based structure. Mamba adds a selection
mechanism on top of basic SSMs and is designed to process natural language.
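To see where the linear scaling comes from, here is a toy, non-selective state-space recurrence in NumPy: each token performs one fixed-cost update of a hidden state, h_t = A h_{t-1} + B x_t with output y_t = C h_t, so the total cost grows linearly with sequence length. This is a simplification; Mamba's selective mechanism additionally makes the SSM parameters input-dependent.
```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy linear state-space model: h_t = A h_{t-1} + B x_t,  y_t = C h_t.
    One fixed-cost update per token, so the scan is O(sequence length)."""
    seq_len = x.shape[0]
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(seq_len):
        h = A @ h + B @ x[t]     # update the hidden state with the current token
        ys.append(C @ h)         # read out an output for this token
    return np.stack(ys)

rng = np.random.default_rng(0)
x = rng.normal(size=(128, 16))       # 128 tokens, 16 features each
A = np.eye(32) * 0.9                 # stable state transition
B = rng.normal(size=(32, 16)) * 0.1
C = rng.normal(size=(8, 32))
y = ssm_scan(x, A, B, C)             # shape (128, 8)
```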
Neural Radiance Fields and Gaussian Splatting
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
Mildenhall et al. (2020)
The paper that introduced NeRFs in 2020. Neural Radiance Fields represent a 3D scene with a fully-connected (MLP) network. The model
is trained on a set of photographs of a scene and learns a 3D representation of it, including view-dependent colours and reflections. Once
trained, the model can render a 2D image of the scene from any desired camera location and angle. Because the scene is stored in the network weights rather than as explicit geometry, this is known as an "implicit" representation.
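As a rough sketch of that representation (not the paper's full architecture, which adds positional encoding and hierarchical sampling), the core component is an MLP mapping a 3D position and viewing direction to a colour and a volume density:
```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Toy radiance field: (x, y, z, view direction) -> (RGB colour, density sigma).
    The real NeRF uses positional encoding and a much deeper network."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),              # 3 colour channels + 1 density
        )

    def forward(self, xyz: torch.Tensor, view_dir: torch.Tensor):
        out = self.net(torch.cat([xyz, view_dir], dim=-1))
        rgb = torch.sigmoid(out[..., :3])      # view-dependent colour in [0, 1]
        sigma = torch.relu(out[..., 3:])       # non-negative volume density
        return rgb, sigma

model = TinyNeRF()
rgb, sigma = model(torch.rand(1024, 3), torch.rand(1024, 3))  # 1024 points sampled along camera rays
```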
3D Gaussian Splatting for Real-Time Radiance Field Rendering
Kerbl et al. (2023)
NeRFs are effective at representing 3D scenes from images and videos, but they are slow to train and render. A more recent method named
"Gaussian Splatting" is much faster because it uses traditional 3D rasterisation techniques. The scene is represented by a set of 3D Gaussians whose parameters are optimised to
match the images provided to the model. Gaussian Splatting is an explicit representation, since the 3D primitives that make up the scene are stored directly.
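To give a flavour of that explicit representation (field names below are illustrative, not the authors' code), each primitive is a 3D Gaussian with a position, a covariance stored as scale plus rotation, a colour and an opacity; a real scene optimises millions of these and rasterises them directly.
```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    """One explicit scene primitive in the spirit of Kerbl et al. (2023)."""
    mean: np.ndarray       # (3,) centre of the Gaussian in world space
    scale: np.ndarray      # (3,) per-axis extent (covariance = R diag(scale^2) R^T)
    rotation: np.ndarray   # (4,) unit quaternion giving the orientation R
    colour: np.ndarray     # (3,) RGB (the paper uses spherical harmonics for view dependence)
    opacity: float         # alpha used when splats are blended front-to-back

# A toy "scene" of 1000 random splats; training would optimise all of these fields.
scene = [
    Gaussian3D(mean=np.random.randn(3),
               scale=np.full(3, 0.05),
               rotation=np.array([1.0, 0.0, 0.0, 0.0]),
               colour=np.random.rand(3),
               opacity=0.8)
    for _ in range(1000)
]
```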
Bilateral Guided Radiance Field Processing
Wang et al. (2024)
This paper introduces a new method of processing NeRFs that corrects for colour variations between the training images. The method can also
alter the 3D scene given just a single reference image: the entire NeRF is then modified to match the style
of that reference image.
Publications from Organisations
Meta AI Publications
Meta (the parent company of Facebook) has a large AI research department named Meta FAIR. They take an open-model approach and publish
many papers.