Tensor Dimensions
There are some important concepts that you will need to understand in order to work with AI programming. A good starting point is to understand
tensors and their associated dimensioning. This knowledge is critical for training, deploying and running inference with AI models.
What is a tensor?
From a purely mathematical standpoint a tensor is an n-dimensional array of numbers, with n being any non-negative integer. The commonly
used terminology differs a little in that the term "tensor" is typically reserved for arrays with 3 or more dimensions.
- A 0-dimensional array is usually referred to as a scalar (a single number)
- A 1-dimensional array is usually referred to as a vector (e.g. [x, y, z])
- A 2-dimensional array is usually referred to as a matrix, e.g.:
[[1, 2, 3],
 [4, 5, 6],
 [7, 8, 9]]
- A higher dimensional array is usually referred to as a tensor
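Before moving to any library, the four cases above can be sketched as plain Python nested lists — no packages needed:

```python
# The four cases above as plain Python nested lists
scalar = 5                      # 0-dimensional: a single number
vector = [1, 2, 3]              # 1-dimensional: a list of numbers
matrix = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]            # 2-dimensional: a list of lists
tensor = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8]]]     # 3-dimensional: a list of lists of lists

# Indexing goes one level deeper per dimension
print(matrix[1][2])     # 6
print(tensor[1][0][1])  # 6
```

Each extra dimension simply adds one more level of nesting, and one more index is needed to reach a single value.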
When working with programming packages such as PyTorch, the tensor data type can be used for an array of any size. Let's have a look at what a tensor
looks like in PyTorch:
import torch
# Define a 1-dimensional tensor that contains 3 values
my_tensor = torch.tensor([1, 2, 3])
# Print the tensor object
print(my_tensor)
# Print the tensor data type
print(my_tensor.dtype)
>> tensor([1, 2, 3])
torch.int64
We have successfully created a PyTorch tensor of type torch.int64. Generally when we are dealing with AI models we will be using
floating point numbers rather than integers. This can be achieved in PyTorch by defining a float tensor: torch.float and
torch.float32 both denote the same 32-bit floating point dtype (torch.FloatTensor is the older tensor class associated with it).
For more information on the available data types see the PyTorch documentation.
# Define a 1-dimensional tensor that contains 5 values
my_tensor = torch.tensor([1, 2, 3, 4.0, 5.5], dtype=torch.float)
# Print the tensor object
print(my_tensor)
# Print the tensor data type
print(my_tensor.dtype)
>> tensor([1.0000, 2.0000, 3.0000, 4.0000, 5.5000])
torch.float32
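As a quick sanity check (assuming a standard PyTorch install), torch.float and torch.float32 compare equal, and an integer tensor can be converted to float after creation with .to():

```python
import torch

# torch.float is simply an alias for torch.float32
print(torch.float == torch.float32)  # True

# An integer tensor can be cast to float after creation with .to()
int_tensor = torch.tensor([1, 2, 3])
float_tensor = int_tensor.to(torch.float32)
print(float_tensor.dtype)  # torch.float32
```

Casting with .to() returns a new tensor; the original int_tensor keeps its torch.int64 dtype.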
What is a Dimension?
Let's start by thinking about the difference between 1-dimensional, 2-dimensional and 3-dimensional spaces. A 1-dimensional space is
a space where any point can be represented by a single number, e.g. 25. A 2-dimensional space is a space where each point is referenced
by a pair of numbers, e.g. [25, 32]. A 3-dimensional space is a space where each point is referenced by 3 numbers, e.g. [25, 32, 45]. Note
that in every case only a 1-dimensional array of numbers is needed to reference a point in the n-dimensional space.
The question is, what do these numbers represent? They can actually represent whatever we want them to. If we want to define map coordinates, for example,
then the values would represent the coordinates in each direction on the map. If we had a 2-dimensional map (like Google Earth) we could define
a 1-d array containing 2 values to represent X and Y coordinates, or latitude and longitude. If we had a 3-dimensional map (like a computer game or rendering
software such as Blender) we could define a 1-dimensional array containing 3 numbers which represent X, Y and Z locations.
But what if we don't want to represent coordinates? Well that's fine, as we can define the tensor to represent whatever we like. For example we could
define a 1-dimensional array with 2 values to reference a paragraph in a book ([5, 3] could be page 5, paragraph 3).
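Both readings can be sketched with ordinary 1-dimensional tensors; the values below are arbitrary illustrations:

```python
import torch

# A 1-d tensor with 3 values read as X, Y, Z coordinates (hypothetical point)
position = torch.tensor([25.0, 32.0, 45.0])

# The same kind of 1-d tensor read as [page, paragraph] in a book
book_location = torch.tensor([5, 3])

print(position.shape)       # torch.Size([3])
print(book_location.shape)  # torch.Size([2])
```

Structurally both are just 1-d tensors; the meaning of the numbers is entirely up to us.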
Bonus: What is the 4th Dimension?
I'll start off by saying this question is not as scary as it sounds. People will sometimes say "The 4th dimension is time, right?". Technically that
is only one possible answer of many, because we can define the dimensions to represent anything we like. You could indeed represent
time as a 4th dimension alongside the 3 spatial dimensions, and the resulting coordinates would be of the form [x, y, z, t]. This representation is used
in some areas of physics, however it's not the only option.
Another possibility for a 4th dimension is a 4th spatial dimension ([x, y, z, w]). What would that look like and how does it differ from having
time as a 4th dimension? Firstly, if you treat time as a 4th dimension you would simply have a 3-d coordinate [x, y, z] with a 1-dimensional time value
appended. This is an effective way to represent a location and a time.
Sounds simple enough, how about a 4th spatial dimension? Well that one is a bit trickier to imagine! We can picture a 2-dimensional surface as a flat
surface like a piece of paper or a photograph. We can also picture a 3-dimensional surface like a room or an object. How could we possibly define a 4th
dimension? Well the important thing to remember is what we are trying to represent. In this case we are trying to create tensors that represent
spatial coordinates.
This is made difficult by the fact we live in a 3-d world and have no way to picture dimensions above 3. What we need to do is separate the idea of dimensions
from the idea of spatial dimensions. It's perfectly valid to define a 4th spatial dimension even though we haven't observed one in existence. The math will
still work out and remember it's just a theoretical construct. Dimensions higher than 3 are indeed used often in machine learning and AI.
Coming back to the [x, y, z, t] and [x, y, z, w] comparison, one important thing to note is the idea of orthogonality. Put simply, the word orthogonal
means perpendicular to. For example in a 2-d space there are 90 degrees between the x and y axes, just like in 3-d where there are 90 degrees between x, y and z.
This is difficult to picture in 4 dimensions, however there are indeed 90 degrees between the x, y, z and w dimensions in a theoretical 4-d world. Compare this to
[x, y, z, t], where you have 90 degrees between x, y and z but those spatial dimensions have no such relationship to t. The time dimension is essentially a straight line that
moves independently of the other dimensions.
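The earlier claim that "the math still works out" in higher dimensions can be checked directly: the Euclidean distance formula extends from 2-d or 3-d to [x, y, z, w] without modification. A minimal sketch with arbitrary points:

```python
import torch

# Two hypothetical points in a 4-d space [x, y, z, w]
a = torch.tensor([1.0, 2.0, 3.0, 4.0])
b = torch.tensor([1.0, 2.0, 3.0, 6.0])

# Euclidean distance: square root of the sum of squared differences,
# exactly the same formula as in 2-d or 3-d
distance = ((a - b) ** 2).sum().sqrt()
print(distance.item())  # 2.0
```

Nothing in the formula cares how many dimensions there are, which is why tensors with far more than 4 dimensions are routine in machine learning.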
Working With Tensors
Now that we understand what tensors are, we need to understand how to work with them in PyTorch. The first thing worth learning is the .shape
attribute, which returns the size of every dimension in the tensor.
# Define a 2-dimensional tensor that contains 6 values
my_tensor = torch.rand((2,3), dtype=torch.float)
# Print the tensor object
print(my_tensor)
# Print the tensor shape
print(my_tensor.shape)
# Print the size of the first dimension
print("First dimension size:", my_tensor.shape[0])
>> tensor([[0.1812, 0.4027, 0.9661],
[0.5826, 0.2785, 0.8558]])
torch.Size([2, 3])
First dimension size: 2
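Alongside .shape, the .ndim attribute ties back to the scalar/vector/matrix naming from earlier — it reports how many dimensions a tensor has, while .numel() reports the total number of values. A short sketch:

```python
import torch

# .ndim gives the number of dimensions; .shape gives the size of each one
scalar = torch.tensor(7)
vector = torch.tensor([1, 2, 3])
matrix = torch.rand((2, 3))

print(scalar.ndim, vector.ndim, matrix.ndim)  # 0 1 2
print(matrix.numel())  # 6 -- total number of values across all dimensions
```

So a scalar is a 0-dimensional tensor, a vector is 1-dimensional, and a matrix is 2-dimensional, matching the terminology from the start of this article.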