Understanding Attention Mechanisms – Part 3: From Cosine Similarity to Dot Product

**MyrinNew** · 03-28-2026, 04:00 PM

In the previous article, we looked at how the encoder and decoder outputs are compared. In this article, we will work through the math behind that comparison and see how it can be simplified further.

The output values for the two LSTM cells in the encoder for the word "Let’s" are -0.76 and 0.75.

The output values for the two LSTM cells in the decoder for its input token are 0.91 and 0.38.

We can represent this as:

| | Cell #1 | Cell #2 |
|---|---|---|
| A (Encoder) | -0.76 | 0.75 |
| B (Decoder) | 0.91 | 0.38 |

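Now, we plug these values into the cosine similarity equation, which divides the sum of the products of the matching cells (the numerator) by the product of the lengths of A and B (the denominator).

This gives us a result of -0.39.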
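Here is a worked version of that calculation with the table's values plugged in. The index i runs over the two LSTM cells, and intermediate values are rounded to four decimal places:

$$
\begin{aligned}
\text{cosine similarity}(A, B) &= \frac{\sum_{i=1}^{2} A_i B_i}{\sqrt{\sum_{i=1}^{2} A_i^2}\,\sqrt{\sum_{i=1}^{2} B_i^2}} \\
&= \frac{(-0.76)(0.91) + (0.75)(0.38)}{\sqrt{(-0.76)^2 + 0.75^2}\,\sqrt{0.91^2 + 0.38^2}} \\
&= \frac{-0.4066}{\sqrt{1.1401}\,\sqrt{0.9725}} \approx \frac{-0.4066}{1.0530} \approx -0.39
\end{aligned}
$$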

To simplify this further, a common approach is to compute only the numerator.

The denominator's main job is to scale the result so that it falls between -1 and 1, so in some cases we can drop it for simplicity.

Because every comparison here uses the same fixed number of cells, this simplification works well. The numerator on its own is known as the dot product.
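To see what dropping the denominator changes, here is a minimal plain-Python sketch. The helper names `dot` and `cosine_similarity` are our own, and `scaled_decoder` is a hypothetical copy of the decoder output with every cell doubled, used purely for illustration. Doubling B doubles the dot product but leaves the cosine similarity unchanged, because the denominator divides the lengths back out.

```python
import math

def dot(a, b):
    # Numerator of cosine similarity: multiply matching cells and add the products
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Divide the dot product by the product of the two vector lengths,
    # which keeps the result between -1 and 1
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

encoder = [-0.76, 0.75]  # A: encoder LSTM outputs for "Let's"
decoder = [0.91, 0.38]   # B: decoder LSTM outputs
scaled_decoder = [2 * x for x in decoder]  # hypothetical: same direction, twice the length

print(round(cosine_similarity(encoder, decoder), 2))         # -0.39
print(round(cosine_similarity(encoder, scaled_decoder), 2))  # -0.39
print(round(dot(encoder, decoder), 2))                       # -0.41
print(round(dot(encoder, scaled_decoder), 2))                # -0.81
```

This is why the dot product is not confined to the range -1 to 1: its size depends on how large the cell outputs are, not just on how well they line up.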

When we calculate only the dot product, we get:

(-0.76 × 0.91) + (0.75 × 0.38) = -0.6916 + 0.2850 = -0.4066 ≈ -0.41
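The same arithmetic can be sketched in a few lines of plain Python, using the encoder (A) and decoder (B) values from the table. The variable names are illustrative only.

```python
# Encoder (A) and decoder (B) LSTM outputs
a = [-0.76, 0.75]
b = [0.91, 0.38]

# Multiply matching cells, then add the products together
products = [x * y for x, y in zip(a, b)]
dot_product = sum(products)

print([round(p, 4) for p in products])  # [-0.6916, 0.285]
print(round(dot_product, 4))            # -0.4066
print(round(dot_product, 2))            # -0.41
```

Because `zip` pairs the cells by position, the same lines work for any number of LSTM cells, provided the encoder and decoder have the same number of them.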

We will explore this further in the next article.

