
I can't understand the reason for paying attention to oneself.

For example, if we are looking at a seq2seq translation task, does self-attention mean that we "highlight" together all words in a sentence that have similar meanings? What's the intuition for why this would help the translation task?

For example, in the sentence "I hate apples, I only drink apple juice.", will we encode the first "apples" and "apple" together? Why is that useful?



In self-attention, each token creates both a "query" vector and a "key" vector (plus a "value" vector carrying the data that actually gets passed along). The query and key are different vectors, so a token can look for one type of token while presenting itself as another. "Apple" can generate a verb query to look for a verb in the sentence, while also generating a noun key that other words can look for.
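A toy illustration of the query/key split, in plain Python. The two-dimensional embeddings and the matrices W_q and W_k are hypothetical hand-picked numbers, not anything a real model learned: think of dimension 0 as "verb-ness" and dimension 1 as "noun-ness". Because queries and keys come from different matrices, the score "apple" gives "drink" need not equal the score "drink" gives "apple".

```python
# Hypothetical embeddings: dimension 0 = "verb-ness", dimension 1 = "noun-ness"
embeddings = {
    "drink": [1.0, 0.0],
    "apple": [0.0, 1.0],
}

# Hypothetical projections. W_q turns a noun into a "looking for a verb" query
# (and gives verbs an all-zero query); W_k is the identity, so each token's key
# just advertises what it is.
W_q = [[0.0, 1.0],
       [0.0, 0.0]]
W_k = [[1.0, 0.0],
       [0.0, 1.0]]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

queries = {tok: matvec(W_q, e) for tok, e in embeddings.items()}
keys = {tok: matvec(W_k, e) for tok, e in embeddings.items()}

def score(asker, target):
    """Unnormalized attention score: asker's query dotted with target's key."""
    return dot(queries[asker], keys[target])

print(keys["apple"])              # → [0.0, 1.0]  (a "noun" key)
print(score("apple", "drink"))    # → 1.0  ("apple" finds the verb)
print(score("drink", "apple"))    # → 0.0  (scores are not symmetric)
```

With a single shared matrix for queries and keys the scores would be symmetric; separate matrices are what let a token ask for something different from what it offers.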

And with multi-head attention, a single token can get data from several different types of tokens at once, since each head has its own query and key weights.
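A sketch of multi-head attention in plain Python, under toy assumptions: two heads with hypothetical hand-picked weight matrices run independently over the same three token vectors, and their outputs are concatenated per token. Because each head has its own W_q and W_k, the same token can spread its attention differently in each head. A real Transformer layer would also apply a learned output projection to the concatenated vector; that step is omitted here.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(X, W_q, W_k, W_v):
    """One head of scaled dot-product self-attention over token vectors X.

    Returns (outputs, weights): one output vector and one attention
    distribution per token.
    """
    Q = [matvec(W_q, x) for x in X]
    K = [matvec(W_k, x) for x in X]
    V = [matvec(W_v, x) for x in X]
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        w = softmax([dot(q, k) / math.sqrt(d_k) for k in K])
        # Each output is the attention-weighted average of the value vectors
        outputs.append([sum(wj * v[i] for wj, v in zip(w, V)) for i in range(len(V[0]))])
        weights.append(w)
    return outputs, weights

def multi_head(X, heads):
    """Run every head on the same X and concatenate their outputs per token."""
    results = [attention_head(X, *h) for h in heads]
    outputs = [[val for out, _ in results for val in out[t]] for t in range(len(X))]
    return outputs, [w for _, w in results]

# Three hypothetical 2-d token vectors
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]

head_a = (identity, identity, identity)  # queries match keys directly
head_b = (swap, identity, identity)      # queries look for the "other" feature

outputs, weights = multi_head(X, [head_a, head_b])
print(weights[0][0])  # head A: how token 0 attends to tokens 0, 1, 2
print(weights[1][0])  # head B: same token, different distribution
```

Each token ends up with a 4-dimensional output here (2 values from each head), and token 0 favours token 0 in head A but token 1 in head B.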

The "self" in self-attention just means that it's looking at other parts of the same string, rather than at keys and values generated from a different string (such as the encoder's output in seq2seq translation).
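For reference, the standard scaled dot-product attention formula from the original Transformer paper makes the distinction explicit. Here X is the sequence doing the looking, Y is the sequence being looked at, W_Q, W_K, W_V are learned projection matrices, and d_k is the key dimension:

```latex
\begin{aligned}
Q &= X W_Q, \qquad K = Y W_K, \qquad V = Y W_V \\
\operatorname{Attention}(Q, K, V) &= \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\end{aligned}
```

Self-attention is the case Y = X; attention over a different string (e.g. a translation decoder attending to the encoder's output) uses a separate Y.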


>I can't understand the reason for paying attention to oneself.

Certain parts of a sentence strongly inform the meaning of other parts and so it is important to encode them together. If you see the word "bank" in a sentence, is it referring to the financial institution or the land next to a body of water? We know by what came before or what comes after. Attention allows relevant context to inform token processing without being distracted by irrelevant context.
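A toy sketch of the "bank" point, in plain Python. The 2-dimensional embeddings are hypothetical (dimension 0 = "finance", dimension 1 = "water"), and the query, key and value projections are all taken to be the identity to keep it short; a real model learns them. The same input vector for "bank" comes out of self-attention leaning toward finance next to "deposit" and toward water next to "river".

```python
import math

# Hypothetical embeddings: [finance-ness, water-ness]
EMBED = {
    "deposit": [1.0, 0.0],
    "river":   [0.0, 1.0],
    "bank":    [1.0, 1.0],  # ambiguous on its own: equally finance and water
}

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attend(tokens):
    """Simplified self-attention with identity projections (Q = K = V = X)."""
    X = [EMBED[t] for t in tokens]
    d = len(X[0])
    outputs = []
    for q in X:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X])
        # Contextual vector: attention-weighted mix of every token's vector
        outputs.append([sum(w * v[i] for w, v in zip(weights, X)) for i in range(d)])
    return outputs

money = self_attend(["deposit", "bank"])[1]  # contextual vector for "bank"
shore = self_attend(["river", "bank"])[1]
print([round(x, 2) for x in money])
print([round(x, 2) for x in shore])
```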


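x = token embeddings of "I hate apples, I only drink _"

query = query_weights * x

keys = key_weights * what_to_look_at

values = value_weights * what_to_look_at

For self-attention, what_to_look_at = x.

For regular (cross-)attention, what_to_look_at could be a database, memory, or anything else, e.g. the encoder's output when a translation decoder attends to the source sentence.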
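A runnable version of the pseudocode above, in plain Python, reusing the thread's names (query_weights, key_weights, value_weights, what_to_look_at). It follows standard scaled dot-product attention: queries come from x, while keys and values both come from what_to_look_at. The embeddings, the "memory" and the identity weight matrices are hypothetical stand-ins for learned values.

```python
import math

def matmul(A, B):
    """Multiply matrices given as lists of rows: (n x m) @ (m x p)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(x, what_to_look_at, query_weights, key_weights, value_weights):
    """Scaled dot-product attention.

    Queries come from x; keys and values both come from what_to_look_at.
    Rows are token vectors, so weights multiply on the right (row @ W).
    """
    queries = matmul(x, query_weights)
    keys = matmul(what_to_look_at, key_weights)
    values = matmul(what_to_look_at, value_weights)
    d_k = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Hypothetical 2-d embeddings for a 3-token x and a 2-item "memory"
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
memory = [[0.5, 0.5], [2.0, 0.0]]
I = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, purely for illustration

self_out = attend(x, x, I, I, I)        # self-attention: what_to_look_at = x
cross_out = attend(x, memory, I, I, I)  # attention over some other sequence
```

Note that cross_out still has one row per token of x even though memory has only two entries: the number of outputs follows the queries, while what_to_look_at only determines what gets mixed in.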

So in this example, if we’re trying to predict the second “apple”, the first “apples” is very helpful. If we’re predicting “juice”, then one head of self-attention might look at the first “apples” and a second head at the second “apple”.

That’s my understanding, at least.



