Was known in the 90s as multiplicative modules, sigma pi units, and hypernetworks.
It’s basically dynamically changes the “weight” of a thing based on run-time information. This is used in transformer networks as a way to implement far-away tokens.
Aug 25, 20241 min read
Was known in the 90s as multiplicative modules, sigma pi units, and hypernetworks.
It’s basically dynamically changes the “weight” of a thing based on run-time information. This is used in transformer networks as a way to implement far-away tokens.