Transformers are a machine learning model architecture for processing sequential data (like sentences). Unlike recurrent neural networks (RNNs), which consume a sequence one token at a time, a transformer operates on the whole sequence at once. This is preferred because it allows the computation to be parallelized, whereas an RNN's step-by-step processing is inherently sequential (see the sketch below).
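Here is a minimal sketch of that contrast, using NumPy with illustrative sizes and randomly initialized weight matrices (all names here are assumptions for demonstration, not any particular library's API): the RNN-style loop must run step by step because each hidden state depends on the previous one, while the transformer-style layer touches every position in a single matrix operation.

```python
import numpy as np

seq_len, d_model = 6, 8                   # illustrative sizes
x = np.random.randn(seq_len, d_model)     # one embedded sequence

# RNN-style: a Python loop where each step depends on the previous
# hidden state, so positions cannot be computed in parallel.
W_x = np.random.randn(d_model, d_model) * 0.1
W_h = np.random.randn(d_model, d_model) * 0.1
h = np.zeros(d_model)
rnn_states = []
for t in range(seq_len):                  # strictly sequential
    h = np.tanh(x[t] @ W_x + h @ W_h)
    rnn_states.append(h)

# Transformer-style: one matrix multiply processes every position
# at once, so the whole sequence can be handled in parallel.
W = np.random.randn(d_model, d_model) * 0.1
parallel_out = x @ W                      # shape (seq_len, d_model), one step
```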

Beyond processing in parallel, transformers also employ self-attention (see attention (machine learning)). Attention mechanisms can be added to RNNs to improve their accuracy, but there they usually lack the necessary data: an RNN processes each new token with only a compressed, fixed-size summary of the previous tokens rather than direct access to all of them. This limits how well it can relate tokens that are far apart, whereas self-attention lets every position attend to every other position directly.
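To make that concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy (the function name, sizes, and randomly initialized projection matrices are assumptions for illustration, not the implementation from any specific transformer library). Each output row is a weighted mix of value vectors from every position in the sequence, so distant tokens are reachable in a single step rather than through a chain of hidden states.

```python
import numpy as np

def self_attention(x):
    """Minimal scaled dot-product self-attention over a (seq_len, d_model) input."""
    d_model = x.shape[-1]
    # Illustrative, randomly initialized query/key/value projections.
    W_q = np.random.randn(d_model, d_model) * 0.1
    W_k = np.random.randn(d_model, d_model) * 0.1
    W_v = np.random.randn(d_model, d_model) * 0.1

    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(d_model)        # every position scores every other position
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the whole sequence
    return weights @ v                          # each output mixes all positions directly

x = np.random.randn(6, 8)   # illustrative sequence of 6 tokens
out = self_attention(x)
print(out.shape)            # (6, 8)
```

Because the attention weights are computed from the full sequence at once, the model can connect a token to any earlier token without squeezing that history through a fixed-size state.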