Transformer decoder architecture. Topics include multi-head attention, layer Decoder-Only Transf...