Desynchronization attacks have proved to be the greatest challenge to audio watermarking systems because they introduce misalignment between the carrier signal and the watermark. This paper proposes a DNN-based speech watermarking system in which two adversarial networks are jointly trained on a set of desynchronization attacks to embed a randomly generated watermark. The detector network is extended with spatial pyramid pooling layers so that it can handle signals affected by these attacks. A detailed training...