Sign Language Translation (SLT) is a challenging task due to its cross-domain nature, the grammatical differences between signed and spoken languages, and the scarcity of data. Currently, many SLT models rely on intermediate gloss annotations as outputs or latent priors. Glosses can help models correctly segment and align signs and thus better understand the video. However, the use of glosses comes with significant limitations, since annotations are difficult to obtain. Scaling gloss-based models to millions of samples therefore remains impractical, especially considering the scarcity of sign language datasets. Similarly, many models operate on raw video, which requires larger models that typically only run on high-end GPUs and are less invariant to signer appearance and context. In this work we propose a gloss-free, pose-based SLT model. Using extracted pose keypoints as features allows for a significant reduction in the dimensionality of the data and in the size of the model. We review the state of the art, compare available models, and develop a keypoint-based Transformer model for gloss-free SLT, trained on RWTH-Phoenix, a standard dataset for benchmarking SLT models, alongside GSL, a simpler laboratory-made Greek Sign Language dataset.
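Since this excerpt gives no implementation details, the following is only a rough sketch of the pipeline the abstract describes: per-frame pose keypoints (here extracted with MediaPipe Holistic) as a compact feature, fed to a Transformer that decodes spoken-language text directly, with no gloss supervision. The MediaPipe and PyTorch calls are real APIs, but the PoseToTextTransformer class, its dimensions, the vocabulary size, and the zero-padding of undetected body parts are assumptions made for illustration, not the authors' code.

    import cv2
    import mediapipe as mp
    import numpy as np
    import torch
    import torch.nn as nn

    def extract_keypoints(video_path):
        """One flat keypoint vector per frame: 33 body + 2x21 hand landmarks, (x, y, z) each."""
        holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
        cap = cv2.VideoCapture(video_path)
        frames = []
        while True:
            ok, bgr = cap.read()
            if not ok:
                break
            res = holistic.process(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
            parts = []
            for lms, n in ((res.pose_landmarks, 33),
                           (res.left_hand_landmarks, 21),
                           (res.right_hand_landmarks, 21)):
                if lms is None:  # part not detected in this frame: pad with zeros (an assumption)
                    parts.append(np.zeros((n, 3), dtype=np.float32))
                else:
                    parts.append(np.array([[p.x, p.y, p.z] for p in lms.landmark],
                                          dtype=np.float32))
            frames.append(np.concatenate(parts).reshape(-1))  # (75 * 3,) = (225,)
        cap.release()
        holistic.close()
        return np.stack(frames)  # (T, 225): far lower-dimensional than raw RGB frames

    class PoseToTextTransformer(nn.Module):
        """Encode keypoint sequences, decode spoken-language tokens directly (no glosses)."""
        def __init__(self, in_dim=225, d_model=256, vocab_size=3000, max_len=512):
            super().__init__()
            self.embed_pose = nn.Linear(in_dim, d_model)   # project keypoints to model dim
            self.embed_tok = nn.Embedding(vocab_size, d_model)
            self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positions
            self.transformer = nn.Transformer(d_model=d_model, nhead=4,
                                              num_encoder_layers=3, num_decoder_layers=3,
                                              batch_first=True)
            self.out = nn.Linear(d_model, vocab_size)

        def forward(self, poses, tokens):
            # poses: (B, T, 225) keypoint sequences; tokens: (B, L) target token ids
            src = self.embed_pose(poses) + self.pos[:, :poses.size(1)]
            tgt = self.embed_tok(tokens) + self.pos[:, :tokens.size(1)]
            causal = self.transformer.generate_square_subsequent_mask(
                tokens.size(1)).to(tokens.device)
            h = self.transformer(src, tgt, tgt_mask=causal)
            return self.out(h)  # (B, L, vocab_size) logits over the spoken vocabulary

Training such a model would proceed as standard sequence-to-sequence learning: teacher-forced decoding with cross-entropy loss against the reference spoken-language sentence, so gloss annotations never enter the pipeline.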
Copyright (c) 2024 Pedro Dal Bianco, Gastón Ríos, Waldo Hasperué, Oscar Stanchi, Facundo Quiroga, Franco Ronchetti
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.