
SignRing: Continuous American Sign Language Recognition Using IMU Rings and Virtual IMU Data

Published: 27 September 2023

Abstract

Sign language is a natural language widely used by Deaf and hard-of-hearing (DHH) individuals. Advanced wearables have been developed to recognize sign language automatically, but they are limited by the scarcity of labeled data, which leads to small vocabularies and unsatisfactory performance even when laborious effort is put into data collection. Here we propose SignRing, an IMU-based system that breaks through traditional data augmentation by using online videos to generate virtual IMU (v-IMU) data, pushing the boundary of wearable-based systems to a vocabulary of 934 glosses and sentences of up to 16 glosses. The v-IMU data is generated by reconstructing 3D hand movements from two-view videos and computing 3-axis acceleration data. With this data, we achieve a word error rate (WER) of 6.3% using training data that is half v-IMU and half real IMU (2339 samples each) and a WER of 14.7% using 100% v-IMU training data (6048 samples), compared with a baseline WER of 8.3% (trained on 2339 samples of real IMU data). We compare v-IMU and IMU data to demonstrate the reliability and generalizability of the v-IMU data. This interdisciplinary work spans wearable sensor development, computer vision, deep learning, and linguistics, and can offer valuable insights to researchers with similar objectives.
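For illustration only (the landing page itself contains no code), the following minimal sketch shows the two computations the abstract names: deriving virtual 3-axis acceleration from a reconstructed 3D hand trajectory by double finite differencing, and scoring recognized gloss sequences with word error rate (WER). All function names are hypothetical, the frame rate and the plain second-difference approximation are our assumptions, and effects a full v-IMU pipeline would handle (gravity, rotation into the ring's sensor frame, smoothing) are ignored here.

```python
import numpy as np

def virtual_acceleration(positions, fps):
    """Hypothetical sketch: approximate 3-axis accelerometer samples from a
    (T, 3) array of reconstructed 3D hand positions (meters) captured at
    `fps` frames per second, via double finite differencing. Gravity and
    sensor-frame orientation are deliberately omitted."""
    dt = 1.0 / fps
    velocity = np.diff(positions, axis=0) / dt   # (T-1, 3), m/s
    return np.diff(velocity, axis=0) / dt        # (T-2, 3), m/s^2

def word_error_rate(reference, hypothesis):
    """Standard WER: Levenshtein edit distance between two gloss
    sequences, normalized by the reference length."""
    d = np.zeros((len(reference) + 1, len(hypothesis) + 1), dtype=int)
    d[:, 0] = np.arange(len(reference) + 1)   # all deletions
    d[0, :] = np.arange(len(hypothesis) + 1)  # all insertions
    for i in range(1, len(reference) + 1):
        for j in range(1, len(hypothesis) + 1):
            sub = d[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, sub)
    return d[-1][-1] / len(reference)

# Toy example: a 2-second random-walk trajectory at 30 fps,
# and one substitution in a 3-gloss sentence (WER = 1/3).
traj = np.cumsum(np.random.randn(60, 3) * 0.01, axis=0)
acc = virtual_acceleration(traj, fps=30)  # shape (58, 3)
print(word_error_rate(["I", "GO", "STORE"], ["I", "GO", "HOME"]))
```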


Published In

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 7, Issue 3
September 2023
1734 pages
EISSN: 2474-9567
DOI: 10.1145/3626192
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 September 2023
Published in IMWUT Volume 7, Issue 3


Author Tags

  1. Human-computer interaction
  2. computer vision
  3. data augmentation
  4. sign language recognition

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 491
  • Downloads (Last 6 weeks): 36
Reflects downloads up to 18 Aug 2024

Cited By

  • EchoPFL. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 1 (2024), 1-22. https://doi.org/10.1145/3643560. Online publication date: 6-Mar-2024.
  • MAPLE. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8, 1 (2024), 1-25. https://doi.org/10.1145/3643514. Online publication date: 6-Mar-2024.
  • CAvatar. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7, 4 (2024), 1-24. https://doi.org/10.1145/3631424. Online publication date: 12-Jan-2024.
  • TFSemantic: A Time-Frequency Semantic GAN Framework for Imbalanced Classification Using Radio Signals. ACM Transactions on Sensor Networks 20, 4 (2024), 1-22. https://doi.org/10.1145/3614096. Online publication date: 11-May-2024.
  • Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences. Proceedings of the CHI Conference on Human Factors in Computing Systems (2024), 1-18. https://doi.org/10.1145/3613904.3642109. Online publication date: 11-May-2024.
  • ASLRing: American Sign Language Recognition with Meta-Learning on Wearables. 2024 IEEE/ACM Ninth International Conference on Internet-of-Things Design and Implementation (IoTDI) (2024), 203-214. https://doi.org/10.1109/IoTDI61053.2024.00022. Online publication date: 13-May-2024.
  • A Framework for Designing Fair Ubiquitous Computing Systems. Adjunct Proceedings of the 2023 ACM International Joint Conference on Pervasive and Ubiquitous Computing & the 2023 ACM International Symposium on Wearable Computing (2023), 366-373. https://doi.org/10.1145/3594739.3610677. Online publication date: 8-Oct-2023.
