This study introduces a novel approach for segmenting lines of text in handwritten documents using a vision transformer model. Specifically, we adapt the DEtection TRansformer (DETR) model to detect text lines in images of handwritten documents. To adapt DETR to the line segmentation task, we apply a pre-processing step that divides each line into fixed-size image patches, followed by the addition of positional encodings. We use a DETR model with a ResNet-101 backbone pretrained on the Common Objects in Context (COCO) object detection dataset and re-train it on our novel, complex line segmentation dataset of 1,610 handwritten forms. For comparison, we implement another line segmentation method, Bangla Document Recognition through Instance-level Segmentation of Handwritten Text Images (BN-DRISHTI), which relies on the You Only Look Once (YOLO) object detection model. Both object detection-based methods involve a learning phase in which the model is trained or fine-tuned on the dataset. To obtain a diverse set of baseline methods, we also implement two learning-free algorithms: the A* search algorithm and the genetic algorithm (GA). Experimental results based on the Intersection over Union (IoU) metric demonstrate that the proposed method outperforms all other methods in terms of detection rate, recognition accuracy, and the Text Line Detection Metric (TLDM). The quantitative results also indicate that the two learning-free algorithms fail to segment highly skewed lines in the dataset. The A* algorithm achieves a recognition accuracy of 0.734, compared to 0.498 for GA and 0.689 for BN-DRISHTI. Our proposed approach achieves the highest recognition accuracy of 0.872, outperforming all other methods. We show that the DETR model, which requires only a single fine-tuning phase to adapt to the line segmentation task, not only simplifies training and implementation but also improves accuracy and efficiency in detecting and segmenting handwritten text lines. DETR's use of the transformer's global attention mechanism allows it to capture the context of the entire image rather than relying solely on local features. This is particularly beneficial for handling the diverse and complex patterns found in handwritten text, where traditional models may struggle with issues such as overlapping text lines or varied handwriting styles.
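To make the fine-tuning setup and the IoU-based evaluation concrete, the following is a minimal sketch of adapting a COCO-pretrained DETR with a ResNet-101 backbone to single-class text-line detection using the Hugging Face `transformers` library. The checkpoint name, the single "text line" label, and the training hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (not the authors' exact pipeline): fine-tune a
# COCO-pretrained DETR (ResNet-101 backbone) to detect one class,
# "text line", mirroring the single fine-tuning phase described in
# the abstract. Checkpoint, label set, and learning rate are
# assumptions for illustration.
import torch
from transformers import DetrForObjectDetection, DetrImageProcessor

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-101")
model = DetrForObjectDetection.from_pretrained(
    "facebook/detr-resnet-101",
    num_labels=1,                  # one class: "text line"
    ignore_mismatched_sizes=True,  # replace COCO's 91-way head with a 1-way head
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()

def fine_tune_step(image, line_boxes):
    """One gradient step on a page image and its ground-truth line boxes.

    `line_boxes` is an (N, 4) tensor in normalized (cx, cy, w, h) format,
    as expected by DETR's bipartite-matching loss.
    """
    inputs = processor(images=image, return_tensors="pt")
    labels = [{
        "class_labels": torch.zeros(len(line_boxes), dtype=torch.long),
        "boxes": line_boxes,
    }]
    outputs = model(pixel_values=inputs["pixel_values"], labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()

def box_iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) pixel coordinates,
    as used for the evaluation metric reported in the abstract."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

In this sketch, each predicted line box would be matched against ground truth with `box_iou` and counted as a correct detection above a chosen IoU threshold; the abstract's detection rate and recognition accuracy figures are derived from such IoU-based matching.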
Vision Transformers, Handwritten Text Segmentation, Object Detection, Optical Character Recognition, Deep Learning, Document Analysis
This study does not require ethics committee permission or any special permission.
Primary Language | English
---|---
Subjects | Image Processing, Pattern Recognition
Journal Section | Research Articles
Authors |
Early Pub Date | April 25, 2025
Publication Date |
Submission Date | April 21, 2024
Acceptance Date | January 27, 2025
Published in Issue | Year 2025, Volume 9, Issue 1
The works published in Journal of Innovative Science and Engineering (JISE) are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.