Speeding up the Discrete Cosine Transform Through Custom Operations and Fixed-Point Arithmetic

Latif Akçay; Mustafa Alptekin Engin

doi:10.38088/jise.1712080

Research Article

Year 2025, Volume: 9 Issue: 2, 268 - 278

Abstract

Digital signal processing applications are becoming increasingly important because modern systems work with much larger amounts of data than before. The Discrete Cosine Transform (DCT), used in almost all multimedia compression methods, creates a significant computational load, especially in resource-constrained embedded systems. This study proposes four custom operations compatible with Transport-Triggered Architecture (TTA). Fixed-point arithmetic is used to improve computational efficiency and avoid floating-point overhead. To analyse the effect of the proposed operations, different Application-Specific Instruction Set Processor (ASIP) configurations were created on a general-purpose processor architecture. Performance analyses show speedups between 2x and 3.5x. The developed processor models have also been implemented in hardware, and FPGA synthesis results indicate a reasonable increase in chip area, showing that the proposed solutions could be an efficient alternative, particularly for resource-constrained embedded systems.
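
To make the fixed-point idea concrete, the following is a minimal sketch of an 8-point DCT-II computed with integer multiply-accumulate operations and compared against a floating-point reference. It is an illustration only and is not taken from the article: the transform length `N`, the fraction width `FRAC_BITS`, the orthonormal scaling, and the function names `dct_fixed` and `dct_float` are all assumptions, and the article's four custom TTA operations are not modelled here.

```python
import math

N = 8            # assumed transform length (typical block size in image codecs)
FRAC_BITS = 14   # assumed fixed-point fraction width, not taken from the article
ONE = 1 << FRAC_BITS


def scale(k):
    """Orthonormal DCT-II scaling factor for output index k."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def basis(k, n):
    """Floating-point DCT-II basis value for output k and input n."""
    return scale(k) * math.cos((2 * n + 1) * k * math.pi / (2 * N))


# Cosine table quantised once to integers; only this table touches floats.
COS_Q = [[int(round(basis(k, n) * ONE)) for n in range(N)] for k in range(N)]


def dct_fixed(samples):
    """DCT-II using only integer multiply, add, and shift at run time."""
    out = []
    for k in range(N):
        acc = 0
        for n in range(N):
            acc += COS_Q[k][n] * samples[n]          # integer multiply-accumulate
        out.append((acc + (ONE >> 1)) >> FRAC_BITS)  # round, then drop fraction bits
    return out


def dct_float(samples):
    """Floating-point reference used to judge the fixed-point error."""
    return [sum(basis(k, n) * samples[n] for n in range(N)) for k in range(N)]


if __name__ == "__main__":
    block = [52, 55, 61, 66, 70, 61, 64, 73]  # arbitrary illustrative samples
    for fx, fl in zip(dct_fixed(block), dct_float(block)):
        print(f"fixed={fx:5d}  float={fl:9.3f}")
```

With 8-bit-range inputs and a 14-bit fraction, the quantisation error of the table stays far below one output step, so the integer results differ from the floating-point reference by less than one. The inner multiply-accumulate loop is the kind of hot spot that a custom operation could plausibly target, although which operations the article actually defines is not stated in the abstract.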

Keywords

Discrete Cosine Transform, Efficient processor design, Signal processing, Transport-Triggered Architecture, Application-specific processor design

Ethical Statement

This study does not require ethics committee permission or any special permission.

Supporting Institution

Bayburt University Scientific Research Projects Coordination Unit

Project Number

2023/69002-01

Details

Primary Language	English
Subjects	Image Processing, Digital Processor Architectures
Journal Section	Research Article
Authors	Latif Akçay (ORCID 0000-0003-2580-2643), Mustafa Alptekin Engin (ORCID 0000-0003-3399-9343)
Project Number	2023/69002-01
Early Pub Date	October 10, 2025
Publication Date	December 9, 2025
Submission Date	June 2, 2025
Acceptance Date	July 31, 2025
Published in Issue	Year 2025 Volume: 9 Issue: 2

Cite

APA	Akçay, L., & Engin, M. A. (2025). Speeding up the Discrete Cosine Transform Through Custom Operations and Fixed-Point Arithmetic. Journal of Innovative Science and Engineering, 9(2), 268-278. https://doi.org/10.38088/jise.1712080
AMA	Akçay L, Engin MA. Speeding up the Discrete Cosine Transform Through Custom Operations and Fixed-Point Arithmetic. JISE. October 2025;9(2):268-278. doi:10.38088/jise.1712080
Chicago	Akçay, Latif, and Mustafa Alptekin Engin. “Speeding up the Discrete Cosine Transform Through Custom Operations and Fixed-Point Arithmetic”. Journal of Innovative Science and Engineering 9, no. 2 (October 2025): 268-78. https://doi.org/10.38088/jise.1712080.
EndNote	Akçay L, Engin MA (October 1, 2025) Speeding up the Discrete Cosine Transform Through Custom Operations and Fixed-Point Arithmetic. Journal of Innovative Science and Engineering 9 2 268–278.
IEEE	L. Akçay and M. A. Engin, “Speeding up the Discrete Cosine Transform Through Custom Operations and Fixed-Point Arithmetic”, JISE, vol. 9, no. 2, pp. 268–278, 2025, doi: 10.38088/jise.1712080.
ISNAD	Akçay, Latif - Engin, Mustafa Alptekin. “Speeding up the Discrete Cosine Transform Through Custom Operations and Fixed-Point Arithmetic”. Journal of Innovative Science and Engineering 9/2 (October 2025), 268-278. https://doi.org/10.38088/jise.1712080.
JAMA	Akçay L, Engin MA. Speeding up the Discrete Cosine Transform Through Custom Operations and Fixed-Point Arithmetic. JISE. 2025;9:268–278.
MLA	Akçay, Latif and Mustafa Alptekin Engin. “Speeding up the Discrete Cosine Transform Through Custom Operations and Fixed-Point Arithmetic”. Journal of Innovative Science and Engineering, vol. 9, no. 2, 2025, pp. 268-78, doi:10.38088/jise.1712080.
Vancouver	Akçay L, Engin MA. Speeding up the Discrete Cosine Transform Through Custom Operations and Fixed-Point Arithmetic. JISE. 2025;9(2):268-78.

Creative Commons License

The works published in Journal of Innovative Science and Engineering (JISE) are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.