Paper Details


Title
Harnessing Computing Power for Machine Learning and Deep Learning: An In-Depth Review of Hardware, CUDA Core Architectures and their Impact on Performance

Author
Md. Sadekur Rahman

Email

Abstract

Machine Learning (ML) and Deep Learning (DL) have emerged as transformative paradigms in modern computing, underpinning applications from natural language processing to autonomous systems and medical diagnostics. Central to their success is the computational acceleration provided by Graphics Processing Units (GPUs), particularly those based on NVIDIA's CUDA architecture. This study presents an in-depth analysis of CUDA-enabled GPU architectures spanning from Fermi to Hopper and their impact on ML/DL performance, scalability, and energy efficiency. Through detailed comparisons with traditional CPUs and specialized Tensor Processing Units (TPUs), we highlight the evolution of CUDA cores, memory hierarchies, and profiling tools that enable high-throughput, low-latency AI computation. Empirical studies, including convolution-heavy CNN tasks and real-time inference on edge devices, demonstrate substantial speedups and efficiency gains. We also explore GPU-specific optimizations such as kernel fusion, warp scheduling, and memory coalescing, emphasizing their role in accelerating training and inference. This study offers valuable insights into the hardware-software co-design strategies essential for scaling future AI workloads.


Keywords

Journal or Conference Name
7th International Conference on Mobile Computing and Sustainable Informatics, ICMCSI 2026

Publication Year
2026

Indexing
Scopus