Remote sensing (RS) technologies have significantly advanced Earth observation, improving the characterization and identification of surface materials through both spaceborne and airborne systems. These advances are crucial for environmental monitoring and urban planning. As RS datasets have become more accessible and more complex, the field has shifted from traditional machine learning techniques to more robust deep learning approaches, particularly convolutional neural networks (CNNs) and transformer-based models, which are known for their strong feature extraction capabilities. This systematic review focuses on the application of these deep learning techniques to land use classification, with an emphasis on the fusion of hyperspectral (HS) and LiDAR data. It critically examines the transition from traditional methods to advanced deep learning models, compares methodologies across different deep learning approaches, and discusses the challenges of multimodal data fusion. The review also identifies directions for future research to help researchers develop robust and generalizable techniques for land use classification.