Unsupervised anomaly detection in time-series data remains a critical challenge, particularly in domains where labeled anomalies are scarce and rapid decision-making is essential. In this paper, we propose a new framework that combines temporal imaging with Vision Transformer-based Masked Autoencoders (ViT-MAE) for fully unsupervised anomaly detection. Univariate or multivariate time-series signals are converted into two-dimensional recurrence plots, capturing both local and global temporal dynamics. A ViT-MAE is pretrained to reconstruct masked patches of these plots, learning robust, domain-agnostic embeddings without the need for labels. During inference, [CLS] token embeddings are extracted and compared against a Gaussian model fitted on normal data, with the Mahalanobis distance used for anomaly scoring. We evaluate our framework on three diverse benchmarks: NYC-Taxi (univariate traffic data), Ambient Temperature (univariate environmental data), and the SKAB dataset (multivariate industrial sensor data). The method achieves AUROC scores of 0.945, 1.000, and 0.9273 (SKAB valve1), respectively, and operates in under 5 milliseconds per window, making it suitable for real-time deployment. Sensitivity analyses reveal consistent performance across varying patch sizes and recurrence thresholds. Embedding-space visualization confirms the semantic clustering of normal and anomalous windows. To the best of our knowledge, this framework is the first to integrate 2D recurrence-plot encoding with ViT-MAE pretraining and Mahalanobis scoring, surpassing 1D signal-based methods such as TS-MAE by capturing spatial-temporal patterns for robust, cross-domain anomaly detection.
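To make the pipeline summarized above concrete, the following is a minimal sketch of the two non-learned stages: converting a signal window into a binary recurrence plot, and scoring a [CLS] embedding with the Mahalanobis distance against a Gaussian fitted on normal data. The function names (`recurrence_plot`, `fit_gaussian`, `mahalanobis_score`), the threshold `eps`, and the ridge regularizer are illustrative assumptions, not the paper's implementation; the embeddings are assumed to come from the pretrained ViT-MAE, which is omitted here.

```python
import numpy as np

def recurrence_plot(window: np.ndarray, eps: float = 0.1) -> np.ndarray:
    """Binary recurrence plot: R[i, j] = 1 iff ||x_i - x_j|| <= eps.

    `window` has shape (T,) for univariate or (T, d) for multivariate signals.
    `eps` is a hypothetical recurrence threshold (a tunable hyperparameter).
    """
    x = window.reshape(len(window), -1)  # unify to (T, d)
    # Pairwise Euclidean distances between all time points in the window.
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return (dists <= eps).astype(np.float32)

def fit_gaussian(normal_embeddings: np.ndarray):
    """Fit mean and inverse covariance on [CLS] embeddings of normal windows."""
    mu = normal_embeddings.mean(axis=0)
    cov = np.cov(normal_embeddings, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # assumed ridge term for numerical stability
    return mu, np.linalg.inv(cov)

def mahalanobis_score(embedding: np.ndarray, mu: np.ndarray,
                      cov_inv: np.ndarray) -> float:
    """Anomaly score: Mahalanobis distance of an embedding from the normal Gaussian."""
    diff = embedding - mu
    return float(np.sqrt(diff @ cov_inv @ diff))
```

Windows whose embeddings lie far from the normal Gaussian (large Mahalanobis distance) would be flagged as anomalous, e.g. by thresholding the score at a quantile of scores observed on held-out normal data.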