The work starts with a question: "Do human vocal folds produce different wavelengths when people speak in different accents of the same language?" When humans hear a language, they can generally identify the speaker's accent and region with ease. The challenge is giving this capability to a machine. By computing the discrete Fourier transform, a Mel-spaced filter bank and the log filter-bank energies, we obtained the Mel-frequency cepstral coefficients (MFCCs) of a voice, which are a numeric representation of the analog signal. We then applied different machine learning and deep learning algorithms to find the best achievable accuracy. Detecting a speaker's region from their voice can support security agencies and e-commerce marketing. Working with human language falls under Natural Language Processing (NLP), which is a branch of artificial intelligence. For feature extraction we used MFCCs, and for classification we used linear regression, decision tree, gradient boosting, random forest and a neural network. We achieved a maximum accuracy of 86% on 9,303 samples. The data were collected from eight regions of Bangladesh (Dhaka, Khulna, Barisal, Rajshahi, Sylhet, Chittagong, Mymensingh and Noakhali). We follow a simple workflow to obtain the final result.
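The feature-extraction steps named above (discrete Fourier transform, Mel-spaced filter bank, log filter-bank energies) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the 25 ms frame length, 10 ms step, 512-point FFT, 26 filters and 13 coefficients are all assumptions chosen because they are common defaults, and the final discrete cosine transform is the conventional last step of MFCC computation rather than something stated in the text.

```python
import numpy as np


def hz_to_mel(hz):
    # Common mel-scale formula (an assumption; several variants exist).
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising slope
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def mfcc(signal, sample_rate, frame_len=0.025, frame_step=0.010,
         n_fft=512, n_filters=26, n_ceps=13):
    # All default parameter values here are hypothetical, not from the paper.
    frame_size = int(round(frame_len * sample_rate))
    hop = int(round(frame_step * sample_rate))
    n_frames = 1 + max(0, (len(signal) - frame_size) // hop)

    # Slice the signal into overlapping frames and apply a Hamming window.
    idx = np.arange(frame_size)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_size)

    # Power spectrum of each frame via the discrete Fourier transform.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft

    # Mel filter-bank energies, then their logarithm (floored to avoid log(0)).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))

    # Unnormalized DCT-II over the filter axis; keep the first n_ceps terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)   # one second of a synthetic 440 Hz tone
    features = mfcc(tone, sr)
    print(features.shape)                  # (number of frames, n_ceps)
```

Each row of the returned matrix is the coefficient vector for one frame. A per-recording feature vector for a classifier could then be formed, for example, by averaging the rows, although the text does not say how the authors aggregated frames.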