Automated extraction of text from video through lip reading can help overcome language barriers and open new opportunities in security, connectivity, and support for people with physical challenges. This conversion is possible by analyzing facial expressions with deep learning methods. However, it remains a challenging task because the same word can be spoken with different pronunciations and accents, each producing a different facial appearance. In this research, a method for converting video data into text data through lip reading is proposed. The proposed method consists of a test dataset, analysis of image frames, and generation of text output from the identified words. In the proposed technique, the test dataset is organized by combining all possible facial expressions of different words.
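The idea of storing several facial-expression variants per word and matching observed frames against them can be illustrated with a minimal sketch. This is not the authors' deep learning model. It assumes each video frame has already been reduced to a small hypothetical lip-feature vector (for example, mouth width and height), that the video has already been split into one segment per word, and it swaps in nearest-template matching with dynamic time warping (DTW) so that variants of different lengths, such as slower or faster pronunciations, can be compared. The names `dtw`, `recognize_word`, `video_to_text`, and the sample data are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical per-frame lip features, e.g. (mouth width, mouth height).
Frame = Sequence[float]


def dtw(x: List[Frame], y: List[Frame]) -> float:
    """Dynamic time warping distance between two frame sequences."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(x[i - 1], y[j - 1])  # Euclidean frame distance
            # Extend the cheapest of the three allowed alignment moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize_word(segment: List[Frame], templates: Dict[str, List[List[Frame]]]) -> str:
    """Return the word whose closest stored expression variant best matches the segment."""
    best_word, best_score = None, float("inf")
    for word, variants in templates.items():
        # Each word keeps several variants (different speakers or accents).
        for variant in variants:
            score = dtw(segment, variant)
            if score < best_score:
                best_word, best_score = word, score
    if best_word is None:
        raise ValueError("template dataset is empty")
    return best_word


def video_to_text(segments: List[List[Frame]], templates: Dict[str, List[List[Frame]]]) -> str:
    """Convert per-word frame segments into a text string."""
    return " ".join(recognize_word(seg, templates) for seg in segments)


if __name__ == "__main__":
    # Hypothetical template dataset: two variants of "yes", one of "no".
    templates = {
        "yes": [
            [(1.0, 0.2), (1.2, 0.6), (1.0, 0.3)],
            [(0.9, 0.2), (1.1, 0.5), (1.1, 0.5), (0.9, 0.3)],
        ],
        "no": [[(0.6, 0.4), (0.5, 0.8), (0.6, 0.4)]],
    }
    segments = [
        [(1.0, 0.2), (1.15, 0.55), (1.0, 0.3)],
        [(0.6, 0.45), (0.5, 0.75), (0.5, 0.75), (0.6, 0.4)],
    ]
    print(video_to_text(segments, templates))
```

Keeping multiple variants per word mirrors the abstract's idea of a dataset that combines the possible facial expressions of each word. In practice, a learned model would replace both the hand-picked features and the template matching.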