Obtaining the demographics of online users is of great significance to Internet service providers and advertisers. Most previous work has used standard machine learning methods to infer user demographics from handcrafted features. This approach has two disadvantages. First, handcrafted features are usually not robust and rely heavily on expert experience. Second, these low-capacity models can neither model the complex non-linear relationships between users nor recognize interdependencies among items. To address these problems, we propose a DEep REtentive learning frameworK (DEREK) for demographic information prediction. Specifically, we introduce a heuristic data generation method that alleviates data sparsity, so that rating data can be used more efficiently than handcrafted features. Moreover, retention blocks based on high-capacity deep neural networks are designed to extract a shared representation from the input rating data. Finally, DEREK can simultaneously infer different demographic attributes through an end-to-end multi-label learning architecture. Extensive experimental results on the MovieLens 100K and MovieLens 1M data sets demonstrate the superiority of the proposed DEREK over standard machine learning methods (logistic regression and SVM).
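
To illustrate the idea of a shared representation feeding several attribute predictors at once, the following is a minimal sketch and not the DEREK architecture itself. It assumes a single plain dense layer as a stand-in for the retention blocks, random untrained weights, toy layer sizes, and two hypothetical output heads named `gender` and `age_group`; none of these details are taken from the paper.

```python
import math
import random


def dense(x, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict(ratings, shared, heads):
    """Compute one shared hidden vector, then one softmax head per attribute."""
    hidden = relu(dense(ratings, *shared))
    return {name: softmax(dense(hidden, *params))
            for name, params in heads.items()}


rng = random.Random(0)
n_items, n_hidden = 6, 4  # toy sizes (assumption)

# Assumption: a single dense layer stands in for the retention blocks.
shared = init_layer(n_hidden, n_items, rng)

# Hypothetical attribute heads; the class counts are illustrative.
heads = {
    "gender": init_layer(2, n_hidden, rng),
    "age_group": init_layer(3, n_hidden, rng),
}

ratings = [5.0, 0.0, 3.0, 0.0, 0.0, 4.0]  # 0.0 marks an unrated item
probs = predict(ratings, shared, heads)
```

Here `probs` maps each attribute name to a probability vector over its classes. In an end-to-end setting, the per-attribute losses would typically be summed so that the shared layer is trained on all attributes jointly; that training step is omitted from this sketch.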