Benchmarks play an important role in analysing algorithms for dynamic multi-objective optimisation problems. The literature offers several benchmarks for such problems, especially for evolutionary approaches. This study comprehensively reviews the existing benchmarks in the single-objective and multi-objective reinforcement learning (MORL) settings. To the best of our knowledge, there is no benchmark for dynamic multi-objective reinforcement learning (DMORL). This study addresses that gap by drawing on existing knowledge to propose a benchmark for investigating the performance of different algorithms. The benchmark can also help in understanding the dynamics that arise when objectives conflict with one another and when constraints and problem parameters change over time. The proposed benchmark is a modified version of the deep-sea treasure hunt problem in which several features, such as changing parameters and objectives, have been integrated to support dynamics in a multi-objective environment. This paper focuses on the methodology of designing and developing the benchmark.