Mobile parcel lockers (MPLs) have been recently introduced by urban logistics operators as a means to reduce traffic congestion and operational cost. Their capability to relocate their position during the day has the potential to improve customer accessibility and convenience (if deployed and planned accordingly), allowing customers to collect parcels at their preferred time among one of the multiple locations. This paper proposes an integer programming model to solve the Location Routing Problem for MPLs to determine the optimal configuration and locker routes. In solving this model, a Hybrid Q-Learning algorithm-based Method (HQM) integrated with global and local search mechanisms is developed, the performance of which is examined for different problem sizes and benchmarked with genetic algorithms. Furthermore, we introduced two route adjustment strategies to resolve stochastic events that may cause delays. The results show that HQM achieves 443.41% improvement on average in solution improvement, compared with the 94.91% improvement of heuristic counterparts, suggesting HQM enables a more efficient search for better solutions. Finally, we identify critical factors that contribute to service delays and investigate their effects.