Recent technological advancements and investments have transformed Unmanned Aerial Vehicles (UAVs) into a credible and reliable tool for providing on-demand last-mile logistics services. Nevertheless, few studies have developed integrated task assignment and path planning models that account for dynamic environments and stochastic demand generation. This paper addresses this research gap by developing a reinforcement learning approach to path planning, coupled with a task assignment model formulated as a mixed-integer programming problem. The performance of the task assignment model is evaluated against a dynamic programming method and a First-In-First-Out heuristic, which serves as the baseline. A case study based on the City of London demonstrates the applicability of the integrated model. Results show that the mixed-integer approach coordinates the UAV fleet more effectively than the other methods, with the dynamic programming method providing higher returns for large fleet sizes.
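To illustrate the flavor of the task assignment problem described above, the following minimal sketch solves a toy one-to-one UAV-to-task assignment by exhaustive search. This is purely illustrative: the cost matrix is hypothetical, and the paper's actual model is a mixed-integer program with richer operational constraints, not the brute-force enumeration shown here.

```python
from itertools import permutations

def assign_tasks(cost):
    """Find the min-cost one-to-one assignment of UAVs to tasks.

    cost[i][j] is the (illustrative) cost for UAV i to serve task j.
    Exhaustive search over all permutations; only viable for tiny fleets.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return best_perm, best_cost

# Hypothetical 3-UAV, 3-task cost matrix (e.g., travel times)
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
perm, total = assign_tasks(cost)
print(perm, total)  # → (1, 0, 2) 5
```

A real instance would replace the exhaustive search with a mixed-integer programming solver, which scales to larger fleets and accommodates side constraints such as battery limits and delivery deadlines.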