Individualized Passenger Travel Pattern Multi-Clustering Based on Graph Regularized Tensor Latent Dirichlet Allocation


Individual passenger travel patterns have significant value in understanding passenger behavior, such as learning the hidden clusters of locations, times, and passengers. The learned clusters further enable commercially beneficial actions such as customized services, promotions, data-driven urban-use planning, and peak hour discovery. However, individualized passenger modeling is very challenging for the following reasons: 1) individual passenger travel data are multi-dimensional spatiotemporal big data, including at least the origin, destination, and time dimensions; 2) individualized passenger travel patterns usually depend on the external environment, such as the distances between locations and the functions of locations, which are ignored in most current works. This work proposes a multi-clustering model to learn the latent clusters along the multiple dimensions of Origin, Destination, Time, and eventually, Passenger (ODT-P). We develop a graph-regularized tensor Latent Dirichlet Allocation (LDA) model by first extending the traditional LDA model into a tensor version and then applying it to individual travel data. Then, the external information of stations is formulated as semantic graphs and incorporated as Laplacian regularizations. Furthermore, to improve the model's scalability when dealing with massive data, an online stochastic learning method based on a tensorized variational Expectation-Maximization algorithm is developed. Finally, a case study based on passengers in the Hong Kong metro system is conducted and demonstrates that better clustering performance is achieved compared to state-of-the-art methods, with improvements in the point-wise mutual information index and in algorithm convergence speed by a factor of two.
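
To illustrate what a Laplacian regularization over a station graph typically looks like, the following is a minimal sketch and not the model's actual objective, which the abstract does not spell out. It assumes a symmetric station similarity matrix `weights` (for example derived from distance or function) and a matrix `memberships` whose row i is the cluster distribution of station i; both names and both helper functions are hypothetical. It uses the standard identity tr(Φᵀ L Φ) = ½ Σᵢⱼ Wᵢⱼ ‖φᵢ − φⱼ‖², with L = D − W, so the penalty is small when strongly connected stations have similar cluster memberships.

```python
def laplacian(weights):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix."""
    n = len(weights)
    lap = [[-weights[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Diagonal entry: degree of node i minus any self-loop weight.
        lap[i][i] = sum(weights[i]) - weights[i][i]
    return lap


def laplacian_penalty(weights, memberships):
    """Compute tr(Phi^T L Phi), where row i of Phi is station i's cluster distribution."""
    lap = laplacian(weights)
    n = len(weights)
    k = len(memberships[0])
    total = 0.0
    for c in range(k):
        for i in range(n):
            for j in range(n):
                total += memberships[i][c] * lap[i][j] * memberships[j][c]
    return total


def pairwise_penalty(weights, memberships):
    """Equivalent form: 0.5 * sum_ij W_ij * ||phi_i - phi_j||^2."""
    n = len(weights)
    total = 0.0
    for i in range(n):
        for j in range(n):
            diff_sq = sum((a - b) ** 2
                          for a, b in zip(memberships[i], memberships[j]))
            total += 0.5 * weights[i][j] * diff_sq
    return total


# Hypothetical toy graph: stations 0 and 1 are similar, station 2 is isolated.
weights = [[0, 1, 0],
           [1, 0, 0],
           [0, 0, 0]]
smooth = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]  # linked stations agree
rough = [[0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]   # linked stations disagree

print(laplacian_penalty(weights, smooth), laplacian_penalty(weights, rough))
```

In this toy example the penalty is zero when the two linked stations share the same cluster distribution and positive when they disagree, which is the behavior a regularizer of this kind is meant to encourage.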

Data Mining and Knowledge Discovery