Clustering is the task of partitioning objects into clusters according to given criteria so that objects in the same cluster are similar. Many clustering methods have been proposed over the past several decades. Because clustering results depend on the criteria and algorithms used, selecting them appropriately is an essential problem. Large collections of user behavior logs and text documents have recently become common, and such data are often represented as high-dimensional, sparse vectors. This chapter introduces information-theoretic clustering (ITC), which is well suited to analyzing such high-dimensional data, from both theoretical and experimental perspectives. On the theoretical side, it presents the criterion, generative models, and novel algorithms. On the experimental side, it demonstrates the effectiveness and usefulness of ITC for text analysis as an important example.
Part of the book: Advances in Statistical Methodologies and Their Application to Real Problems
Topic models are known to be useful tools for modeling and analyzing high-dimensional count data such as documents. In a smart city, it is important to collect and analyze citizens’ voices to discover their concerns and issues. Topic modeling is effective for this analysis because it can extract topics from a collection of documents. However, parameter estimation in topic models can reach different solutions depending on the algorithm and the initial values. To select the solution best suited to a given purpose, it is necessary to know what kinds of solutions exist. This chapter introduces methods for analyzing these diverse solutions and obtaining an overall picture of them.
Part of the book: Sustainable Smart Cities