Community search

Discovering communities in a network, known as community detection/discovery, is a fundamental problem in network science, which attracted much attention in the past several decades. In recent years, with the tremendous studies on big data, another related but different problem, called community search, which aims to find the most likely community that contains the query node, has attracted great attention from both academic and industry areas. It is a query-dependent variant of the community detection problem. A detailed survey of community search can be found at ref.,^[1] which reviews all the recent studies ^[2]^[3]^[4]^[5]^[6]^[7]^[8] ^[9] ^[10] ^[11]

Main advantages[edit]

As pointed by the first work on community search^[2] published in SIGKDD'2010, many existing community detection/discovery methods consider the static community detection problem, where the graph needs to be partitioned a-priori with no reference to query nodes. While community search often focuses the most-likely communitie containing the query vertex. The main advantages of community search over community detection/discovery are listed as below:

(1) High personalization.^[3]^[9]^[10] Community detection/discovery often uses the same global criterion to decide whether a subgraph qualifies as a community. In other words, the criterion is fixed and predetermined. But in reality, communities for different vertices may have very different characteristics. Moreover, community search allows the query users to specify more personalized query conditions. In addition, the personalized query conditions enable the communities to be interpreted easily.

For example, a recent work,^[9] which focuses on attributed graphs, where nodes are often associated with some attributes like keyword, and tries to find the communities, called attributed communities, which exhibit both strong structure and keyword cohesiveness. The query users are allowed to specify a query node and some other query conditions: (1) a value, k, the minimum degree for the expected communities; and (2) a set of keywords, which control the semantic of the expected communities. The communities returned can be easily interpreted by the keywords shared by all the community members. More details can be found from.^[11]

(2) High efficiency. With the striking booming of social networks in recent years, there are many real big graphs. For example, the numbers of users in Facebook and Twitter are often billions-scale. As community detection/discovery often finds all the communities from an entire social network, this can be very costly and also time-consuming. In contrast, community search often works on a sub-graph, which is much efficient. Moreover, detecting all the communities from an entire social network is often unnecessary. For real applications like recommendation and social media markets, people often focus on some communities that they are really interested in, rather than all the communities.

Some recent studies^[4]^[9] have shown that, for million-scale graphs, community search often takes less than 1 second to find a well-defined community, which is generally much faster than many existing community detection/discovery methods. This also implies that, community search is more suitable for finding communities from big graphs.

(3) Support for dynamically evolving graphs.^[3] Almost all the graphs in real life are often evolving over time. Since community detection often uses the same global criterion to find communities, they are not sensitive of the updates of nodes and edges in graphs. In other words, the detected communities may loose freshness after a short period of time. On the contrary, community search can handle this easily since it is able to search the communities in an online manner, based on a query request.

Metrics for community search[edit]

Community search often uses some well-defined, fundamental graph metrics to formulate the cohesiveness of communities. The commonly used metrics are k-core (minimum degree),^[2]^[4]^[6]^[7]^[9] k-truss,^[5]^[8] k-edge-connected,^[12]^[13] etc. Among these measures, the k-core metric is the most popular one, and has been used in many recent studies as surveyed in.^[1]

References[edit]

^ ^a ^b Fang, Yixiang; Huang, Xin; Qin, Lu; Zhang, Ying; Zhang, Wenjie; Cheng, Reynold; Lin, Xuemin (2019). "A Survey of Community Search over Big Graphs". arXiv:1904.12539 [cs.DB].
^ ^a ^b ^c Sozio, Mauro; Gionis, Aristides (2010). "The community-search problem and how to plan a successful cocktail party". Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '10. p. 939. doi:10.1145/1835804.1835923. ISBN 9781450300551. S2CID 11484255.
^ ^a ^b ^c Cui, Wanyun; Xiao, Yanghua; Wang, Haixun; Lu, Yiqi; Wang, Wei (2013). "Online search of overlapping communities". Proceedings of the 2013 international conference on Management of data - SIGMOD '13. p. 277. doi:10.1145/2463676.2463722. ISBN 9781450320375. S2CID 953025.
^ ^a ^b ^c Cui, Wanyun; Xiao, Yanghua; Wang, Haixun; Wang, Wei (2014). "Local search of communities in large graphs". Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. pp. 991–1002. doi:10.1145/2588555.2612179. ISBN 9781450323765. S2CID 4653380.
^ ^a ^b Huang, Xin; Cheng, Hong; Qin, Lu; Tian, Wentao; Yu, Jeffrey Xu (2014). "Querying k-truss community in large and dynamic graphs". Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. pp. 1311–1322. doi:10.1145/2588555.2610495. ISBN 9781450323765. S2CID 207211829.
^ ^a ^b Li, Rong-Hua; Qin, Lu; Yu, Jeffrey Xu; Mao, Rui (2015). "Influential community search in large networks". Proceedings of the VLDB Endowment. 8 (5): 509–520. CiteSeerX 10.1.1.667.4074. doi:10.14778/2735479.2735484. S2CID 17672355.
^ ^a ^b Barbieri, Nicola; Bonchi, Francesco; Galimberti, Edoardo; Gullo, Francesco (2015). "Efficient and effective community search". Data Mining and Knowledge Discovery. 29 (5): 1406–1433. doi:10.1007/s10618-015-0422-1. S2CID 13440433.
^ ^a ^b Huang, Xin; Lakshmanan, Laks V. S.; Yu, Jeffrey Xu; Cheng, Hong (2015). "Approximate closest community search in networks". Proceedings of the VLDB Endowment. 9 (4): 276–287. arXiv:1505.05956. doi:10.14778/2856318.2856323. S2CID 2905457.
^ ^a ^b ^c ^d ^e Fang, Yixiang; Cheng, Reynold; Luo, Siqiang; Hu, Jiafeng (2016). "Effective community search for large attributed graphs". Proceedings of the VLDB Endowment. 9 (12): 1233–1244. doi:10.14778/2994509.2994538. hdl:10722/232839.
^ ^a ^b Fang, Yixiang; Cheng, Reynold; Li, Xiaodong; Luo, Siqiang; Hu, Jiafeng (2017). "Effective community search over large spatial graphs". Proceedings of the VLDB Endowment. 10 (6): 709–720. doi:10.14778/3055330.3055337. hdl:10722/243528.
^ ^a ^b "Effective Community Search for Large Attributed Graphs".
^ Chang, Lijun; Lin, Xuemin; Qin, Lu; Yu, Jeffrey Xu; Zhang, Wenjie (2015). "Index-based Optimal Algorithms for Computing Steiner Components with Maximum Connectivity". Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. pp. 459–474. doi:10.1145/2723372.2746486. ISBN 9781450327589. S2CID 18282516.
^ Hu, Jiafeng; Wu, Xiaowei; Cheng, Reynold; Luo, Siqiang; Fang, Yixiang (2017). "On Minimal Steiner Maximum-Connected Subgraph Queries". IEEE Transactions on Knowledge and Data Engineering. 29 (11): 2455–2469. doi:10.1109/TKDE.2017.2730873. S2CID 40432915.

[survey-1] Fang, Yixiang; Huang, Xin; Qin, Lu; Zhang, Ying; Zhang, Wenjie; Cheng, Reynold; Lin, Xuemin (2019). "A Survey of Community Search over Big Graphs". arXiv:1904.12539 [cs.DB].

[KDD2010-2] Sozio, Mauro; Gionis, Aristides (2010). "The community-search problem and how to plan a successful cocktail party". Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '10. p. 939. doi:10.1145/1835804.1835923. ISBN 9781450300551. S2CID 11484255.

[SIGMOD2013-3] Cui, Wanyun; Xiao, Yanghua; Wang, Haixun; Lu, Yiqi; Wang, Wei (2013). "Online search of overlapping communities". Proceedings of the 2013 international conference on Management of data - SIGMOD '13. p. 277. doi:10.1145/2463676.2463722. ISBN 9781450320375. S2CID 953025.

[SIGMOD2014-cui-4] Cui, Wanyun; Xiao, Yanghua; Wang, Haixun; Wang, Wei (2014). "Local search of communities in large graphs". Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. pp. 991–1002. doi:10.1145/2588555.2612179. ISBN 9781450323765. S2CID 4653380.

[SIGMOMD2014-huang-5] Huang, Xin; Cheng, Hong; Qin, Lu; Tian, Wentao; Yu, Jeffrey Xu (2014). "Querying k-truss community in large and dynamic graphs". Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. pp. 1311–1322. doi:10.1145/2588555.2610495. ISBN 9781450323765. S2CID 207211829.

[VLDB2015-6] Li, Rong-Hua; Qin, Lu; Yu, Jeffrey Xu; Mao, Rui (2015). "Influential community search in large networks". Proceedings of the VLDB Endowment. 8 (5): 509–520. CiteSeerX 10.1.1.667.4074. doi:10.14778/2735479.2735484. S2CID 17672355.

[DMKD2015-7] Barbieri, Nicola; Bonchi, Francesco; Galimberti, Edoardo; Gullo, Francesco (2015). "Efficient and effective community search". Data Mining and Knowledge Discovery. 29 (5): 1406–1433. doi:10.1007/s10618-015-0422-1. S2CID 13440433.

[vldbHuang-8] Huang, Xin; Lakshmanan, Laks V. S.; Yu, Jeffrey Xu; Cheng, Hong (2015). "Approximate closest community search in networks". Proceedings of the VLDB Endowment. 9 (4): 276–287. arXiv:1505.05956. doi:10.14778/2856318.2856323. S2CID 2905457.

[vldbFang-9] Fang, Yixiang; Cheng, Reynold; Luo, Siqiang; Hu, Jiafeng (2016). "Effective community search for large attributed graphs". Proceedings of the VLDB Endowment. 9 (12): 1233–1244. doi:10.14778/2994509.2994538. hdl:10722/232839.

[vldbFang2017-10] Fang, Yixiang; Cheng, Reynold; Li, Xiaodong; Luo, Siqiang; Hu, Jiafeng (2017). "Effective community search over large spatial graphs". Proceedings of the VLDB Endowment. 10 (6): 709–720. doi:10.14778/3055330.3055337. hdl:10722/243528.

[vldbFangHomepage-11] "Effective Community Search for Large Attributed Graphs".

[SIGMOD2015-12] Chang, Lijun; Lin, Xuemin; Qin, Lu; Yu, Jeffrey Xu; Zhang, Wenjie (2015). "Index-based Optimal Algorithms for Computing Steiner Components with Maximum Connectivity". Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. pp. 459–474. doi:10.1145/2723372.2746486. ISBN 9781450327589. S2CID 18282516.

[TKDE2017-13] Hu, Jiafeng; Wu, Xiaowei; Cheng, Reynold; Luo, Siqiang; Fang, Yixiang (2017). "On Minimal Steiner Maximum-Connected Subgraph Queries". IEEE Transactions on Knowledge and Data Engineering. 29 (11): 2455–2469. doi:10.1109/TKDE.2017.2730873. S2CID 40432915.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]