Apps that care about our life making it more comfortable and safe, productive and secure.

High Resolution Multi-Scale Clustering with Stable State

Description

HiReCS is a C++ clustering library for the multi-scale hierarchical community structure discovery with crisp overlaps in large evolving networks having nearly linear computational complexity.

License: Commercial or Apache 2.0 (all code except the core which can be linked as a separate module and is under the MPL 2.0 license).
Please provide some feedback or a backlink to this page if you like or use this library.

Features

  • Performs Community Detection considering Crisp Overlaps by the weighted adjacency matrix for unsigned weighted directed graphs
  • Provides output of the multi-scale hierarchy of communities with overlaps and explicitly specified community cores (experimental feature that works purely at the moment) if exists (when links inside the community are not quasi-uniform)
  • Resulting hierarchy is stable (does not depends on the order of nodes processing and different runs of the algorithm unlike the Louvain method)
  • The hierarchy of communities is represented by the graph, not just a tree, because of the overlapping communities which are the multiple parents of the underlying shared nodes
  • The hierarchy has the highest range of granularity (resolution, scale) and contains coarse-grained communities on top levels, fine-grained on the middle levels and most dense parts of communities on the bottom levels that are not discoverable by standard legacy methods because of the resolution limit issue
  • The computational and memory complexity is near O(m), where m is the number of links in the input graph
  • The API is provided to directly unwrap any community up to the underlying nodes and displaying the share ratio E (0, 1], < 1 for overlaps
  • The API is provided to list all communities on the specified scale (level of the hierarchy)
  • The clustering algorithm is suitable for the distributed parallel implementation: each node is processed independently withing the context of 1 hop
  • Implementation is FREE / commercial, see the license. Download the clustering library

Usage

Library API manual is available also as hirecs client app to load an adjacency matrix from internal hig (HiReCS Input Graph) format. Hig format similar to the well known Pajek format. Hig format description and pajek to hig format converter are available in Downloads section:

datasets/hig_format.hig
datasets/pajek_hig.py

Client usage

$ ./hirecs
Usage: ./hirecs [-o{t,c,j}] [-f] [-r] [-m<float>] <adjacency_matrix.hig>
  -o  - output data format. Default: t
    t  - text like representation for logs
    c  - CSV like representation for parsing
    j  - JSON representation
    je  - extended JSON representation (j + unwrap root clusters to nodes)
    jd  - detaile JSON representation (je + show inter-cluster links)
  -c  - clean links, skip links validation
  -f  - fast quazy-mutual clustering (faster). Default: strictly-mutual (better)
  -r  - rand reorder (shuffle) nodes and links on nodes construction
  -m<float>  - modularity profit margin for early exit, float E [-1, 1]. Default: -0.999, but on practice >~= 0
    -1  - skip stderr tracing after each iteration. Recommended: 1E-6 or 0

Use ./hirecs without options to list this usage. See details in HiReCS_Usage.

API

API manual of the libHiReCS.

Sources

Sources of the libHiReCS.

Example

Input adjacency matix pentagon.hig:
/Graph weighted:1 normalize:0
/Nodes 5 0 # 5 nodes starting from id=0
/Edges
0> 1 2
3> 1 4
4> 2
Execute:
$ ./hirecs dataovp/pentagon.hig -oje -r
or
$ ./hirecs -oje dataovp/pentagon.hig
Output:
High Resolution Hierarchical Clustering with Stable State (HiReCS) started, nodes: 5, weight: 10
Initial state, mod: -0.2, nodes num: 5
  Initial nodes clustering is completed
  Folding is completed, mod: 0.05, dmod: 0.25, weight (both dir): 10, clusters num: 5
  Folding is completed, mod: 0.175, dmod: 0.125, weight (both dir): 10, clusters: 5 (cls: 5, prop: 0) / 10, dcls: 0
Hierarchy building is completed, mod: 0.175, weight (both dir): 10, clusters: 10, root size: 5
-Root size: 5
{"root":[5,6,7,8,9],"clusters":{"0":{"owners":[5,6],"des":[0,1],"leafs":true},"1":{"owners":[5,7],"des":[0,2],"leafs":true},"2":{"owners":[6,8],"des":[1,3],"leafs":true},"3":{"owners":[7,9],"des":[2,4],"leafs":true},"4":{"owners":[8,9],"des":[3,4],"leafs":true},"5":{"des":[0,1]},"6":{"des":[0,2]},"7":{"des":[1,3]},"8":{"des":[2,4]},"9":{"des":[3,4]}},"communities":{"5":{"0":0.5,"2":0.25,"1":0.25},"6":{"1":0.5,"3":0.25,"0":0.25},"7":{"2":0.5,"4":0.25,"0":0.25},"8":{"3":0.5,"4":0.25,"1":0.25},"9":{"3":0.25,"4":0.5,"2":0.25}},"nodes":5,"mod":0.175}

Example of the Hierarchical Community Structure for the gold standard, Zachary Karate Club network. HiReCS (r200) exactly hits the global optimum Q = 0.41979:

Hierarchical Community Structure of the Karate network

Downloads

WARNING! The library is under development now, it currently has issues with datasets processing of the size more than 100 000 links.
Currently it forms excellent root level (top-level, coarse-grained) clusters, but might have issues with the underlying levels of the hierarchy.
If you find any issues, please submit them here.

Get HiReCS sources and binaries for the particular release in GitHub: https://github.com/XI-lab/hirecs/releases.
Or get the documentation, datasets and HiReCS binaries (for Linux Ubuntu 14.04 x64) in GoogleDrive:

Discussions

If you are interested in this clustering library, please visit eXascale Infolab where you can find another projects and research papers related to Big Data!