reduced_mutual_information
graph_tool.inference.reduced_mutual_information(x, y, norm=False, norm_method='asymmetric')
Returns the reduced mutual information between two partitions.

Parameters:
- x : iterable of int values
  First partition.
- y : iterable of int values
  Second partition.
- norm : bool (optional, default: False)
  If True, the result will be pseudo-normalized, such that the returned value is one if the two partitions are identical.
- norm_method : "symmetric" or "asymmetric" (optional, default: "asymmetric")
  Method used for pseudo-normalization.
 
Returns:
- RMI : float
  Reduced mutual information value.
 
Notes

The reduced mutual information (RMI) [newman_improved_2020] is defined as

\[\text{RMI}(\boldsymbol x, \boldsymbol y) = \frac{1}{N}\left[\ln \frac{N!\prod_{rs}m_{rs}!}{\prod_rn_r!\prod_sn_s'!} - \ln\Omega(\boldsymbol n, \boldsymbol n')\right],\]

where \(m_{rs}=\sum_i\delta_{x_i,r}\delta_{y_i,s}\) is the contingency table between \(\boldsymbol x\) and \(\boldsymbol y\), \(n_r=\sum_sm_{rs}\) and \(n'_s=\sum_rm_{rs}\) are the group sizes in the two partitions, and \(\Omega(\boldsymbol n, \boldsymbol n')\) is the total number of contingency tables with the given row and column sums.

If norm == True, the pseudo-normalized value is returned:

\[\frac{\text{RMI}(\boldsymbol x, \boldsymbol y)}{\text{RMI}(\boldsymbol x, \boldsymbol x)}\]

if norm_method == "asymmetric", or

\[\frac{2\,\text{RMI}(\boldsymbol x, \boldsymbol y)}{\text{RMI}(\boldsymbol x, \boldsymbol x) + \text{RMI}(\boldsymbol y, \boldsymbol y)}\]

if norm_method == "symmetric". In both cases the pseudo-normalized value is unity if the two partitions are identical.

Warning

Both the pseudo-normalized and unnormalized RMI values can be negative. In addition, the pseudo-normalized RMI can take values larger than unity for non-identical partitions. These are properties of the definition of RMI, not bugs in this particular implementation.

This algorithm runs in time \(O(N)\), where \(N\) is the length of \(\boldsymbol x\) and \(\boldsymbol y\).

References

[newman_improved_2020] M. E. J. Newman, G. T. Cantwell and J.-G. Young, "Improved mutual information measure for classification and community detection", Phys. Rev. E 101, 042304 (2020), DOI: 10.1103/PhysRevE.101.042304, arXiv: 1907.12581

Examples

>>> x = np.random.randint(0, 10, 1000)
>>> y = np.random.randint(0, 10, 1000)
>>> gt.reduced_mutual_information(x, y)
-0.069938...
>>> x = [0, 1, 2, 3, 4, 5, 6]
>>> y = [0, 0, 1, 0, 2, 3, 4]
>>> gt.reduced_mutual_information(x, y, norm=True, norm_method="asymmetric")
0.535938...
>>> # Pseudo-normalized values larger than one are possible!
>>> gt.reduced_mutual_information(x, y, norm=True, norm_method="symmetric")
82.227299...
>>> gt.reduced_mutual_information(x, x, norm=False)
-0.162538...
>>> gt.reduced_mutual_information(y, y, norm=False)
0.160420...
>>> gt.reduced_mutual_information(x, x, norm=True, norm_method="asymmetric")
1.0
>>> gt.reduced_mutual_information(x, x, norm=True, norm_method="symmetric")
1.0
>>> gt.reduced_mutual_information(y, y, norm=True, norm_method="asymmetric")
1.0
>>> gt.reduced_mutual_information(y, y, norm=True, norm_method="symmetric")
1.0
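
To make the definition above concrete, here is a minimal illustrative sketch (not part of graph-tool; the helper name rmi_log_count_term is ours) that evaluates only the first term of the RMI formula, \(\frac{1}{N}\ln\frac{N!\prod_{rs}m_{rs}!}{\prod_rn_r!\prod_sn_s'!}\), directly from the contingency table. The correction term \(\ln\Omega(\boldsymbol n, \boldsymbol n')\) is deliberately omitted, since counting contingency tables with fixed margins is the nontrivial step that the library computes internally.

import numpy as np
from scipy.special import gammaln

def rmi_log_count_term(x, y):
    """Illustrative only: the first term of the RMI definition,
    (1/N) ln[N! prod_rs m_rs! / (prod_r n_r! prod_s n'_s!)],
    without the -ln Omega(n, n') correction."""
    x, y = np.asarray(x), np.asarray(y)
    N = len(x)
    # nonzero entries m_rs of the contingency table: counts of
    # co-occurring (x_i, y_i) label pairs
    _, m = np.unique(np.column_stack((x, y)), axis=0, return_counts=True)
    _, nr = np.unique(x, return_counts=True)   # row sums n_r
    _, ns = np.unique(y, return_counts=True)   # column sums n'_s
    # gammaln(k + 1) == ln k!, which avoids overflow for large counts
    return (gammaln(N + 1) + gammaln(m + 1).sum()
            - gammaln(nr + 1).sum() - gammaln(ns + 1).sum()) / N

Subtracting \(\ln\Omega(\boldsymbol n, \boldsymbol n')/N\) from this quantity is what "reduces" the mutual information: it discounts the information needed to transmit the contingency table itself, and this correction is what makes negative RMI values possible.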
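
Likewise, per the definitions above, the pseudo-normalized values are simply ratios of unnormalized ones, which can be checked directly (assuming the usual import graph_tool.all as gt, and the same x and y as in the examples):

import numpy as np
import graph_tool.all as gt

x = [0, 1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 2, 3, 4]

r_xy = gt.reduced_mutual_information(x, y)
r_xx = gt.reduced_mutual_information(x, x)
r_yy = gt.reduced_mutual_information(y, y)

# asymmetric: RMI(x, y) / RMI(x, x)
assert np.isclose(r_xy / r_xx,
                  gt.reduced_mutual_information(x, y, norm=True,
                                                norm_method="asymmetric"))

# symmetric: 2 RMI(x, y) / (RMI(x, x) + RMI(y, y))
assert np.isclose(2 * r_xy / (r_xx + r_yy),
                  gt.reduced_mutual_information(x, y, norm=True,
                                                norm_method="symmetric"))

The large symmetric value in the examples (82.227299...) follows directly from this ratio: the denominator \(\text{RMI}(\boldsymbol x,\boldsymbol x)+\text{RMI}(\boldsymbol y,\boldsymbol y)\) is close to zero here, since the two unnormalized self-values (-0.162538... and 0.160420...) nearly cancel.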