Talk:K-medoids

Dubious Claim in the Software Section

The comment related to RapidMiner states that the point closest to the mean is "not" the medoid. This statement is wrong! A proof that the point closest to the mean is a medoid (a sample may have several!) is given in https://dx.doi.org/10.13140/2.1.4453.2009

Counterexample to your claim: 5 points in 1 dimension: 1, 2, 3, 4, 100. The medoid is 3 (2+1+0+1+97=101), while the point closest to the mean is 4 (3+2+1+0+96=102). The claim from the above (non-reviewed) report only works for squared (!) Euclidean distance, because the arithmetic mean is optimal there. But k-medoids makes the most sense when *not* using Euclidean distance. If you are interested in L2², use k-means, not k-medoids. 2A01:C23:692F:9800:95E5:9383:4A13:DA57 (talk) 15:21, 14 February 2024 (UTC)
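
For anyone who wants to check the counterexample above, here is a minimal sketch (illustrative only, not part of the original comment; the function name l1_cost is made up) that computes the medoid and the point closest to the mean under the plain L1 distance:

    # Verify the 1-D counterexample: medoid vs. point closest to the mean.
    points = [1, 2, 3, 4, 100]

    def l1_cost(candidate, data):
        # Sum of absolute distances from a candidate point to all points.
        return sum(abs(candidate - x) for x in data)

    medoid = min(points, key=lambda p: l1_cost(p, points))
    mean = sum(points) / len(points)                            # 22.0
    closest_to_mean = min(points, key=lambda p: abs(p - mean))

    print(medoid, l1_cost(medoid, points))                      # 3 101
    print(closest_to_mean, l1_cost(closest_to_mean, points))    # 4 102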

Untitled

This article is very confusing and not a reliable explanation of Partitioning Around Medoids/K-Medoids. It needs to be edited and the explanation made clearer. —Preceding unsigned comment added by Kdambiec (talk · contribs) 08:52, 1 May 2008 (UTC)

Added an explanation of how the cost is calculated. The Minkowski distance metric with an r value of 1 is used to calculate the distance; this is sometimes referred to as the city block distance or the Manhattan distance. --Xinunus (talk) 20:49, 22 June 2008 (UTC)
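
For concreteness, a minimal sketch of that cost computation (illustrative only; the point set, medoids, and function names below are made up, not taken from the article):

    # Clustering cost with the Minkowski distance at r = 1 (Manhattan / city block):
    # each point contributes its distance to the nearest medoid.
    def manhattan(a, b):
        return sum(abs(ai - bi) for ai, bi in zip(a, b))

    def total_cost(medoids, points):
        return sum(min(manhattan(p, m) for m in medoids) for p in points)

    points = [(1, 1), (2, 3), (8, 8), (9, 7)]
    print(total_cost([(2, 3), (8, 8)], points))  # 3 + 0 + 0 + 2 = 5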

I didn't find it very confusing, although the article should be tidied up into proper paragraphs. This page desperately needs to reference the original academic article or at least a book. Also, I'm not sure that there is a single "k-medoids" algorithm, as a number of clustering methods use a k-medoids approach (e.g. PAM). Finally, if the Minkowski distance with r = 1 is the same as the Manhattan distance, isn't mentioning both redundant? Satyr9 (talk) 05:22, 28 July 2008 (UTC)

"Cost" is never defined in this article, despite being used extensively in the algorithm. 64.189.203.198 (talk) 03:13, 9 October 2019 (UTC)

Wrong algorithm?

Isn't this CLARANS (Clustering Large Applications based on RANdomized Search) as proposed by Ng and Han? If I understand correctly, PAM should start with one (central) medoid and then add k-1 further medoids if the total cost can be reduced. Then all possible "switching" combinations are calculated and carried out. This is of course incredibly slow for large amounts of data, which is why CLARANS randomly selects one non-medoid per medoid and checks for a difference in cost, giving a non-optimal result if it terminates without having found the minimum cost/distance.

See page 56f in "Knowledge Discovery in Databases" by Martin Ester & Jörg Sander, Springer Verlag 2000 (German, sorry :p) —Preceding unsigned comment added by 95.90.189.48 (talk) 23:43, 16 April 2009 (UTC)

No, you are mistaken: k-medoids (PAM) is an analogue of k-means, and as such it already starts with k medoids; none are added during the run of the algorithm. Tomash (talk) 17:22, 9 May 2012 (UTC)
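
To make the distinction being discussed concrete, here is a rough sketch (illustrative Python; function names and the sampling budget are made up, and nothing here is taken from either source): PAM's swap phase evaluates every (medoid, non-medoid) exchange and repeats until no exchange improves the cost, whereas CLARANS only examines a random sample of such exchanges.

    import random

    def cost(medoids, points, d):
        # Total distance of every point to its nearest medoid;
        # d is any dissimilarity function, e.g. Manhattan distance.
        return sum(min(d(p, m) for m in medoids) for p in points)

    def pam_swap_phase(medoids, points, d):
        # PAM-style swap phase: try every (medoid, non-medoid) swap;
        # repeat until no swap lowers the cost.
        medoids = list(medoids)
        improved = True
        while improved:
            improved = False
            best = cost(medoids, points, d)
            for i in range(len(medoids)):
                for o in points:
                    if o in medoids:
                        continue
                    candidate = medoids[:i] + [o] + medoids[i + 1:]
                    c = cost(candidate, points, d)
                    if c < best:
                        medoids, best, improved = candidate, c, True
        return medoids

    def clarans_style_swaps(medoids, points, d, max_neighbors=20):
        # CLARANS-style: examine only a random sample of swaps,
        # so the result may be non-optimal.
        medoids = list(medoids)
        best = cost(medoids, points, d)
        for _ in range(max_neighbors):
            i = random.randrange(len(medoids))
            o = random.choice([p for p in points if p not in medoids])
            candidate = medoids[:i] + [o] + medoids[i + 1:]
            c = cost(candidate, points, d)
            if c < best:
                medoids, best = candidate, c
        return medoids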

Algorithm terminates here?

Perhaps I misunderstood the procedure, but at the end of the example, wouldn't it go on selecting random O' until every non-medoid had been tried? Obviously improvements in the other cluster might be possible (4,7 rather than 3,4, for example?). I'll admit, though, that I don't know more about this algorithm than I've read here - but it would seem strange to me for the algorithm to terminate at the first check that doesn't give an improvement. Regards, Sean Heron (talk) 10:35, 7 May 2009 (UTC).

Citation needed:

"It is more robust to noise and outliers as compared to k-means" is a strong claim. Without a good citation, I do not think it belongs in the article. 71.198.184.231 (talk) 20:43, 8 May 2009 (UTC)

I even think that claim is false. It is a property of the distance/similarity measure used, not of the clustering algorithm. --Sigbert (talk) 11:07, 3 May 2017 (UTC)
k-means optimizes squared errors, while k-medoids was developed for L1, which is considered more robust to outliers in classical statistics (the median is more robust than the mean). That is likely where this old claim comes from. To find a citation, check the early literature. 2A01:C23:692F:9800:95E5:9383:4A13:DA57 (talk) 15:24, 14 February 2024 (UTC)
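
A tiny illustration of that robustness argument (just arithmetic, not a source): a single outlier drags the mean far away while the median barely moves.

    from statistics import mean, median

    clean = [1, 2, 3, 4, 5]
    with_outlier = [1, 2, 3, 4, 100]

    print(mean(clean), median(clean))                # 3  3
    print(mean(with_outlier), median(with_outlier))  # 22 3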

Error in the example

In the example, Figures 1.2 and 1.3 show X5 = (6,2) as an outlier instead of being associated with cluster 2 as in the explanation. 115.72.95.218 (talk) 11:08, 8 June 2014 (UTC)

Suggested Changes to Reflect New Work

  • Information to be added or removed: I would like to suggest that the changes in this diff be made to the page. This diff adds information about a new algorithm for the k-medoids problem, BanditPAM, to the Wikipedia article for k-medoids.
  • Explanation of issue: The existing algorithms described on the k-medoids page are now outdated. PAM, which is the most prominent algorithm in the Wikipedia article, was previously the best algorithm in terms of final clustering loss and was better than Voronoi iteration, CLARA, and CLARANS. However, it was very slow (O(n^2) per iteration). BanditPAM is a new algorithm which returns exactly the same results as PAM but is much faster (O(n log n) per iteration). As such, BanditPAM is now the best existing k-medoids algorithm: it is better than Voronoi iteration, CLARA, and CLARANS in clustering loss and is also faster than PAM. This is important for practitioners to know when they are learning about the k-medoids problem; if I were a practitioner trying to perform k-medoids clustering, I would want to know which out-of-the-box algorithm is best.
  • References supporting change: M. Tiwari, M. Zhang, J. Mayclin, S. Thrun, C. Piech, I. Shomorony (2020). "BanditPAM: Almost Linear Time k-medoids clustering via Multi-Armed Bandits". Advances in Neural Information Processing Systems 2020. Available at https://arxiv.org/abs/2006.06856.

Conflict of Interest Disclosure: I am the primary author of BanditPAM (Mo Tiwari).

Request to editor: The COI guidelines suggest that if an edit is verifiable and appropriate, it will usually be accepted. In this circumstance, I would argue that the suggested changes are both verifiable and appropriate. Motiwari (talk) 04:44, 14 December 2020 (UTC)

Reviewers do not verify runtime or quality themselves, and many papers are never reproduced or cited. Hence, please wait for **independent** verification that confirms the standard approach is obsolete, e.g. a meta-study or a textbook for bold claims such as "state-of-the-art". HelpUsStopSpam (talk) 20:43, 28 December 2020 (UTC)
<s>Support with modification</s> Oppose: The proposed source appears to be a scholarly work published at the 34th Conference on Neural Information Processing Systems. The poster says they are the author of the algorithm, but I do not see their name on the list of authors of the proposed citation. In other words, the provided source is an independent verification that does in fact confirm what the proposer wants to add as part of an in-depth scholarly analysis. However, I agree language like "state of the art" is too promotional with any source. CorporateM (Talk) 04:26, 15 February 2021 (UTC)
@CorporateM: check again, it's not independent. The first author of the proposed source is "M. Tiwari". I'd wait for independent verification. HelpUsStopSpam (talk) 01:56, 16 February 2021 (UTC)
I stand corrected. Not sure how I missed that. CorporateM (Talk) 02:47, 16 February 2021 (UTC)

Closing this edit request as no consensus. If sources are found to support this request, it may be reopened. Altamel (talk) 04:13, 4 March 2021 (UTC)