Large-Scale Dimension Reduction with Both Global and Local Structure (2021) [pdf]

41 points posted 4 days ago
by chaosprint

9 Comments

gbickford

3 days ago

It's for visualizing datasets where fine-grained cluster details and broader relationships matter. There are example renderings in the paper.

From the README:

> PaCMAP (Pairwise Controlled Manifold Approximation) is a dimensionality reduction method that can be used for visualization, preserving both local and global structure of the data in original space. PaCMAP optimizes the low dimensional embedding using three kinds of pairs of points: neighbor pairs (pair_neighbors), mid-near pairs (pair_MN), and further pairs (pair_FP).

> Previous dimensionality reduction techniques focus on either local structure (e.g. t-SNE, LargeVis and UMAP) or global structure (e.g. TriMAP), but not both, although the balance between global and local structure can be shifted by carefully tuning the parameter that controls it, which mainly adjusts the number of considered neighbors. Instead of considering more neighbors to attract for preserving global structure, PaCMAP dynamically uses a special group of pairs -- mid-near pairs -- to first capture global structure and then refine local structure, thereby preserving both global and local structure. For a thorough background and discussion on this work, please read our paper.
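
To make the three pair types a bit more concrete, here is a rough sketch of how they could be sampled. This is not the library's code: the function name `sample_pairs` and its parameters are made up for illustration, it uses plain Euclidean distance, and the "second closest of six random candidates" rule for mid-near pairs is my reading of the paper rather than something stated in the README.

```python
import math
import random


def sample_pairs(points, n_neighbors=2, n_mid_near=1, n_further=2, seed=0):
    """Hypothetical helper: return (pair_neighbors, pair_mn, pair_fp) as index tuples."""
    rng = random.Random(seed)
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    pair_neighbors, pair_mn, pair_fp = [], [], []
    for i in range(n):
        others = [j for j in range(n) if j != i]

        # Neighbor pairs: the n_neighbors closest points to i (local structure).
        nearest = sorted(others, key=lambda j: dist(i, j))[:n_neighbors]
        pair_neighbors += [(i, j) for j in nearest]

        # Mid-near pairs (assumed rule): draw 6 random candidates and keep
        # the second closest, which tends to land at a medium distance.
        for _ in range(n_mid_near):
            candidates = rng.sample(others, min(6, len(others)))
            candidates.sort(key=lambda j: dist(i, j))
            pair_mn.append((i, candidates[1]))

        # Further pairs: random points that are not among i's neighbors.
        non_neighbors = [j for j in others if j not in nearest]
        pair_fp += [(i, j) for j in rng.sample(non_neighbors, n_further)]

    return pair_neighbors, pair_mn, pair_fp


# Toy data: 20 random points in 3-D.
data_rng = random.Random(1)
points = [tuple(data_rng.random() for _ in range(3)) for _ in range(20)]
nb, mn, fp = sample_pairs(points)
print(len(nb), len(mn), len(fp))
```

In the actual method these pairs feed an optimization in which neighbor and mid-near pairs attract and further pairs repel; the sketch stops at pair selection.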

szvsw

3 days ago

Very cool paper, and the code in the repo looks good and easy to use as well. After a quick skim of the paper, I feel like it suffers from a pretty common flaw (one which my PI often points out to me in my work, so I guess I’m just extra attuned to it right now). The authors make a pretty convincing argument (to me at least, but I’m more of an applied ML person than a theoretical ML person, so grain of salt) from a mathematical/methodological perspective that PaCMAP is better than the popular DR algorithms and has various desirable properties, such as simultaneous global/local scale preservation. But they more or less accept it as a given that we need better DR algorithms and that being better than the existing methods makes the work interesting, while failing to convincingly illustrate actual use cases where PaCMAP unlocks some sort of insight or delivers some sort of meaningful result that t-SNE and friends could not.

I think doing so would be especially important in a paper on DR techniques which are already so fraught in how they are deployed (often with little thought) in many applied contexts, and when so much of their putative utility comes from their interaction with human visual perception. I would have loved to see some discussion of actual engineering use cases where PaCMAP proves more useful than t-SNE - I’m sure there are many! Really just nitpicking from me though, will probably try it out on my own cases in the next few days.

igorkraw

3 days ago

If you accept t-SNE or UMAP as being useful because they unveil some structure, and PaCMAP has mathematical guarantees to preserve the same or more structure, is that not enough as a motivation?

FWIW, I use PaCMAP when building pipelines to get a feel for whether a model is capturing signals as expected. It works better than the other two for this, because its structure preservation makes the conceptual mapping easier.

szvsw

3 days ago

I agree with you that it is correct from a logical perspective; I’m talking more about a narrative perspective. Any paper benefits from having at least one or two non-toy/non-benchmark problems IMO.

r-zip

3 days ago

What mathematical guarantees? I don't see any mathematical results in the paper.

hoseja

3 days ago

Is it a coincidence that the mammoth projection looks like a cave painting?