hackernews client

The Skyline algorithm for packing 2D rectangles

223 pointsposted a year ago

59 Comments

romesmoke

a year ago

An interesting variation is Dynamic Storage Allocation, in which the rectangles can slide only alongside one axis.

AKA "static memory allocation", since the heights of the rectangles can be interpreted as buffer sizes, and their widths as lifetimes.

Most of my PhD's effort has been devoted to beating the SOTA in this. Problem's importance nowadays is owed to deep learning's memory wall.

almostgotcaught

a year ago

> Most of my PhD's effort has been devoted to beating the SOTA in this. Problem's importance nowadays is owed to deep learning's memory wall.

lol i similarly spent 1 year of my phd trying to finagle this same thing for the same exact reason. got as far as proving that when it's parameterized (like in the case of DNNs with algebraically coupled layers) it's FPT (finite parameter tractable) which is like duh but also as soon as i found that i gave up (not much more to be done <shrug>).

romesmoke

a year ago

Hm, you sound like you approached it from the theoretical ML side. My strategy was much less sophisticated: I implemented the Buchsbaum et al. paper you posted in another comment of yours.

almostgotcaught

a year ago

I don't know what you mean. I studied the problem, implemented several papers, wasn't happy with the results, and then kept pushing on the actual complexity. When I figured out there was an existing bound I gave up because I wasn't a theory student so I didn't have time (or really desire) to try to improve on the bound.

romesmoke

a year ago

TBH I don't know what you mean either :D

I know that DSA is NP-complete, and none of the existing deep learning compilers implement the theoretical SOTA (you're probably referring to the 2+ε bound by Buchsbaum et al.)

I also know that scaling laws and the memory wall are soon gonna overpower the currently used heuristics (e.g., sort by size and do best-fit).

Anyways, I'm glad I "met" someone who has struggled against the same problem. Good luck with your research!

almostgotcaught

a year ago

> TBH I don't know what you mean either :D

Think about the ILP with disjunctions formulation of DSA. It's basically a set of linear equations right? In the case of DNNs, the equations are related: the sizes of the allocations in one layer (in say a CNN) are algebraically a function of the outputs of the previous layer. And so on all the way back to the inputs. So what looks like a set of linear equations in N variables is actually a much smaller set of nonlinear equations in K << N variables.

Okay non-linear equations are hard but that's not actually what I'm getting at - what I'm getting at is the instances of the DSA problem in a production ML pipeline are parametric. So I tried to solve this problem: given a DNN and its parameterized DSA instances, can you do some work offline (to solve "part" of the problem) so that at runtime there's not a lot of work left. And that question/problem/challenge falls into the bucket of FPT.

> Good luck with your research

I changed topics and graduated. No more research for me yay

romesmoke

a year ago

Wait a minute:

1. Formulating DSA as ILP doesn't scale. TelaMalloc starts from this fact as motivation.

2. There's no 1-1 mapping between tensors and allocations. Buffer re-use is important.

3. There's room for offline DSA instances as well. IREE's Stream dialect has a related transform (LayoutSlices.cpp).

Anyways, I'm no ML expert. DSA is much more generic than deep learning compilers. I can't wait to graduate myself and never hear the involved keywords again.

almostgotcaught

a year ago

> Formulating DSA as ILP doesn't scale.

it doesn't scale to what? 405B LLMs? no probably not. but i have plots that show not unreasonable solve times for CNNs with ~30k tensors (~10 minutes) using gurobi.

> There's no 1-1 mapping between tensors and allocations. Buffer re-use is important.

yes i'm aware... that's why I made the comment above that you can't pull this trick in e.g., PyTorch.

> 3. There's room for offline DSA instances as well. IREE's Stream dialect has a related transform (LayoutSlices.cpp).

yes again, i'm aware - that's why I made the comment above that IREE is the only place you could pull this trick. LayoutSlices is one place but hooking the allocator in the HAL is simpler if you don't want to fight IREE's various transformations that happen after that.

> DSA is much more generic than deep learning compilers.

yes that's I posted the OR literature first...

> I can't wait to graduate myself and never hear the involved keywords again.

amen

auvi

a year ago

nice! you have your thesis or any related paper available online?

almostgotcaught

a year ago

there are a bunch of papers on DSA from the OR people

http://adambuchsbaum.com/papers/dsa-stoc03.pdf

https://link.springer.com/chapter/10.1007/978-3-540-27798-9_...

https://users.cs.northwestern.edu/~pdinda/ics-s05/doc/dsa.pd...

there are also some from ML people trying to allocate memory optimally for DNNs:

https://arxiv.org/abs/1907.01989

https://arxiv.org/abs/1804.10001

https://arxiv.org/abs/2001.03288

they all boil down to a couple of greedy heuristics. the most recent "cool" paper was from a group at google

https://dl.acm.org/doi/10.1145/3567955.3567961

basic idea is to use both ILP and heurstics. i asked them to open source but no dice :(

p0ckets

a year ago

An even more recent one from Google: https://github.com/google/minimalloc

almostgotcaught

a year ago

i gave up on the problem about 18 months ago so i didn't keep up with the research area. is this yours? the runtimes are of course very good but i don't see a comparison on how good the approximation is vs telamalloc (or just ILP). i'll say this though: it's miraculuos that the impl is so small.

p0ckets

a year ago

Not mine. I just happened to hear about it from a colleague recently.

romesmoke

a year ago

Hopefully I'll defend next summer.

Papers-and-code-wise, I haven't published the "final" thing yet. Right now I'm looking into integrating it with IREE, to get some end-to-end figures.

almostgotcaught

a year ago

> Right now I'm looking into integrating it with IREE

clever guy. IREE is just about the only serious/available runtime where you can do this because IREE codegens the runtime calls as well as the kernel code. But you're gonna have to either patch an existing HAL or write a new one to accomplish what you want to accomplish. If you want I can help you - if you go to the discord (https://discord.gg/J68usspH) and ask about this in #offtopic I'll DM you and can point you to the right places.

user

a year ago

[deleted]

alex_suzuki

a year ago

Nice, I didn’t know stb had a rect packer: https://github.com/nothings/stb/blob/master/stb_rect_pack.h

clippy99

a year ago

Great writeup. Sucks when this gets asked in a coding interview to be solved in 15min :)

mikepurvis

a year ago

To be fair, in an interview context they’re probably looking for what you would implement in the mvp just to avoid getting blocked, and a brief acknowledgement that it’s an academic problem and once you have data to understand it better you’ll select and implement an appropriate algorithm after design review with colleagues.

guessmyname

a year ago

In theory, yes, but in practice...

I’ve interviewed with several companies that asked me this specific question [1], including Facebook, ByteDance, LinkedIn, and a particular team at Apple (not my current team). The interviewers, perhaps somewhat optimistically [2], expected a fully working solution. They gave me about 40 minutes—more than the 15 minutes mentioned in the original comment—but I definitely needed the first 10-15 minutes just to get a brute-force solution running. The rest of the time was spent refining the approach and addressing 1-2 additional requirements to pass a set of visible tests.

It was challenging, but not in a traditional engineering sense. It felt more like an ACM competition [3].

Fortunately, programming skills aren’t the only thing companies assess these days. With over a decade of work experience, behavioral (experience-based) interviews now play a larger role in the final hiring decision. That said, depending on who conducts the technical portion of the interview, you could still be rejected if your code doesn’t work.

[1] https://leetcode.com/problems/the-skyline-problem/descriptio...

[2] Them, being so young (<10 YoE), consider LeetCode a panacea

[3] https://en.wikipedia.org/wiki/Competitive_programming

mikepurvis

a year ago

Blah, that sucks. I happily did a take home assignment recently for an interview and enjoyed digging into it, but if asked to solve such a generic brain teaser under a time constraints I’d probably just walk out— at 38 I’m too old to be playing those games.

imajoredinecon

a year ago

“This exact problem” isn’t the packing problem discussed in the OP, right? It’s a different problem about taking the union of overlapping rectangles with their bottom edges at the same vertical position.

zellyn

a year ago

Reading just the headline, I thought that's what this post was referring to. I used to give the Skyline overlapping rectangle problem at Google for several years, until it got banned. I'd say that with appropriate nudges, most candidates were able to get through it in an hour. I don't know how things have changed since 2015, but this was all on a whiteboard, so at least there wasn't time wasted fighting a compiler.

Then again, my lifetime stats on interviewing at Google, measured by interview scores vs eventual offers extended was somewhere between noise and a slightly negative correlation, so I never did figure out why they let me interview at all! (I think I'm too nice in interviews, because I want everyone to succeed, and asking myself "if this person were on my team, would I be as happy to collaborate with them as I would be with Nick? Really?" only goes so far in counteracting that.)

genewitch

a year ago

thanks for linking that leetcode link; the entire time i've been reading about it i thought it was a separate programming language used for these "challenges" at workplaces. I'm not a programmer by trade so i was surprised to see python is usable, too.

Sorry for asking you (i'm not, just shouting into the ethernet) - when people say they take a few months off to "grind leetcode" do they just go through the problems on that site? When an employer issues a challenge it's going to be exactly worded as when you practiced? like, the potential employer in your experience gave you that URL and said "go"?

lentil_soup

a year ago

Not OP but yeah, they mean to work your way through the problems there and yes most "normal" languages are fine, python, js, C, java, etc.

In my experience they won't just give you an URL, it'll just be a problem in the style of the ones there. Thing is, once you've done a few of them you'll see that there's only a set of algorithms that usually apply. The setting and input will vary but in the end you can reduce it to a known algorithm or technic so it's more about finding which one fits and can solve the problem

delibes

a year ago

You can probably get denser packing (at least in theory), if you allow rotation. It's just ugly:

https://kingbird.myphotos.cc/packing/squares_in_squares.html

mzl

a year ago

The standard heuristic any 2D packing algorithm starts with is left-bottom or bottom-left (the choice is symmetrical). Place a piece as much to the left as possible, and among equal choicse as low as possible (or vice versa, as the choice was in the linked article). From this base heuristic, further meta-heuristics can be developed with choices for the order to pack parts, how to choose among "almost equal" positions, choosing among rotations and symmetries, and so on.

As far as I can tell, the "Skyline algorithm" is bottom-left with the limitation that created holes can not be packed. This doesn't feel like a packing algorithm as much as it feels like a heuristic to speed up position finding.

RaftPeople

a year ago

I created a 3D bin packing at work with similar basic heuristic you described, plus some additional flavor for testing some number of different combinations during each iteration (and choosing the path that had best results up to that point).

The interesting thing I found is that the simple heuristics work pretty well. I actually had to dial back the level of optimization because it took too long for the human packers to duplicate the packing correctly to be able to close the carton.

I had to set it to a level that retained enough wasted space that allowed them quickly find their own solution, but knowing that for sure all of this stuff will fit into carton size X, because saving an extra few dollars in shipping didn't offset the labor dollars to pack it that way.

antirez

a year ago

I wonder how well Montecarlo works with a problem like this (for the online version), where you try all the potential places and run simulations adding new random boxes similar to the one added so far in order to see which option allows for better outcome. For sure it's slow, yet it should work well, which is interesting per-se since it would show once more how you can come up with slow but good solutions just simulating stuff. But likely, this is how this heuristics, like Skyline, were found: analyzing the properties of the brute-forced best picks.

theOGognf

a year ago

They do this in subsets of aerodynamics for optimizing the tradeoff between simulation speed and fidelity in cases where there will be a lot of runs with the same geometry but different parameters

swayvil

a year ago

I wonder how good random placement would work.

IE :

Create a number of compositions where you put everything in random locations.

Discard any compositions with overlaps.

Keep the ones with the smallest wasted space or whatever.

Also, this might be a good application for AI.

hammock

a year ago

There is a machine in a certain wood window manufacturer's plant that I have toured, which uses lasers to measure odd pieces of wood and then table saws to cut them into necessary sized pieces, minimizing waste - even cuts for fingerjoints - it all happens on a moving belt, and it happens extremely fast and in real-time. I was impressed. Probably powered by something like this algorithm (or the MAXRECTS one)

genewitch

a year ago

Was it Jeldwin? I still have nightmares about the sound of their saw during the winter months in OR.

reader9274

a year ago

I implemented a variation of this at Intel about 10 years ago to solve the efficient testing of microchips with different input/output pins on a limited-pin in/out tester. The goal was to efficiently load the chips in the tester to maximize utilization, with TT of each chip being the same. Fun times, heuristic-solving my way through the NP bin packing problem.

solomonb

a year ago

One of my first big personal programming projects was implementing this and a bunch of variations in python many years ago: https://github.com/solomon-b/greedypacker

Its still my most starred repo. :shrug:

thanatos519

a year ago

If you're printing randomly sized rectangles and want to cut them apart with a guillotine paper cutter, then the algorithm can be much simpler: https://github.com/trathborne/guillot

TacticalCoder

a year ago

Is this an algorithm in the family of "2D bin packing"?

As for packing sprites for games... I remember the fun on very constrained devices where many of the sprites where actually more round than square. So it was about packing both rectangles (for tiles and "rectangle" sprites) and packing circles too. The size of the spritesheet had to be kept small but in memory things were okay'ish (as in: a "round" sprite could be extracted from the spritesheet in its own rectangle).

Thankfully thing is: this can be done with a much more powerful machine than the device the sprite sheet shall eventually be on.

mjevans

a year ago

This blog post references https://github.com/juj/RectangleBinPack/blob/master/Rectangl... ( and implicitly https://github.com/juj/RectangleBinPack/ ) as it's primary source.

The README even mentions the guillotine algorithm / method that someone else posted (not the same link, but the same method).

starmole

a year ago

I always thought this was called the Tetris packing algorithm. Because you drop in pieces from the top like in a game of Tetris. (no sliding in sideways allowed :))

jkaptur

a year ago

Very fun. I wonder if a heap could improve the performance of the "loop through the skyline in order to find the best candidate location" step. (I think it would improve the asymptotic time, if I'm understanding the algorithm correctly, but obviously the real performance is more complex).

chatmasta

a year ago

With some clever bit twiddling hacks, you might be able to avoid the loop entirely.

nsxwolf

a year ago

Can't wait to ask this one as a Leetcode warmup.

taeric

a year ago

Fun write up, kudos!

At a glance, this feels like a good case for an exact cover algorithm. Would be neat to see how that compares, here.

proee

a year ago

This looks useful for auto-placing parts inside a PCB.

sumtechguy

a year ago

Packing has so many applications for different things. Such as optimal packing of a truck or shipping container. Or how to optimally pack graphics into a GPU texture.

I had this guy as a prof. https://en.wikipedia.org/wiki/David_A._Klarner I have never encountered someone so excited about dividing up rectangles, as it is related to combinatorics. Also with such a seething hatred for floating point numbers.

db48x

a year ago

Packing things into a GPU texture is probably the #1 most common use. If you played a game today then your computer probably spent some time packing rectangles into rectangles.

joshtynjala

a year ago

Your web browser may have also spent some time packing rectangles into rectangles. I recall reading this article a while back about how Firefox does/did it.

https://mozillagfx.wordpress.com/2021/02/04/improving-textur...

db48x

a year ago

That’s great; I’d forgotten about it.

Plus there are a fair few programs that render text by rendering the font into a texture atlas once and then using the GPU to copy from the atlas. Your terminal emulator may be doing this, for example.

sumtechguy

a year ago

The truck packing thing is one thing he was proud of. Last time I saw him he was trying to work it out for 3d. But for that particular use case many companies found that optimal packing is not necessarily the best use of time (as some value speed over total shipping volume). Sometimes just stacking it up and leaving some slack was faster. Wonder if the same sort of issues show up in GPU texture packing? I would suppose that could only pop up in some edge cases and only if you are building the textures at runtime.

pineaux

a year ago

3d printing build plate optimalisation, passenger transport, real estate plot division. Let's make this list longer?

amenghra

a year ago

A long time ago, some websites used to pack their images and use css to display a piece of the larger image. It reduced the number of round trips by not having to fetch a bunch of small files.

qingcharles

a year ago

This "sprite sheet" style hack is still used sometimes.

I guess it's not really that necessary now that most sites load 25MB of JavaScript and a 250MB video that plays onload.

amenghra

a year ago

HTTP/2's multiplexing is probably the main reason we don't need sprite sheets anymore. I guess icon fonts are the modern sprite sheet to some degree.

pavel_lishin

a year ago

Hell, I needed something like this when working on my Halloween costumes this year!

hermitcrab

a year ago

There are all sorts of application for this. E.g. optimally cutting carpets from rolls of carpet. It is a variant on the 2D knapsack problem, one of the classic operational research problems: https://en.wikipedia.org/wiki/Knapsack_problem

The knapsack problem gets much harder as you increase the dimensions.

Placing parts on a PCB is harder if you have to care about where the components go relative to each other (e.g. because they have to be electrically connected, a certain distance from ink marking or drill holes, due to thermal or interference issues etc) rather than just optimizing space used.

TheRealNGenius

a year ago

[dead]

Const-me

a year ago

Yeah, that algorithm is pretty good.

BTW, when I needed it, I have ported C algorithm from fontstash library to C++: https://github.com/Const-me/nanovg/blob/master/src/FontStash...

rose987a

a year ago

[dead]

cautifliape2002

a year ago

[dead]