Samurai: Adapting Segment Anything Model for Zero-Shot Visual Tracking

93 points posted a month ago
by GordonS

11 Comments

steinvakt2

24 days ago

Note that this currently only enables single-object tracking. I tried it for my research project (tracking cells in microscopy videos), but it didn't work well. I guess it's better suited to real-world 3D scenarios.

IshKebab

24 days ago

Very impressive. I wish research like this was more deployable. It always seems to come in the form of a muddy ball of Python, rather than e.g. a C++ or Rust library you could actually deploy in a product.

I get why, but it still seems a shame that there's all this cool ML research that will only make it into actual products in 10 years when someone with the resources of Adobe rewrites it in something other than Python.

HanClinto

24 days ago

I work on deployed embedded ML products using NVIDIA Jetson, and while there are C++ portions, a lot of it (dare I say most of it?) is written in Python. It's fast enough for our embedded processors, and Docker containers make such things very deployable -- even in relatively resource-constrained environments. No, we're not on a Raspberry Pi or an Arduino, but I don't think that SAM2 is going to squeeze down reasonably onto something that size anyways.

If the inference code (TensorRT, TensorFlow, PyTorch, whatever) is fast, then what does it matter what the glue code is written in?

Python has become the common vulgate as a trade language between various disciplines, and I'm all 'bout that.

I've only been working in computer vision for 10-ish years, but even when I started, most research projects were in Matlab. The fact that universities have shifted away from Matlab and into Python is a breath of fresh air, lemme' tell ya'.

stefan_

24 days ago

> a lot of it (dare I say most of it?) is written in Python

I guess ignorance is bliss once someone has done the work for you of getting it all down into TRT.

IshKebab

23 days ago

It matters because Python tooling is terrible enough that you need to resort to Docker, and Docker is only a viable solution in some circumstances - deploying on Linux servers.

Think about deploying this in a desktop application. You aren't going to ship a Docker container there.

steinvakt2

23 days ago

I think the upside of using Python as a general language for AI (discussion, development, shipping fast, constant iteration...) is bigger than the downside of more hassle when deploying.

IshKebab

23 days ago

Yeah, the REPL / notebook feature of Python is one of the few genuine advantages it has. Even though the actual REPL was pretty awful until very recently, it did at least exist and work, which is more than most languages can say.

Grosvenor

24 days ago

TIL the Vulgate was a Latin version of the Bible.

From Apple dictionary:

"the principal Latin version of the Bible, prepared mainly by St. Jerome in the late 4th century, and (as revised in 1592) adopted as the official text for the Roman Catholic Church."

zackangelo

24 days ago

I’ve been writing all of our transformer implementations in Rust using the Candle crate and it’s been great.

While dealing with CUDA and GPUs on servers is never a joy, deploying fully contained Rust binaries instead of a morass of Python scripts has improved the situation for me significantly.

Getting Samurai running on Candle shouldn’t be that large of an undertaking. I believe there’s already a SAM implementation.

GordonS

a month ago

Full, unabridged title (which adds something important!):

"SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"

It's the memory part that I find so impressive in the demo videos!

alberth

24 days ago

Seems great for tracking POI on CCTV.