The State of Machine Learning Frameworks in 2019

40 points, posted 4 months ago

19 Comments

jph00

3 months ago

We knew in 2017 that PyTorch was the future, so we moved all our research and teaching to it: https://www.fast.ai/posts/2017-09-08-introducing-pytorch-for...

Scene_Cast2

3 months ago

I found out that in the embedded world (think microcontrollers without an MMU), TensorFlow Lite is still the only game in town (pragmatically speaking) for vendor-supported hardware acceleration.

I recently tried to port my model to JAX. Got it all working the "JAX WAY", and I believe I did everything correctly, with one neat top level .jit() applied to the training step. Unfortunately I could not replicate the performance boost of torch.compile(). I have not yet delved under the hood to find the culprit, but my model is fairly simple so I was sort of expecting JAX JIT to perform just as well if not better than torch.compile().

Has anyone else had similar experiences?

yberreby

3 months ago

JAX code usually ends up being way faster than equivalent torch code for me, even with torch.compile. There are common performance killers, though. Notably, using Python control flow (if statements, loops) instead of jax.lax primitives (where, cond, scan, etc).
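To illustrate the point about control flow, here is a minimal sketch of a small recurrence written with `jax.lax.scan` and `jax.lax.cond`, next to the same logic in plain Python. The function names (`step`, `run_scan`, `run_python`) and the recurrence itself are made up for illustration and are not from anyone's model in this thread. As far as I know, a Python `for` loop over static values inside `jit` is unrolled at trace time, and a Python `if` on a traced value fails, which is why the `lax` forms are usually preferred.

```python
import jax
import jax.numpy as jnp
from jax import lax


def step(carry, x):
    # Data-dependent branch expressed with lax.cond instead of a Python `if`.
    # Both branches take the same operands and return the same shape/dtype.
    new_carry = lax.cond(
        x > 0,
        lambda c, v: c + v,    # positive input: accumulate it
        lambda c, v: c * 0.5,  # otherwise: decay the running value
        carry,
        x,
    )
    # scan expects (next carry, per-step output).
    return new_carry, new_carry


@jax.jit
def run_scan(xs):
    # lax.scan traces the loop body once rather than unrolling it per element.
    final, history = lax.scan(step, jnp.float32(0.0), xs)
    return final, history


def run_python(xs):
    # Reference version with ordinary Python control flow (not jitted).
    carry = 0.0
    history = []
    for x in xs:
        carry = carry + x if x > 0 else carry * 0.5
        history.append(carry)
    return carry, history
```

Calling `run_scan(jnp.asarray([1.0, -2.0, 3.0, -4.0], dtype=jnp.float32))` should give the same values as `run_python([1.0, -2.0, 3.0, -4.0])`. Whether this kind of rewrite explains a given slowdown depends on the model, so treat it as one thing to check rather than a diagnosis.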

leviliebvin

3 months ago

Interesting. Thanks for your input. I already tried to adhere to the JAX paradigm as laid out in the documentation so I already have a fully static graph.

pama

3 months ago

I would test how much of the hardware's total FLOP capability you are actually using. Take the first-order terms of your model and estimate how many FLOPs you need per data point (a good guide is 6*param for training if you mostly have large multiplies and nonlinearity/norm layers), then compare the throughput you actually achieve for a given input size against the theoretical maximum performance of the GPU (e.g. 1e15 FLOPs/s for bfloat16 per H100 or H200 GPU). If you are already over 50%, it is unlikely you can get big gains without very considerable effort, and most likely simple JAX or PyTorch is not sufficient at that point. If you are in the 2–20% range, there is probably some low-hanging fruit left, and the closer you are to using only 1%, the easier it is to see dramatic gains.
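The back-of-the-envelope estimate above can be sketched in a few lines of Python. The function names and the example numbers (a 1e9-parameter model processing 10,000 data points per second) are hypothetical; the 6 FLOPs per parameter rule of thumb and the 1e15 FLOPs/s peak are the figures quoted in the comment.

```python
def training_flops_per_data_point(n_params: float) -> float:
    # Rule of thumb from the comment: roughly 6 FLOPs per parameter
    # per data point for training.
    return 6.0 * n_params


def hardware_utilization(
    n_params: float,
    data_points_per_second: float,
    peak_flops_per_second: float = 1e15,  # quoted bfloat16 peak per GPU
) -> float:
    # Fraction of the theoretical peak that the training run achieves.
    achieved = training_flops_per_data_point(n_params) * data_points_per_second
    return achieved / peak_flops_per_second


if __name__ == "__main__":
    # Hypothetical numbers: 1e9 parameters, 10,000 data points per second.
    utilization = hardware_utilization(1e9, 1e4)
    print(f"Estimated utilization: {utilization:.1%}")
```

With those made-up numbers the estimate lands at about 6%, which would fall in the 2–20% range where the comment expects some low-hanging fruit to remain.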

CaptainOfCoit

3 months ago

> In 2019, the war for ML frameworks has two remaining main contenders: PyTorch and TensorFlow. My analysis suggests that researchers are abandoning TensorFlow and flocking to PyTorch in droves.

Seems they were pretty spot on! https://trends.google.com/trends/explore?date=all&q=pytorch,...

But to be fair, it was kind of obvious around 2023 without having to look at metrics/data; you just had to look at what the researchers publishing novel research used.

Any similar articles that are a bit more up to date, maybe even for 2025?

jonas21

3 months ago

I feel like it was all pretty obvious by late 2017. Prototyping and development in PyTorch was so much easier - it felt just like writing normal Python code. And the supposed performance benefits of the static computation graph in TensorFlow didn't materialize for most workloads. Nobody wanted to use TensorFlow - though you often had to when working on existing codebases.

I think the only thing that could have saved TensorFlow at that point would have been some sort of enormous performance boost that would only work with their computation model. I'm assuming Google's plan was to make it easy to run the same TensorFlow code on GPUs and TPUs, and then swoop in with TPUs that massively outperformed GPUs (at least on a performance per dollar basis). But that never really happened.

Legend2440

3 months ago

It’s still all pytorch.

Unless you’re working at Google, then maybe you use JAX.

mattnewton

3 months ago

JAX is quite popular in many labs outside of Google doing large scale training runs, because up until recently the parallelism ergonomics were way better. PyTorch core is catching up (maybe already with the latest release, haven’t used it yet) and there are a lot of PyTorch-using projects to study though.

fleahunter

3 months ago

[flagged]

bonoboTP

3 months ago

TensorFlow was an overengineered Google-style mess and they constantly made breaking changes.

All the graph building and session running was way too complex, with too much global state, and variable sharing was complicated and based on naming, variable scopes, name scopes and so on.

It was an okay try, but that design simply didn't work so well for the quick prototyping, iterating and debugging that are crucial in research.

PyTorch was much closer to just writing straightforward numpy code. TensorFlow 2 then tried to catch up with "eager mode", but in the background it was still a graph and tracing often broke and you had to write the code very carefully and with limitations.

In the end, PyTorch also developed proper production and serving tools as well as graph compilation, so now there's basically no reason to go to TensorFlow. Not even Google researchers use it (they use JAX). I guess some industries still use it but at some point I expect Google to shut down TF and focus on the JAX ecosystem with some kind of conversion tools for TF.

CaptainOfCoit

3 months ago

> But then again, TensorFlow's got its enterprise backing, and I can't help but think about the implications of that. How long can PyTorch ride this wave before it runs into pressure from industry demands?

PyTorch has a huge collection of companies, organizations and other entities backing it. It's not gonna suddenly disappear soon, that much is clear. Take a look at https://pytorch.org/foundation/ for a sample.

kenjackson

3 months ago

The thing about TensorFlow in 2017 is that everyone acknowledged how difficult it was to use. While it was almost the only game in town, no one was happy. Those are probably the areas where an upstart can come in and disrupt.

oceansky

3 months ago

In 2019 I delivered an instance segmentation project and I used Mask R-CNN and TensorFlow.

Nowadays it looks like YOLO absolutely dominates this segment. Can any data scientists chime in?

bonoboTP

3 months ago

SAM (Segment Anything Model) by Meta is a popular go-to choice for off the shelf segmentation.

But the exciting new research is moving beyond the narrow task of segmentation. It's not just about having new models that get better scores but building larger multimodal systems, broader task definitions etc.

deepsquirrelnet

3 months ago

I haven’t used RCNN, but trained a custom YOLOv5 model maybe 3-4 years ago and was very happy with the results.

I think people have continued to work on it. There’s no single lab or developer behind it, and the metrics used for comparison mostly seem to focus on the speed/mAP plane.

One nice thing is that even with modest hardware, it’s low enough latency to process video in real time.

jszymborski

3 months ago

lil' self promo but I made a similar blog post in 2018.

I gave mxnet a bit of an outsized score in hindsight, but outside of that I think I got things mostly right.

https://source.coveo.com/2018/08/14/deep-learning-showdown/

AndrewKemendo

3 months ago

TensorFlow was a revelation when it came out and Jeff & Sanjay were heralded as gods.

Just goes to show that even when you’ve got everything going for you, perfect team filled with nice people, infinite resources (TPUs anyone?), perfect marketing, your own people will split off and take over the market.

Second place seems to always win the market.