Johnny LLM Can't Read Code

14 pointsposted 12 hours ago
by jhp123

10 Comments

ch0ic3

6 hours ago

I screenshotted the question and pasted it in Claude 3.5 Sonnet. It answered:

To determine the output of the given function for an input of 3, let's analyze the code step by step:

The function is called fizzbuzz(n) and we're considering n = 3. The return statement uses a conditional expression: return (["Fizz","Buzz","FizzBuzz"])[ (n % 3 === 0) + (n % 5 === 0) ]; Let's break down the conditions:

(n % 3 === 0) evaluates to true (1) because 3 is divisible by 3 (n % 5 === 0) evaluates to false (0) because 3 is not divisible by 5

Adding these boolean values: 1 + 0 = 1 This result (1) is used as an index to select from the array ["Fizz", "Buzz", "FizzBuzz"] Index 1 in this array corresponds to "Buzz"

Therefore, for an input of 3, the function will return "Buzz". This might seem counterintuitive since typically in a FizzBuzz problem, numbers divisible by 3 return "Fizz". However, the indexing in this implementation leads to this unexpected result. Would you like me to explain any part of this analysis further?

positus

5 hours ago

Here’s o1’s answer:

The function `fizzbuzz(n)` returns a value based on whether `n` is divisible by 3, 5, both, or neither. Let's break down the function step by step for an input of `n = 3`:

1. Array Initialization:

  ["Fizz", "Buzz", "FizzBuzz"]

   This array has indices 0, 1, and 2 corresponding to "Fizz", "Buzz", and "FizzBuzz" respectively.
2. Condition Checks:

   - `n % 3 === 0` checks if `n` is divisible by 3.

     - For `n = 3`, this is `true` (which equates to `1` when used in arithmetic).

   - `n % 5 === 0` checks if `n` is divisible by 5.

     - For `n = 3`, this is `false` (which equates to `0`).
3. Index Calculation:

  (n % 3 === 0) + (n % 5 === 0)

   - For `n = 3`, this becomes `1 + 0 = 1`.
4. Array Access:

  ["Fizz", "Buzz", "FizzBuzz"][1]

   - This accesses the element at index `1`, which is `"Buzz"`.
Conclusion:

For an input of `3`, the function returns `"Buzz"`.

Answer:

"Buzz"

maebert

30 minutes ago

This feels like a sloppier version of Searle’s Chinese Room.

Is iterative thinking and search required to understand code? And while LLMs don’t do that, why couldn’t complex system like o1 be able to do that?

I was not convinced by Seattle, this is not convincing either.

DannyBee

10 minutes ago

It's worse than a sloppier version, because we've already proven it wrong in the literal sense.

The simulation of code part is trivially wrong - early CPU's were clear fixed function pipelines and they could execute programs (including fizzbuzz)just fine! The author seems to confuse algorithm complexity of the available operations with the power of a particular approximation model. They are mostly unrelated.

The only thing you require to simulate turing machines, for example, is a simple O(1) NAND gate (and arbitrary amount of constant time memory. Or equivalently, SKI calculus if you hate arbitrary memory). This is because the algorithm complexity of the operations that make up the simulator are (mostly) unrelated to the power of the simulator. They only change the time required to execute the simulation.

As another example, you can simulate any non-determinstic turing machine with a deterministic one. The models have equivalent power. The open question is "how fast can you do it", not "can you do it".

Similarly, we can already prove it's possible to build llm's that can simulate arbitrary CPU's to arbitrary precision - the proofs, of course, are not constructive, so it doesn't help you build one.

On the explaining code front, it doesn't take a huge leap to see that any process you are running on a computer today to explain code could similarly be approximated by a neural network.

This is just as useless as the author's sloppy statements in practice, but the author purports to make claims about what is possible in theory, not what is practical.

aster0id

7 hours ago

I use various models through Openrouter API + Open WebUI with a custom step by step prompt. Here's the output using the Qwen2.5 72B model:

To determine the output of the given `fizzbuzz` function for an input of 3, let's break down the code step by step:

1. *Function Definition*: ```javascript function fizzbuzz(n) { return (["Fizz","Buzz","FizzBuzz"])[ (n % 3 === 0) + (n % 5 === 0) ]; } ```

2. *Input*: - The input `n` is 3.

3. *Evaluate the Conditions*: - `n % 3 === 0`: This checks if `n` is divisible by 3. - For `n = 3`, `3 % 3 === 0` is `true`. - Therefore, `(n % 3 === 0)` evaluates to `1` (since `true` is treated as `1` in arithmetic operations). - `n % 5 === 0`: This checks if `n` is divisible by 5. - For `n = 3`, `3 % 5 === 0` is `false`. - Therefore, `(n % 5 === 0)` evaluates to `0` (since `false` is treated as `0` in arithmetic operations).

4. *Sum the Results*: - The expression `(n % 3 === 0) + (n % 5 === 0)` becomes `1 + 0`, which equals `1`.

5. *Array Indexing*: - The array `["Fizz", "Buzz", "FizzBuzz"]` is indexed by the result of the sum. - The index `1` corresponds to the second element of the array, which is `"Buzz"`.

6. *Return Value*: - The function returns `"Buzz"`.

Therefore, the output of the function `fizzbuzz(3)` is `"Buzz"`.

### Reflection and Error Checking - *Step-by-Step Verification*: - The conditions `n % 3 === 0` and `n % 5 === 0` are correctly evaluated. - The sum of the boolean results is correctly calculated. - The array indexing is correctly performed. - *Logical and Factual Accuracy*: - The logic and arithmetic operations are correct. - The array indexing is consistent with the conditions.

No errors were found in the reasoning or the final answer. The output of the function `fizzbuzz(3)` is indeed `"Buzz"`.

atomic128

4 hours ago

Too easy.

But can your LLM understand this Go code and explain to you how it works?

  func extract(x, m uint64) uint64 {
      x &= m
      var (
          holes = ^m
          shift = uint(1)
      )
      for ; shift < 64; shift *= 2 {
          odd := holes
          odd ^= odd << 1
          odd ^= odd << 2
          odd ^= odd << 4
          odd ^= odd << 8
          odd ^= odd << 16
          odd ^= odd << 32
          holes &^= odd
          move := x & odd
          x ^= move
          x |= move >> shift
      }
      return x
  }
What does this compute, and how is that computation achieved?

More broadly, I'm curious to know how much of the BUGFIX-66 debugging gauntlet your LLM can solve.

You have to Google it, because I'm not allowed to link to it on Hacker News. Banned.

emptiestplace

3 hours ago

4o did a great job at this, both in terms of actually understanding it and explaining in a way that is very easy to follow. I suspect Claude would as well.

I pasted it here but the formatting got fucked.

Have you ever used an LLM?

botanical76

3 hours ago

What is your point? Does anyone really believe LLMs are capable of understanding such dense, undocumented, and advanced code today? Or is the state of AI today - specifically at this nontrivial task - supposed to predict that it will never be able to?

DannyBee

27 minutes ago

This whole silly post seems to have been written so they can make this statement, which they seem to believe is profound:

"This failure is not surprising, because any machine learning model, no matter how many magic parameters it has, is only capable of ex ecuting a fixed pipeline of mathematical operations. In algorithmic complexity terms it is O(1). Such a process can not solve any problem that requires iteration or search, not even a simple algorithm like find ing a square root. Certainly it can't tackle the complex logic involved in answering queries about code, where even determining the behav ior of a pure boolean expression is known to be in the much harder NP-complete complexity class."

It is neither profound nor even close to correct.

A CPU capable of only executing a fixed pipeline of mathematical operations can easily deal with all the things they talk about. Amazing? No, it just turns out they have confused a whole bunch of unrelated CS concepts to make a huge muddle of a post (mixing algorithmic complexity, decidability, simulation vs explanation, etc).

This is true even if the pipeline is truly fixed (IE everything passes through every functional unit), as it was in the past.

Th counterpoint to the argument made by the author is simple:

When it comes to determining output, any of the NN's are universal enough could be taught to arbitrarily approximate a CPU well enough to execute code and determine the output with arbitrary levels of precision. Period. Once you take cost out of it, and we venture into the realm of the "what can be done", the author's statement is simply wrong on its face. Even within the real of "less abstract", there are lots of formal papers/studies of this with specific NN types, etc. Even for transformers, which people love to beat up on as huge brute force machines, see, e.g., https://arxiv.org/abs/2107.13163

You can even throw compilation of every possible programming language we have compilers for right now in there as a pre-step if you want, since we are in the realm of abstract "what is possible".

Heck, you can go further - any of these NN's could be taught to arbitrarily approximate any machine driven process that can explain code.

That is, if you build a combination static/dynamic analysis tool/whatever that explains code, you can theoretically build an NN can approximate that tool. This is easily provable, and directly counter to the author's point.

Now, much like the author's nonsense, neither of the above is not a particularly useful result in practice[1] - among other reasons, we can prove that it is possible, but the proofs are not constructive, they only show that it must be possible to exist, it doesn't tell you how to make it :)

That's before you get into cost/etc.

There are plenty of things to bash AI code understanding on that are totally valid criticisms, we don't need to waste our time on strange, wrong ones.

[1] - though nvidia would probably be very happy if you build a 1t parameter model to try to approximate a $100 intel CPU's results and convince others to use it.