ch0ic3
a year ago
I screenshotted the question and pasted it into Claude 3.5 Sonnet. It answered:
To determine the output of the given function for an input of 3, let's analyze the code step by step:
The function is called fizzbuzz(n) and we're considering n = 3. The return statement uses a conditional expression:
return (["Fizz","Buzz","FizzBuzz"])[ (n % 3 === 0) + (n % 5 === 0) ];
Let's break down the conditions:
(n % 3 === 0) evaluates to true (1) because 3 is divisible by 3
(n % 5 === 0) evaluates to false (0) because 3 is not divisible by 5
Adding these boolean values: 1 + 0 = 1. This result (1) is used as an index to select from the array ["Fizz", "Buzz", "FizzBuzz"]. Index 1 in this array corresponds to "Buzz".
Therefore, for an input of 3, the function will return "Buzz". This might seem counterintuitive since typically in a FizzBuzz problem, numbers divisible by 3 return "Fizz". However, the indexing in this implementation leads to this unexpected result. Would you like me to explain any part of this analysis further?
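Claude's arithmetic checks out, and the snippet is easy to test directly. Below is a TypeScript transcription; the Number() conversions are added only because TypeScript won't add raw booleans, and the extra test inputs are added for illustration:

function fizzbuzz(n: number): string {
  return ["Fizz", "Buzz", "FizzBuzz"][
    Number(n % 3 === 0) + Number(n % 5 === 0)
  ];
}

console.log(fizzbuzz(3));  // "Buzz"     (1 + 0 = 1) -- as Claude concluded
console.log(fizzbuzz(5));  // "Buzz"     (0 + 1 = 1) -- coincidentally right
console.log(fizzbuzz(15)); // "FizzBuzz" (1 + 1 = 2) -- also right
console.log(fizzbuzz(1));  // "Fizz"     (0 + 0 = 0) -- wrong again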
rep_lodsb
a year ago
That's kind of impressive, being able to reason through it like this. However, I think that its normal "thought process" - and the resulting limitations - are well illustrated by the remark about the "typical FizzBuzz problem":
It recognizes patterns that already exist in training data. Seeing "Fizz", "Buzz", and the modulo operator would naturally make it jump to a wrong conclusion, unless prompted to go through it step by step. Most humans wouldn't make this mistake, but then they probably also haven't memorized the "canonical" FizzBuzz example, so who can say if they are smarter or just forced to reason more carefully ;)
But even the step-by-step process depends on having seen enough similar examples before, and would likely be prone to confusion by more unusual code that only sort-of matches a common pattern, perhaps with misleading comments. I wonder if any current LLM can correctly explain what this does:
; Example 1 - Print "Hello world" on Linux, x86-64
; Assemble : nasm -felf64 hello.asm
; Link : ld -o hello hello.o
section .data
Foo:    db "The quick brown fox jumps over the lazy dog",10
Bar:    db 17,42,27,36,3,41,20,39,13,6,44,0

section .text
default rel
global _start

_start:
        lea rdi,[Bar]           ;load pointer to buffer
L1:     movzx ecx,byte [rdi]    ;get input
        jrcxz L2                ;exit on null byte
        lea rsi,[Foo]           ;load pointer to string
        rep lodsb               ;convert to ASCII
        stosb                   ;store output
        jmp L1
L2:     mov eax,1               ;write
        lea rsi,[Bar]           ;buffer
        mov rdx,rdi             ;length = rdi - buffer
        sub rdx,rsi
        mov edi,1               ;handle = stdout
        syscall
        mov eax,60              ;exit
        xor edi,edi             ;status
        syscall
(stnemmoc eht ni detats egassem eht ni tluser ton seod hcihw ,"ooF" ni noitisop retcarahc gnidnopserroc eht ot "raB" ni etyb hcae setalsnart ti taht si rewsna ehT)
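For readers who unscramble the line above and want to check it, here is a small TypeScript simulation of the loop, with Foo and Bar copied from the listing (the trailing 0 in Bar is dropped since it only terminates the loop); the output itself is left to the reader:

// Simulates the loop above: `rep lodsb` loads ecx bytes from Foo and
// leaves the last one in al, so a byte n in Bar selects Foo[n - 1]
// (1-based indexing).
const Foo = "The quick brown fox jumps over the lazy dog\n";
const Bar = [17, 42, 27, 36, 3, 41, 20, 39, 13, 6, 44];

const out = Bar.map((n) => Foo[n - 1]).join("");
console.log(out); // prints the actual message -- run it to see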
lolsowrong
a year ago
rep_lodsb
a year ago
I admit defeat and welcome our AI overlords :)
lolsowrong
a year ago
I was quite impressed as well.
dTal
a year ago
It's slightly less (or more?) impressive when you see that it thought for 45 seconds (!), and basically hand-executed the code to see what it did. I'd love to know how many tokens that actually took. Worth remembering that LLMs can be viewed as term rewrite engines, and as such can compute anything given enough space.
Twey
a year ago
My favourite litmus test for ‘can LLMs reason about code?’ is to make up a programming language with familiar syntax but weird semantics, e.g.:
- all variables contain signed integers
- all variable names have block scope
- there is no variable declaration syntax: all variables are implicitly initialized at first use with the value 5
- all integer literals are expressions
- the expression `a + b` means to subtract the value on the left from the variable on the right, returning the previous value of the variable
- a program is a block
- a block is a sequence of statements enclosed in braces and separated by semicolons, and executed from bottom to top
- conditionals are introduced by the keyword `while`, followed by an expression, followed by a block that is executed only if the expression evaluates to 4
- loops are done by simply prefixing a block with an expression; if the expression evaluates to 0, the block will run indefinitely, otherwise the block will run a number of times indicated by the negation of the value
Et cetera. Then I ask the LLM to write a simple program (e.g. FizzBuzz). Even with a lot of hand-holding, I've yet to get an LLM to do this successfully, or even to answer questions about a program written in the language.
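For what it's worth, the rules in the list above are precise enough to execute mechanically. Here is a minimal TypeScript sketch of an interpreter for just the expression fragment, under one reading of the spec (literals, implicit init to 5, and the binary `a + b` form; blocks, `while`, and loops are omitted, and every name in it is invented for illustration):

type Env = Map<string, number>;

// Read a variable, implicitly initializing it to 5 at first use.
function lookup(env: Env, name: string): number {
  if (!env.has(name)) env.set(name, 5);
  return env.get(name)!;
}

// `a + b`: subtract the left value from the variable on the right,
// and return the variable's previous value.
function evalExpr(env: Env, expr: string): number {
  const parts = expr.split("+").map((s) => s.trim());
  if (parts.length === 1) {
    const tok = parts[0];
    return /^-?\d+$/.test(tok) ? parseInt(tok, 10) : lookup(env, tok);
  }
  const [left, right] = parts; // only the binary form is handled
  const leftVal = evalExpr(env, left);
  const prev = lookup(env, right);
  env.set(right, prev - leftVal); // b := b - a
  return prev;                    // the expression yields the old b
}

// A block: statements separated by semicolons, executed bottom to top.
function run(env: Env, block: string): void {
  const stmts = block.split(";").map((s) => s.trim()).filter(Boolean);
  for (const stmt of stmts.reverse()) {
    console.log(`${stmt}  =>  ${evalExpr(env, stmt)}`);
  }
}

const env: Env = new Map();
run(env, "3 + x; 1 + x");
// Bottom to top: "1 + x" runs first (x is initialized to 5, becomes
// 5 - 1 = 4, and the expression yields 5), then "3 + x" (x becomes
// 4 - 3 = 1 and the expression yields 4).
console.log(env.get("x")); // 1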
gaganyaan
a year ago
I actually had pretty good results from taking a new language that was posted here and having GPT-4 try to interpret it. I don't remember what it was called, but it was APL-like, very symbol-dense, but not using standard symbols. It was too new to be included in any training data at the time, but GPT-4 did a good job of figuring out what each symbol meant.
I think it's not impossible for LLMs to write code like you want. Maybe it's actually harder to redefine common idioms, but to be fair, that happens with people too:
throwuxiytayq
a year ago
My favorite test is to ask the LLM to approximate the mental processes going on in my brain and, based on that, divine what food I had for dinner last Thursday. /s
I’m honestly quite tired of reading people’s favorite ways to break the LLM, like it’s some kind of an achievement. Always in the context of “See? It doesn’t really reason/know/understand X!”.
Yes, it breaks when asked to do complicated stuff. GPT-4 was worse at it than o1, GPT-3 broke on trivial queries, and GPT-2 couldn't get anything done. I don't even interact with LLMs often, and I find this whole topic to be breathlessly obvious, boring, and unproductive, and yet every single conversation about LLMs devolves into it. Sorry about the rant, but it needed to come out at some point.