dragonwriter
3 days ago
This seems fundamentally unsuitable for its stated purpose, which is “understanding human behavior”.
While it may, as it says, produce “convincing interactions”, there is no basis at all presented for believing it produces an accurate model of human behavior, so using it to “understand human behavior” is at best willful self-deception. More likely, with a little effort at tweaking inputs to produce the desired results, in the hands of someone who presents it as “enlightening productivity and business scenarios” it will be an engine for simply manufacturing support for a pre-selected option.
It is certainly easier and cheaper than studying actual human interactions to understand human behavior, but then so is just using a Magic 8-Ball, which may be less convincing but, for all the evidence presented here, is just as accurate.
keeda
3 days ago
The source does not mention the underlying motivation (and it really should), but I think this is it:
https://www.linkedin.com/posts/emollick_kind-of-a-big-deal-a...
"... a new paper shows GPT-4 simulates people well enough to replicate social science experiments with high accuracy.
Note this is done by having the AI prompted to respond to survey questions as a person given random demographic characteristics & surveying thousands of "AI people," and works for studies published after the knowledge cut-off of the AI models."
A couple other posts along similar lines:
https://www.linkedin.com/posts/emollick_this-paper-suggests-...
"... LLMs automatically generate scientific hypotheses, and then test those hypotheses with simulated AI human agents.
https://www.linkedin.com/posts/emollick_formula-for-neat-ai-...
"Applying Asch's conformity experiment to LLMs: they tend to conform with the majority opinion, especially when they are "uncertain." Having a devil's advocate mitigates this effect, just as it does with people."
potatoman22
3 days ago
I wonder how one could measure how human-like the agents' opinions and interactions are? There's a ton of value in simulating preferences, but you're right that it's hard to know if the simulation is accurate.
I have a hunch that, through sampling many AI "opinions," you can arrive at something like the wisdom of the crowd, but again, it's hard to validate.
bpshaver
3 days ago
Section 6, "Controlled Evaluation," answers that question: https://arxiv.org/pdf/2304.03442
kaibee
3 days ago
Caveat: I don't actually work in ML, I just read a lot. If someone who is a real expert can tell me whether my assessment here is correct, please let me know.
> I have a hunch that, through sampling many AI "opinions," you can arrive at something like the wisdom of the crowd, but again, it's hard to validate.
That's what an AI model already is.
Let's say you had 10 temperature sensors on a mountain and you logged their data at time T.
If you take the average of those 10 readings, you get a 'wisdom of the crowds' estimate of the temperature, which you can summarize as the avg + std of your 10 real measurements.
You can then sample 10 new points from the normal distribution defined by that avg + std. Cool for generating new, similar data, but it doesn't really tell you anything you didn't already know.
Trying to get 'wisdom of crowds' by repeatedly querying the AI model is equivalent to sampling 10 new points at random from that distribution: you'll get values that look like your original distribution of true values (with some outliers), but there's probably a better way to get at whatever you're trying to extract from the model.
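A toy version of the sensor example, in numpy (the readings are made up for illustration):

    import numpy as np

    # 10 real sensor readings at time T
    readings = np.array([4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.3, 3.7, 4.1, 4.0])
    mu, sigma = readings.mean(), readings.std()

    # The "wisdom of the crowd" is just the summary of the real data.
    print(f"crowd estimate: {mu:.2f} +/- {sigma:.2f}")

    # Resampling from N(mu, sigma) produces data that *looks* like the
    # originals, but carries no information beyond mu and sigma.
    resampled = np.random.normal(mu, sigma, size=10)
    print(resampled.mean(), resampled.std())  # noisy copies of mu and sigma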
ori_b
3 days ago
It's worse than that. LLMs have been tuned carefully to mostly produce output that will be inoffensive in a corporate environment. This isn't an unbiased sampling.
jfactorial
3 days ago
True for consumer products like ChatGPT, but there are plenty of models that are not censored. https://huggingface.co/models?sort=trending&search=uncensore...
isaacremuant
3 days ago
No. The censoring has already been done systematically by tech corporations at the behest of political agents that have power over them.
You only have to look at opinions about covid policies to realize you won't get a good representation, because dissenting opinions will be deemed "misinformation" by the powers that are vested in that being the case. Increasingly, criticism of government policy can be conflated with some sort of crime whose interpretation is left entirely to some government institution, so people self-censor, companies censor just in case, and the Overton window gets narrower.
LLMs are awesome but they will only represent what they're trained on and what they're trained on only represents what's allowed to be in the mainstream discourse.
jfactorial
a day ago
> LLMs are awesome but they will only represent what they're trained on and what they're trained on only represents what's allowed to be in the mainstream discourse.
I don't think this is an accurate description of LLM censorship though, especially in light of the fact that many LLMs are fine-tuned for the explicit purpose of suppressing responses the model could otherwise generate. Contrasting uncensored models with censored ones yields objectively uncensored results.
yarp
3 days ago
Could be interesting if used with many different LLMs at once.
bsenftner
3 days ago
My first thought while reading was that this would be a great academic framework in the hands of PhD students with extremely high attention to all the details and how those details interact. But in the hands of any group or individual with a less scientifically rigorous mindset, it's a construction set for justifying practically anything. In the hands of biased laypersons it becomes the old toolset of using statistics to lie, upgraded into a nuclear weapon.
michaelmior
3 days ago
I would tend to agree. Although for something like testing ads, it seems like it would be relatively straightforward to run an A/B test that compares the real-world performance of two ads against TinyTroupe's predictions.
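Roughly something like this (all numbers are made up, and predicted_share_a stands in for however you aggregate the simulated agents' choices, not any actual TinyTroupe API):

    import math

    # Share of simulated personas that preferred ad A (hypothetical)
    predicted_share_a = 0.62

    # Real A/B test results (hypothetical)
    clicks_a, impressions_a = 540, 10_000
    clicks_b, impressions_b = 410, 10_000
    observed_share_a = clicks_a / (clicks_a + clicks_b)  # rough proxy for preference

    # Two-proportion z-test: did ad A actually outperform ad B?
    p_pool = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / impressions_a + 1 / impressions_b))
    z = (clicks_a / impressions_a - clicks_b / impressions_b) / se

    print(f"simulated preference for A: {predicted_share_a:.0%}")
    print(f"observed preference for A:  {observed_share_a:.0%}")
    print(f"A vs B click-through z-score: {z:.2f}")  # |z| > 1.96 ~ significant at 5%

If the simulated preferences keep landing on the same side as the real test winners, that's at least some evidence the simulation captures something; if not, you've learned that cheaply.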
A4ET8a8uTh0
3 days ago
I did not test this library, so I can't argue from that perspective (I think I will, though; it does seem interesting).
<< “enlightening productivity and business scenarios” it will be an engine for simply manufacturing support for a pre-selected option.
In a sense, this is what training employees is all about. You want to get them ready for various possible scenarios. For recurring tasks that do require some human input, it does not seem that far-fetched.
<< produce “convincing interactions”
This is the interesting part. Is “convincing” a bad thing if it produces what a user would be expected to see?
mindcrime
3 days ago
> it will be an engine for simply manufacturing support for a pre-selected option.
There's nothing unique about this tool in that regard though. Pretty much anything can be misused in that way: spreadsheets, graphics/visualizations, statistical models, and so on. Whether tools are actually used to support better decision making, or simply to prop up pre-selected decisions, is more about the culture of the organization and the mindset of its leaders.
dragonwriter
3 days ago
> There's nothing unique about this tool in that regard though.
Sure, it's just part of an arms race, where having a new thing with a hot selling pitch, and a layer of buzzwords to dress it up, helps sell the results to audiences who have started to see through the existing ways of doing the same thing.
mindcrime
3 days ago
I agree in general. I'm just not sure how much the "new thing with a hot selling pitch" part even matters. At least IME, at companies where the culture is such that management just look for ways to add a sheen of scientific respectability to their ad-hoc decisions, nobody really questions the details. Management just put the "thing" out there, hand-wave some "blah, blah" around, everybody nods their heads, and things proceed as they were always going to.
A4ET8a8uTh0
3 days ago
Agreed. At the end of the day, it is just another tool.
I think the issue is the human tendency to just rubber-stamp whatever result is given. Not that long ago, few people questioned the results of a study; now there won't even be underlying data to go back to and check whether someone made an error. Naturally, this suggests we will start seeing a lot of bad decisions, because human operators won't stop to think whether the response makes sense.
That said, I am not sure what can be done about it.