dragonwriter
3 days ago
This seems fundamentally unsuitable for its stated purpose, which is “understanding human behavior”.
While it may, as it says, produce “convincing interactions”, there is no basis at all presented for believing it produces an accurate model of human behavior, so using it to “understand human behavior” is at best willful self-deception. More likely, with a little effort at tweaking inputs to produce the desired results, in the hands of someone who presents it as “enlightening productivity and business scenarios” it will be an engine for simply manufacturing support for a pre-selected option.
It is certainly easier and cheaper than studying actual human interactions to understand human behavior, but then so is just using a Magic 8-Ball, which may be less convincing but, for all the evidence presented here, is just as accurate.
keeda
3 days ago
The source does not mention the underlying motivation (and it really should), but I think this is it:
https://www.linkedin.com/posts/emollick_kind-of-a-big-deal-a...
"... a new paper shows GPT-4 simulates people well enough to replicate social science experiments with high accuracy.
Note this is done by having the AI prompted to respond to survey questions as a person given random demographic characteristics & surveying thousands of "AI people," and works for studies published after the knowledge cut-off of the AI models."
A couple other posts along similar lines:
https://www.linkedin.com/posts/emollick_this-paper-suggests-...
"... LLMs automatically generate scientific hypotheses, and then test those hypotheses with simulated AI human agents.
https://www.linkedin.com/posts/emollick_formula-for-neat-ai-...
"Applying Asch's conformity experiment to LLMs: they tend to conform with the majority opinion, especially when they are "uncertain." Having a devil's advocate mitigates this effect, just as it does with people."
potatoman22
3 days ago
I wonder how one could measure how human-like the agents' opinions and interactions are? There's a ton of value in simulating preferences, but you're right that it's hard to know if the simulation is accurate.
I have a hunch that, through sampling many AI "opinions," you can arrive at something like the wisdom of the crowd, but again, it's hard to validate.
bpshaver
3 days ago
Section 6, "Controlled Evaluation," answers that question: https://arxiv.org/pdf/2304.03442
kaibee
3 days ago
Caveat: I don't actually work in ML, I just read a lot. If someone who is a real expert can tell me whether my assessment here is correct, please let me know.
> I have a hunch that, through sampling many AI "opinions," you can arrive at something like the wisdom of the crowd, but again, it's hard to validate.
That's what an AI model already is.
Let's say you had 10 temperature sensors on a mountain and you logged their data at time T.
If you take the average of those 10 readings, you get a 'wisdom of the crowds' estimate of the temperature, which you can summarize as the avg + std of your 10 real measurements.
You can then sample 10 new points from the normal distribution defined by that avg + std. Cool for generating new, similar data, but it doesn't really tell you anything you didn't already know.
Trying to get 'wisdom of crowds' by repeatedly querying the AI model is equivalent to sampling 10 new points at random from that distribution: you'll get values that look like your original distribution of true values (with some outliers), but there's probably a better way to get at whatever you're trying to extract from the model.
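A toy version of the sensor example, in numpy (the readings are made up for illustration):

    import numpy as np

    # 10 real sensor readings at time T
    readings = np.array([4.1, 3.8, 4.4, 4.0, 3.9, 4.2, 4.3, 3.7, 4.1, 4.0])
    mu, sigma = readings.mean(), readings.std()

    # The "wisdom of the crowd" is just the summary of the real data.
    print(f"crowd estimate: {mu:.2f} +/- {sigma:.2f}")

    # Resampling from N(mu, sigma) produces data that *looks* like the
    # originals, but carries no information beyond mu and sigma.
    resampled = np.random.normal(mu, sigma, size=10)
    print(resampled.mean(), resampled.std())  # noisy copies of mu and sigma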
ori_b
3 days ago
It's worse than that. LLMs have been tuned carefully to mostly produce output that will be inoffensive in a corporate environment. This isn't an unbiased sampling.
jfactorial
3 days ago
True for consumer products like ChatGPT, but there are plenty of models that are not censored. https://huggingface.co/models?sort=trending&search=uncensore...
isaacremuant
3 days ago
No. The censoring has already been done systematically by tech corporations at the behest of political agents that have power over them.
You only have to look at opinions about covid policies to realize you won't get a good representation, because dissenting opinions will be deemed "misinformation" by the powers that are vested in that being the case. Increasingly, criticism of government policy can be conflated with some sort of crime whose interpretation is left entirely to some government institution, so people self-censor, companies censor just in case, and the Overton window gets narrower.
LLMs are awesome but they will only represent what they're trained on and what they're trained on only represents what's allowed to be in the mainstream discourse.
jfactorial
a day ago
> LLMs are awesome but they will only represent what they're trained on and what they're trained on only represents what's allowed to be in the mainstream discourse.
I don't think this is an accurate description of LLM censorship though, especially in light of the fact that many LLMs are fine-tuned for the explicit purpose of suppressing responses the model could otherwise generate. Contrasting uncensored models with censored ones yields objectively uncensored results.
yarp
3 days ago
Could be interesting if used with many different LLMs at once.
bsenftner
3 days ago
My first thought while reading was that this would be a great academic framework in the hands of PhD students with extremely high attention to all the details and how those details interact. But in the hands of any group or individual with a less scientifically rigorous mindset, it's a construction set for justifying practically anything. In the hands of biased laypersons it becomes the old toolset of using statistics to lie, upgraded into a nuclear weapon.
michaelmior
3 days ago
I would tend to agree. Although for something like testing ads, it seems like it would be relatively straightforward to run an A/B test that compares the real-world performance of two ads against TinyTroupe's predictions.
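Roughly something like this (all numbers are made up, and predicted_share_a stands in for however you aggregate the simulated agents' choices, not any actual TinyTroupe API):

    import math

    # Share of simulated personas that preferred ad A (hypothetical)
    predicted_share_a = 0.62

    # Real A/B test results (hypothetical)
    clicks_a, impressions_a = 540, 10_000
    clicks_b, impressions_b = 410, 10_000
    observed_share_a = clicks_a / (clicks_a + clicks_b)  # rough proxy for preference

    # Two-proportion z-test: did ad A actually outperform ad B?
    p_pool = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / impressions_a + 1 / impressions_b))
    z = (clicks_a / impressions_a - clicks_b / impressions_b) / se

    print(f"simulated preference for A: {predicted_share_a:.0%}")
    print(f"observed preference for A:  {observed_share_a:.0%}")
    print(f"A vs B click-through z-score: {z:.2f}")  # |z| > 1.96 ~ significant at 5%

If the simulated preferences keep landing on the same side as the real test winners, that's at least some evidence the simulation captures something; if not, you've learned that cheaply.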
A4ET8a8uTh0
3 days ago
I did not test this library, so I can't argue from that perspective (I think I will, though; it does seem interesting).
<< “enlightening productivity and business scenarios” it will be an engine for simply manufacturing support for a pre-selected option.
In a sense, this is what training employees is all about. You want to get them ready for various possible scenarios. For recurring tasks that do require some human input, it does not seem that far-fetched.
<< produce “convincing interactions”
This is the interesting part. Is “convincing” a bad thing if it produces what a user would be expected to see?
mindcrime
3 days ago
> it will be an engine for simply manufacturing support for a pre-selected option.
There's nothing unique about this tool in that regard though. Pretty much anything can be misused in that way: spreadsheets, graphics/visualizations, statistical models, and so on. Whether tools are actually used to support better decision making, or simply to prop up pre-selected decisions, is more about the culture of the organization and the mindset of its leaders.
dragonwriter
3 days ago
> There's nothing unique about this tool in that regard though.
Sure, it's just part of an arms race, where having a new thing with a hot selling pitch, and a layer of buzzwords to dress it up, helps sell the results to audiences who have started to see through the existing ways of doing the same thing.
mindcrime
3 days ago
I agree in general. I'm just not sure how much the "new thing with a hot selling pitch" part even matters. At least IME, at companies where the culture is such that management just look for ways to add a sheen of scientific respectability to their ad-hoc decisions, nobody really questions the details. Management just put the "thing" out there, hand-wave some "blah, blah" around, everybody nods their heads, and things proceed as they were always going to.
A4ET8a8uTh0
3 days ago
Agreed. At the end of the day, it is just another tool.
I think the issue is the human tendency to just rubber-stamp whatever result is given. Not that long ago, few people questioned the results of a study; now there won't even be underlying data to go back to and check whether someone made an error. Naturally, this suggests we will start seeing a lot of bad decisions, because human operators won't stop to think whether the response makes sense.
That said, I am not sure what can be done about it.