LLMs can combine cross-domain insights, but the insights I've actually seen them produce, in the models I've used, are around the level of a second-year university student.
I would concur with what the abstract says: incredibly valuable (IMO the breadth of easily discoverable knowledge is a huge plus all by itself), but don't put them in charge.
The "second year university student" analogy is interesting, but might not fully capture what's unique about LLMs in strategic analysis. Unlike students, LLMs can simultaneously process and synthesize insights from thousands of historical conflicts, military doctrines, and real-time data points without human cognitive limitations or biases.
The paper actually makes a stronger case for using LLMs to enhance rather than replace human strategists - imagine a military commander with instant access to an aide that has deeply analyzed every military campaign in history and can spot relevant patterns. The question isn't about putting LLMs "in charge," but whether we're fully leveraging their unique capabilities for strategic innovation while maintaining human oversight.
> Unlike students, LLMs can simultaneously process and synthesize insights from thousands of historical conflicts, military doctrines, and real-time data points without human cognitive limitations or biases.
Yes, indeed. Unfortunately (/fortunately depending on who you ask) despite this the actual quality of the output is merely "ok" rather than "fantastic".
If you need an answer immediately on any topic where "second year university student" is good enough, these are amazing tools. I don't have that skill level in, say, Chinese, where I can't tell 你好 (hello) from 泥壕 (mud hole/trench)* but ChatGPT can at least manage mediocre jokes that Google Translate turns back into English:
问: 什么东西越洗越脏? (Q: What gets dirtier the more you wash it?)
答: 水! (A: Water!)
But! My experience with LLM translation is much the same as with LLM code generation or GenAI images: anyone with actual skill in whatever field you're asking for support with can easily do better than the AI.
It's a fantastic help when you would otherwise have an intern, and that's a lot of things, but it's not the right tool for every job.
* I assume this is grammatically gibberish in Chinese; I'm relying on Google Translate here: https://translate.google.com/?sl=zh-TW&tl=en&text=泥%20壕%20%2...
But the aide won't have deeply analyzed every military campaign in history; it will only spout off answers from books about those campaigns. It will have little to no insight on how to apply principles and lessons learned from similar campaigns in the current problem. Wars are not won by lines on maps. They're not won by cool gear. They're won by psychologically beating down the enemy until they're ready to surrender or open peace negotiations. Can LLMs get in an enemy's head?
> Can LLMs get in an enemy's head?
That may be much easier for an LLM than all the other things you listed.
Read their socials, write a script that grabs the voices and faces of their loved ones from videos they've shared, synthesise a video call… And yes, they can write the scripts even if they don't have the power to clone voices and faces themselves.
I have no idea what's coming. But this is going to be a wild decade even if nothing new gets invented.
Creating chaos and confusion is great, but it's only part of what a military campaign needs. You have to be able to use all levers of government power to put the other government or adversary organization in a position where they feel compelled to quit or negotiate.
Aye.
FWIW, I hope all those other things remain a long way off.
Whoever's doing war game planning needs to consider the possibility of AI that can do those other things, but I'm going to have to just hope.
Only if the enemy has provided a large corpus of writing and other data to train the LLM on.
The person you are responding to seems to be promoting a concept that is frequently spouted here and elsewhere, but to me lacks sufficient (or any) evidence: that AI models, particularly LLMs, are capable both of reasoning (or what we consider reasoning) around problems and of generating novel insights they haven't been trained on.
> Unlike students, LLMs can simultaneously process and synthesize insights from thousands of historical
They can't. LLMs gloss over anything multivariate and prioritize flow of words over hard facts. Which makes sense considering LLMs are language models, not thinking engines, but that doesn't make them useful for serious (above "second year") intellectual tasks.
They don't have any such unique capabilities, other than that they come free of charge.
Kinda. Yes they have flaws, absolutely they do.
But it's not a mere coincidence that history contains the substring "story" (nor that in German, both "history" and "story" are "Geschichte") — these are tales of the past, narratives constructed based on evidence (usually), but still narratives.
Language models may well be superhuman at teasing apart the biases that are woven into the minds writing the narratives… At least in principle, though unfortunately RLHF means they're also likely sycophantically adding whatever set of biases they estimate that the user has.
They're subhuman at debiasing or any other analytical task, because they lack the reasoning engines that we all have. They pick the most emotionally loaded narrative and go with it.
They can't handle counter-intuitive but absolutely logical cases, like how eggplants and potatoes belong to the same biological family (the nightshades) but radishes don't; instead they'll hallucinate and start gaslighting the user. That might be okay for "second-year" students, but it's only going to be the root cause of some deadly gotcha in strategic decision-making.
They're language models. It's in the name. They work like one.
> They can't handle counter-intuitive but absolutely logical cases like how eggplants and potatoes belong to same biological family but not radishes
"Can't" you say. "Does", I say: https://chatgpt.com/c/6735b10c-4c28-8011-ab2d-602b51b59a3e
Not that it matters, this isn't a demonstration of reasoning, it's a demonstration of knowledge.
A better test would be if it can be fooled by statistics that have political aspects, so I went with the recent Veritasium video on this, and at least with my custom instructions, it goes off and does actual maths by calling out to the python code interpreter, so that's not going to demonstrate anything by itself: https://chatgpt.com/share/6735b727-f168-8011-94f7-a5ef8d3610...
But this then taints the "how would ${group member} respond to this?"; if I convince it to not do real statistics and give me a purely word-based answer, you can see the same kinds of narratives that you see actual humans give when presented with this kind of info: https://chatgpt.com/share/6735b80f-ed50-8011-991f-bccf8e8b95...
> They're language models. It's in the name. They work like one.
Yes, they are.
Lojban is also a language.
Look, I'm not claiming they're fantastic at maths (at least when you stop them from using tools), but the biasing I'm talking about is part of language as it is used: the definition of "nurse" may not be gendered, but people are more likely to assume a nurse is a woman than a man, and that's absolutely a thing these models (and even their predecessors like Word2Vec) pick up on:
https://chanind.github.io/word2vec-gender-bias-explorer/#/qu...
(from: https://chanind.github.io/nlp/2021/06/10/word2vec-gender-bia...)
This is the kind of de-bias and re-bias I mean.
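If you want to poke at this yourself, here's a rough sketch of the kind of probe that explorer is doing, assuming the gensim library and its downloadable "word2vec-google-news-300" vectors (the word list is just my own pick, not the explorer's):

    # Sketch: project job titles onto a she/he direction in word2vec.
    # Assumes gensim and its large "word2vec-google-news-300" download.
    import gensim.downloader as api
    import numpy as np

    vectors = api.load("word2vec-google-news-300")

    def gender_lean(word):
        """Positive = closer to 'she', negative = closer to 'he'."""
        direction = vectors["she"] - vectors["he"]
        v = vectors[word]
        return float(np.dot(v, direction) /
                     (np.linalg.norm(v) * np.linalg.norm(direction)))

    for job in ["nurse", "engineer", "receptionist", "carpenter"]:
        print(job, round(gender_lean(job), 3))
    # Typically "nurse" and "receptionist" lean towards "she" and the
    # others towards "he", even though none of the definitions are gendered.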
> "Can't" you say. "Does", I say:
Have you seriously not seen them make these kinds of grave mistakes? That's too much kool-aid you're drinking.
I literally gave you a link to a ChatGPT session where it did what you said it can't do.
And rather than use that as a basis for claiming that it's reasoning, I'm also saying that the test you proposed, and which I falsified, wasn't actually about reasoning.
Not sure what that would even be in a kool-aid themed metaphor in this case… "You said that drink was poisoned with something that would make our heads explode, Dave drank some and he's fine, but also poison doesn't do that, and if the real poison is α-Amanitin we wouldn't even notice problems for about a day"?
A language model isn't a model of strategic conflict or reasoning, but it may contain text in its training data related to these concepts. I'm unclear why (and it seems the paper agrees) you would use the LLM to reason when there are better models for reasoning about the problem domain - the main value of the LLM is its ability to consume unstructured data to populate those other models.
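To make the split concrete, a minimal sketch of what I mean, with the LLM step stubbed out as a hypothetical extract_forces() call and the actual reasoning done by a plain Lanchester square-law attrition model (any domain model would do):

    # "LLM extracts, a separate model reasons": extract_forces() is a
    # hypothetical stand-in for an LLM turning a free-text report into
    # structured numbers; the reasoning is a classic Lanchester model.

    def extract_forces(report: str) -> dict:
        # Hypothetical: in practice, an LLM call constrained to return
        # JSON like {"blue": 900, "red": 1200, "blue_eff": 0.04, "red_eff": 0.03}.
        raise NotImplementedError

    def lanchester_square(blue, red, blue_eff, red_eff, dt=0.1, steps=1000):
        """Square law: each side's losses scale with the other side's size."""
        for _ in range(steps):
            blue, red = (max(blue - red_eff * red * dt, 0),
                         max(red - blue_eff * blue * dt, 0))
            if blue == 0 or red == 0:
                break
        return blue, red

    # forces = extract_forces(open("field_report.txt").read())
    forces = {"blue": 900, "red": 1200, "blue_eff": 0.04, "red_eff": 0.03}
    print(lanchester_square(**forces))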
You are using a different definition of "strategic" than the DoD uses; what you are describing is closer to tactical decisions.
They are talking about what is typically org-wide in scope: long-term direction.
They aren't talking about planning hiding under the label of 'strategic planning' in the biz world.
LLMs are powerful, but they are by definition past-focused, and they are still in-context learners.
As they covered, hallucinations, adverse actions, unexplainable models, etc. are problematic.
The "novel strategic approaches" are what in this domain would be called tactics, not strategy, which is focused on the unknowable or the unknown-but-knowable.
They are talking about issues way past methods like circumscription and the ability to determine if a problem can be answered as true or false in a reasonable amount of time.
Here is a recent primer on the complexity of circumscription, as it is a bit of an obscure concept.
https://www.arxiv.org/abs/2407.20822
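If you don't want to wade through the paper, here's a toy sketch (mine, not the primer's) of the basic idea: keep only the models of a theory in which the "abnormal" atoms are subset-minimal, then check which conclusions survive.

    # Toy circumscription: among the models of a theory, keep those whose
    # "abnormal" atoms are subset-minimal, then see which conclusions hold
    # in every remaining model. Classic Tweety-the-bird example.
    from itertools import product

    def models(theory, atoms):
        """All truth assignments over `atoms` that satisfy `theory`."""
        result = []
        for vals in product([False, True], repeat=len(atoms)):
            m = dict(zip(atoms, vals))
            if theory(m):
                result.append(m)
        return result

    def circumscribe(all_models, minimized=("ab",)):
        """Keep models whose minimized ('abnormal') atoms are subset-minimal."""
        ab = lambda m: {a for a in minimized if m[a]}
        return [m for m in all_models
                if not any(ab(n) < ab(m) for n in all_models)]

    atoms = ("ab", "flies")

    # Theory 1: Tweety is a bird; birds that aren't abnormal fly.
    t1 = lambda m: m["ab"] or m["flies"]
    print(all(m["flies"] for m in circumscribe(models(t1, atoms))))  # True

    # Theory 2: ...and Tweety is a penguin, hence abnormal and flightless.
    t2 = lambda m: (m["ab"] or m["flies"]) and m["ab"] and not m["flies"]
    print(all(m["flies"] for m in circumscribe(models(t2, atoms))))  # False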
Remember, finding an effective choice function for non-trivial issues is hard no matter what your problem domain is; setting a durable shared direction to communicate, in the presence of an unknowable future, that can't be gamed or predicted by an adversary is even more so.
Researching what mission command is may help understand the nuances that are lost with overloaded terms.
Strategy being distinct from stratagem is also an important distinction in this domain.
> but are by definition past focused,
To add to that, and because the GP had mentioned (a "virtual") Clausewitz, "human"/irl strategy itself has in many cases been too focused on said past and, because of that, has caused defeats for the adopters of those "past-focused" strategies. Look at the Clausewitzian concept of "decisive victory" which was adopted by German WW1 strategists who, in so doing, ended up causing defeat for their country.
Good strategy is an art, the same as war; no LLM nor any other computer code will ever be able to replicate it or improve on it.