hackernews client

What happens to SaaS in a world with computer-using agents?

90 pointsposted 5 months ago

(docs.google.com)

84 Comments

Magmalgebra

5 months ago

I think this post underestimates how the degree to which “what data is correct” is deeply contextual.

My team created an identical hypothesis to this doc ~2 years ago and generated a proof of concept. It was pretty magic, we had fortune 500 execs asking for reports on internal metrics and they’d generate in a couple of minutes. First week we got rave reviews - followed by an immediate round of negative feedback as we realized that ~90% of the reports were deeply wrong.

Why were they wrong? It had nothing to do with the LLMs per se, 03-mini doesn’t do much better on our suite than gpt 3.5. The problem was that knowing which data to use for which query was deeply contextual.

Digging into use cases you’d fine that for a particular question you needed to not just get all the rows from a column, you needed to do some obscure JOIN ON operation. This fact was only known by 2 data scientists in charge of writing the report. This flavor or problem - data being messy, with the messiness only documented in a few people’s brains, repeated over and over.

I still work on AI powered products and I don’t see even a little line of sight on this problem. Everyone’s data is immensely messy and likely to remain so. AI has introduced a number of tools to manage that mess, but so far it appears they’ll need to be exposed via fairly traditional UIs.

burnte

5 months ago

> I think this post underestimates how the degree to which “what data is correct” is deeply contextual.

I can't get anyone to listen to this point. I'm seeing plans going full steam ahead deploying AI when they don't even have a good definition of the PROBLEM much less how to train the AI to do things well and correctly. I was in a 90 minute meeting with some execs who were all high on ChatGPT Operators. He was saying we could replace 80 people at this company RIGHT NOW with this tool. I asked the presenter to type in one simple request to the AI, the entire demo went wildly off the rails from then on and the presenter wasn't even remotely bothered by that. People are either completely taken in by the marketing and believe like it's a religion, or they have solid, sensible concerns about reliability. But the number of people in category 2 is a smaller number than the true believers.

lfxyz

5 months ago

> People are either completely taken in by the marketing and believe like it's a religion, or they have solid, sensible concerns about reliability.

The other issue is that the first group are labelled as innovative go-getters, while the second group are labelled as negative crusty curmudgeons and this has an impact on the careers of both groups.

OccamsMirror

5 months ago

Executives are salivating at the opportunity to cut 80 staff with no repercussions. They're so eager they're keeping the blinkers on. The fact that the AI has no clothing is ignored because the promise is so attractive.

burnte

5 months ago

Yep. He said any company that isn't all-in on AI will be out of business next year. I really want to see ChatGPT Operators fix my roof.

satvikpendem

5 months ago

> I can't get anyone to listen to this point. I'm seeing plans going full steam ahead deploying AI when they don't even have a good definition of the PROBLEM much less how to train the AI to do things well and correctly.

First time? I did AI work years before the current generative AI boom and it was the same then too, managers wanted to stick AI into everything without even knowing what the hell they actually wanted in the end.

burnte

5 months ago

First time with AI, yeah, but I am not surprised since I've seen it with every other tech fad. People come up with solutions to implement, not problems to solve.

sansseriff

5 months ago

It will be interesting to see in what fields it's worth the effort to curate you're data to a high enough standard that you get all the benefits of the ai agent.

I'm currently working as a scientist. I wonder if researchers will be willing to annotate their papers, data, reasoning, and arguments well enough that ai agents can make good use if it all.

If you write your papers in an AI friendly way, maybe that means more citations? Does this mean switching to new publishing formats? Pdfs are certainly limiting

aeturnum

5 months ago

I think a lot of the power and capability of LLMs comes from their understanding of a lot of implicit context in language. But generally LLMs will have a dominant understanding of each linguistic construct and if that understanding is isn't correct they struggle.

We've looked at using agents at my current job but most of the time, once the data is properly structured, a more traditional approach is faster and less expensive.

TaurenHunter

5 months ago

That must be the reason why Palantir and other AI companies are using the concept of "ontology".

We can't let a LLM loose on a database and expect it to figure out everything.

joshstrange

5 months ago

Yep, I had a similar experience around a year or so ago. Hooking an LLM up to my RDMBS was really cool for the first 1-2 questions but fell over almost immediately with questions that strayed much further than “how many rows are in this table”.

Sure, you can do some basic filtering (but it would fail here making bad assumptions) and any (correct) joins were a crap-shoot. I was including schema and sample rows from all my tables, I wrote 10’s of lines of instructions explaining the logic of the tables and that still didn’t begin to cover all the cases.

Prompt engineering tons of business logic is a horrible job. It hard to test and it feels so “squishy” and unreliable. Even with all of my rules, it would write queries that didn’t work and/or broke a rule/concept that I had laid out.

In my experience, you’re much better off using AI to help you write some queries that you add to the codebase (after tweaking/checking) then you are having AI come up with queries at run time.

yibg

5 months ago

Completely agree. Even things that are considered "standard" or "basic" some times have deep contextual variances. For instance basic questions like "what is my ARR this month" can have varying answers for different businesses.

lukev

5 months ago

This is absolutely the problem. But there is a line of sight; namely, combining LLMs with existing semantic data technologies (e.g, RDF.)

This is why I'm building a federated query optimizer: we want to let the LLM reason and formulate queries at the ontological level, with query execution operating behind a layer of abstraction.

Magmalgebra

5 months ago

Unfortunately this doesn't address the problem I'm describing.

My team had these ontologies available to the LLM and provided it in the context window. The queries were ontologically sensible at a surface level, but still wrong.

The problem is that your ontology is rapidly changing in non-obvious and hard to document ways e.g. "this report is only valid if it was generated on a tuesday or thursday after 1pm because that's when the ETL runs, at any other time the data will be incorrect"

creaghpatr

5 months ago

The analysts know where the bodies are buried, so to speak. The execs may not even be aware there are bodies.

jaennaet

5 months ago

This got me curious as to what "queries at the ontological level" means in concrete terms. It's been a good long while since I did anything even remotely data engineering -like, and back then "AI" could be something like a support vector machine (yay moving goalposts), so I haven't had to deal with this sort of stuff at all.

abakker

5 months ago

Line of sight to a problem solving architecture, while cool, is nowhere near line of sight on upgrading the existing crappy data that is critically intertwined with literal thousands of apps in a typical enterprise.

SoftTalker

5 months ago

Did the execs immediately recognize that the reports were wrong, or did some analyst working in a cubicle on the 9th floor point that out?

Magmalgebra

5 months ago

Ususally the analyst, but sometimes the exec - hard to miss when a report implies your revenue has shifted 90%+ in either direction since the last time you read a report :)

llm_trw

5 months ago

The intern under the analyst did.

llm_trw

5 months ago

>I still work on AI powered products and I don’t see even a little line of sight on this problem. Everyone’s data is immensely messy and likely to remain so.

I've worked in the space as well and completely unstructured data is better than whatever you call a database with a dozen ad hoc tables each storing information somewhat differently to each other for reports written by a dozen different people over a decade.

I have a benchmark for an agentic system which measures how many joins between tables the system can do before it goes off the rails. But there is nothing off the shelf that does it and for whatever reason no one is talking about it in the open. But there are companies working to solve it in the background - since I've worked with three so far.

Without documentation giving some grounding about what the table is doing, you're left with hoping the database is self documenting enough for the agent to figure out what the column names mean and if joining on them makes sense - good luck doing it on id1, id2, idCustomerLocal, id_customer_foreign though.

Magmalgebra

5 months ago

Descriptions of tables is insufficient (we had it) - you also need descriptions of the systems writing to the tables.

My favorite example was a report that was only accurate if generated on a Tuesday or Thursday due to when the ETL pipeline ran. A small config change on the opposite side of a code base completely altered the semantics of the data!

llm_trw

5 months ago

If you're interested please drop an email. I've only worked deeply with pipelines extracting data from documents and I'd be interested in hearing what the challenges with databases are.

mooreds

5 months ago

> Digging into use cases you’d fine that for a particular question you needed to not just get all the rows from a column, you needed to do some obscure JOIN ON operation. This fact was only known by 2 data scientists in charge of writing the report. This flavor or problem - data being messy, with the messiness only documented in a few people’s brains, repeated over and over.

This reminds me of one of the key plot points in "The Sparrow" by Mary Doria Russell. Small spoiler ahead so if you haven't read it and want to be surprised, stop reading.

...

Basically, one of the characters works as an AI implementer, replacing humans in their jobs by learning deeply about how they do their work and coding up an AI replacement. She run across a SETI researcher and works on replacing him, but he has a human intuition when matching signals that she would never have discovered because it was so random.

Great book if you haven't read it.

bashtoni

5 months ago

This is a good read that is a great starting point for thinking about this. It essentially takes the extreme position - SaaS no longer needs a UI, because the LLM is the UI.

In reality, as always, I suspect the truth will be somewhere in between. SaaS products that succeed will be those that have a good UI _and_ and good API that LLMs can use.

An LLM is not always the best interface, particularly for data access. For most people, clicking a few times in the right places is preferable to having to type out (or even speak aloud) "Show me all the calls I did today", waiting for the result, having to follow up with "include the time per call and the expected deal value", etc etc.

There is undoubtedly an opportunity for disruption here, but I think an LLM only SaaS platform is going to be a very tough sell for at least the next decade.

sansseriff

5 months ago

Yep, it's funny how one of the key factors that limits LLM usage is just the typing speed of users.

I agree that the amount of bespoke UI that needs to exist probably won't stagnate. Humans need about the same amount of visual information to verify a task was done correctly as they need to do the task.

LLM generated UI is an interesting field. Sure, you can get ChatGPT to generate schema to lay out some buttons. But it seems harder to identify the context and relevant information that must be displayed for the human to be a valuable/necessary asset in the process.

thinkindie

5 months ago

I agree with you - also because most of the activities described in the post can be turned around where the SaaS wraps a LLM around specific tasks to augment data (e.g. call transcription, summarisation and preparation for the next meeting).

As an industry, we have been through a textual user interface already: terminals, and we moved away from that.

And voice UIs are not new either: we have had voice assistant for quite some time now, and they didn't see the success Apple, Google or Amazon were expecting (recently it came out that most of echo use cases were about setting timers).

bradchris

5 months ago

Also— for B2B SaaS, a big component of what is being sold is not the product, but support. No matter how modern or antiquated the tech is, many B2B companies don’t actually care about the experience per-se; they care about compliance, security, data integrity, and ongoing support. That’s essentially Oracle’s entire playbook!

How do LLM SaaS replacements solve that?

s__s

5 months ago

The position is more extreme than that. It’s your SaaS without its UI is nothing more than a database.

> The underlying SaaS platform is reduced to a “database” or “utility” that an agent can switch out if needed.

I agree that UI isn’t going away completely. Language is a slow and imprecise tool. A well developed UI can be much more efficient. I think it will be much more like the Star Trek universe, where we use a blend of the two.

In any case, if the AI agent can generate UI on the fly, it seems their point still stands?

GiorgioG

5 months ago

Oh yeah I can't wait for the AI to layout the UI in an arbitrary fashion, put buttons wherever the hell it feels like it (even in a place you can neither see nor click on.). Yes please, also I would like to automate the customer service for said product so it can be a complete black box of uselessness.

nmaley

5 months ago

Look, I love LLMs and even implement them for customers, but I am very sceptical about them 'replacing' ERP and CRP systems. What some AI folks don't seem to understand is that traditional ERP and CRP apps are completely driven by auditable business rules because they have to be. If you're running a company, there's no discretion at all about how money and other assets and liabilities are accounted for. It all has to be strictly according to the rules. This goes for most everything else - management are responsible for the business rules implemented in the system and they need to be precisely spelled out. Sure, AI can and should be used extensively for the human UI piece of it. To simplify getting data into and out of the system for example. But the engine inside and the database are all strictly rule governed and I definitely dont expect that to change anytime soon.

Bjorkbat

5 months ago

This kind of reminds me of when there was a lot of hype around messenger apps and this idea that we'd just do everything through a chat interface / chat bot.

It never panned out, arguably because the technology wasn't quite there yet (this was well before ChatGPT came out), but I thought the bigger problem was that people thought that a chat UI was the ultimate user interface. Just didn't feel right to me. For simple tasks, sure, but otherwise it felt like for "exploratory" tasks it made more sense to have a graphical user interface of some kind.

Same sentiments apply to the hype around agents. Even in a hypothetical world where agents work as well as any human I don't think an agent/chatbot UI is necessarily the ultimate user interface. If I'm asking an agent questions, it makes sense for it to show rather than tell in many contexts. Even in a world where agents capture much of the way we interact with computers, it might make more sense for them to show us using 3rd party SaaS apps.

reportgunner

5 months ago

It was the same then as it is now. Chatbot providers had bots to sell, now autocomplete providers have autocomplete to sell. Marketing people just say what they get paid to say.

bushido

5 months ago

It's an intriguing take, but as others have pointed out, the truth will be somewhere in the middle. I don't believe that AI will replace the entire SaaS interface. And I also don't think it will need as many services and APIs of yester-years.

This writeup seems to be authored by a senior designer at Salesforce and I can see the motivation from the their perspective. Their challenges are different than what a new SaaS product will encounter.

Like all the incumbents of their time they are a core-ish database that depended on a plethora of point solutions from vendors and partners to fill in the gaps their product left in constructing workflows. If they don't take an approach like being discussed here – or in the linked OpenAI/Softbank video – they will risk alienating their vendors/partners or worse see them becoming competitors in their own right.

Disclaimer – I'm biased too, I'm building one of the upstarts that aims to compete with Salesforce.

egypturnash

5 months ago

Have you ever watched people talk excitedly about "agents" for thirty or forty years without ever actually providing an example that functioned for more than a couple of very precisely staged demos, if that?

You Will.

dimitri-vs

5 months ago

Alexa/Siri/Google Assistant.

Except this time with full admin access to everything.

Terr_

5 months ago

And no auditable rules, just "whatever sounded like the next word to append to the interaction document."

GiorgioG

5 months ago

I think everyone that thinks this way are smoking something. I use the latest and greatest AI tools and they never fail to disappoint, make shit up and just waste hours of time because they would rather answer with nonsense than ask questions or just say I don’t fucking know or something isn’t possible.

vosper

5 months ago

I learned about the idea of Generative UI from a Sharp Talk podcast, and it's stuck with me ever since.

Many SaaS (especially the complex ones, which are the also the most important ones) have a tonne of UI often imposing a huge amount of non-work work onto users - all the clicking you have to do as part of entering or retrieving data, especially if the UI flow doesn't fit exactly what you're trying to do at that moment. An example might be quicly creating an epic and a bunch of related tickets in Jira, and having them all share some common components.

A generative UI would be able to construct a custom UI for the particular thing the user is trying to do at any point in time. I think it's a really powerful idea, and it could probably be done today by smartly using eg Jira's APIs.

The ability to span applications would be even more powerful. Done well it might even kill the need to maintain complex integrations between related Saas (eg how some product development application might need to sync data to/from Jira or ADO) by having the AI just keep track of changes and move them from one system to another.

Once it gets to the point where the Gen UI is go-to system for interactions you have to wonder what all the designers and UI builders at the myriad SaaS will be doing...

dimitri-vs

5 months ago

In a way that's what Claude Artifacts are. That said, I think there are many more ways to get gen UIs wrong then there are to get them right. Most users and use cases will be counterproductive with a dynamic UI. Debugging will be an absolute nightmare if not outright impossible, same with security.

Terr_

5 months ago

> Most users and use cases will be counterproductive with a dynamic UI.

Pro: The GUI dynamically adjusts.

Con: There's no consistent mental model for you to learn, when you need to use something it's not there, and the stuff which is there might not do what you expect.

ajcp

5 months ago

I was just commenting on something toward this end, but think I took it further than just UI to apply to the whole software: https://news.ycombinator.com/item?id=42562289

pragmatic

5 months ago

Using SaaS products even with an API is fraught with peril with actual engineers and QA (sometimes) on both sides.

Who's going to bet millions of dollars these agents after going to get it right. Based on what evidence?

nitwit005

5 months ago

Let's look at an actual CRM for a moment. Salesforce has an suite of sales forecasts for projecting sales. A major feature of that is letting people make "adjustments" to the data. Every layer of your sales org can tweak the numbers that the layer below generates: https://help.salesforce.com/s/articleView?id=sales.forecasts...

I'm sure some of those adjustments are reasonable, but I'm also sure this gets used to create a stack of lies to please upper management.

There's some obvious issues with some sort of AI in such an environment. Do you train the AI to tell the right sorts of lies?

TranquilMarmot

5 months ago

We're working on Agents over at Zapier, https://zapier.com/agents

You can have Agents run behaviors async by attaching triggers to them, for example when you get a specific email or something gets updated in a CRM. You can also give the agent access to basically any third-party action you can think of.

Like others in this thread have pointed out, there's a nice middle-ground here between an LLM-only interface and some nice UI around it, as well as ways to introduce determinism where it makes sense.

The product is still in its early days and we're iterating rapidly, but feel free to check it out and give us some feedback. There's a decent free plan.

randomcatuser

5 months ago

Nice! I'm kinda curious -- what do you see as Zapier's advantage when it comes to building agents? It seems like everyone is doing something similar? (e.g Lindy, Gumloop)

Agents have a pre-iPhone feel to them (when everyone was making phones with keyboards). What do you think the ultimate Agents looks like?

TranquilMarmot

5 months ago

As the other commenter pointed out, Zapier already has 7k+ apps that can be connected to. Zapier has already figured out the "hard parts" connecting to other apps, which are things like authentication, getting APIs to return standardized schemas, and most importantly business relationships with other companies so that if things need changing, we can reach out. We also have tested and proven infrastructure that can handle massive scale.

I definitely agree that agents are very early days - as for the "ultimate agent" it feels like everything is moving towards them being a sort of co-worker, if that makes sense. I think handling human-in-the-loop scenarios nicely is going to be vital to an agent actually being useful. i.e. you clock in in the morning and check out all the stuff your agent did overnight and can approve/change/reject tasks. There's a ton of healthy competition in the space, so in a year or two we'll all have a much better idea of where the tech is going.

OccamsMirror

5 months ago

I don't work for Zapier, but I think it's clear that their advantage is they already have the ability to work with different product's APIs at their fingertips. That's going to make creating a working agent that's actually productive a lot easier than for most.

I wouldn't call it a moat, but it's definitely a giant head start.

aeromusek

5 months ago

For this to become true, agents first have to transcend 'chatbot' as the primary interaction layer.

There's a reason we're still using apps instead of talking to Siri…for a huge number of tasks, visual UIs are so much more efficient than long-form text.

guybedo

5 months ago

I don't think the Agents/LLM become the UI, they are going to be the orchestrators, but a well though UI is always going to be more useful than having to chat/write words so that an agent can help you.

It's gonna be: reusable saas components + ai orchestrator + specialized UI

On a related note, there's probably gonna be an extinction level event in the software industry as there's no software moat anymore.

When every application, every feature, every function can be replicated/reproduced by another company in a matter of minutes / hours using AI tools, you don't have a moat anymore.

alex_young

5 months ago

This reminds me of the blockchain will make everything obsolete sensation of yesteryear.

Why will businesses trust a black box that claims to make good decisions (most of the time) when they have existing human relationships they have vetted, measured, and know the ongoing costs and benefits of?

If the reason is humans are expensive, I have news for you. We've had robotics for around 100 years and the humans are still much cheaper than the robots. Adding a bunch of graphics cards and power plants to the mix doesn't seem to change that equation in a positive direction.

caspper69

5 months ago

Continuing on with my "old man yells at cloud" meme of late, here's my hot take:

So let me get this straight- we are going to train AI models to perform screen recognition of some kind (so it can ascertain layout and detect the "important" ui elements), and additionally ask that AI to OCR all text on the screen so it has some hope of being able to follow some natural language instructions (OCR being a task which, as a HN thread a day or two ago pointed out, AI is exceedingly bad at), and then we're going to be able to tell this non-deterministic prediction engine what we want to do with our software, and it's just going to do it?

Like Homer Simpson's button pressing birdie toy? :smackshead:

Why do I have reservations about letting a non-deterministic AI agent run my software?

Why not expose hooks in some common format for our software to perform common tasks? We could call it an "application programming interface". We might even insist on some kind of common data interchange format. I hear all the cool people are into EBCDIC nowadays.

Then we could build a robust and deterministic tool to automate our workflows. It could even pass structured data between unrelated applications in a secure manner. Then we could be sure that the AI Agent will hit the "save the world" button instead of the "kill all humans" button 100% of the time.

On a serious note, we should study various macro recording implementations, to at least have a baseline of what people have been successfully doing for 40+ odd years to automate their workflows, and then come up with an idea that doesn't involve investing in a new computer, gpu, and slowly boiling the oceans.

This reeks of a solution in search of a problem. And the solution has the added benefit of being inefficient and unreliable. But, people don't get billion dollar valuations for macro recorders.

Is this what they meant by "worse is better"?

Edit: and for the love of FSM, please do not expose any new automation APIs to the network.

rglover

5 months ago

Thank you. My thoughts exactly. Specifically the "you want me to trust mission-critical business logic to a Frankenstein mess of non-deterministic 'agents'?!"

The scariest part is, as this advances, the level of disasters we're likely to see will at best be bankrupt corporations, and at worst, people being hurt/killed (depending on how carelessly these tools are integrated into mission critical systems).

svilen_dobrev

5 months ago

check https://news.ycombinator.com/item?id=42974429 from few days ago.. the OP was re-advertising OAUth, but another idea might be, that new kind of interfaces are needed - application agentic interfaces - standing in middle between APP(Programming) (too detailed) and AHI(Human) screen/forms (too human targeted). IMO.

caspper69

5 months ago

I propose the Open Agent Interface.

We can call it OpenAI.

I'll see myself out.

svieira

5 months ago

We could also call it the Open Agent Permissions Interface or OpenAPI for short.

heroprotagonist

5 months ago

*Open as in, many people can pay to use it. Not ALL, that would be ridiculous. And certainly not forever. Once it starts to work correctly, no more buzz will be needed to drive investment and you'll then be slowly cut off or price starved out of most functionality.

Terr_

5 months ago

> Like Homer Simpson's button pressing birdie toy? :smackshead:

This comparison is especially apt, given that one of the main use-cases for LLMs is the same kind of... well, fraud: To give the illusion that you did the work of understanding or reviewing something, but actually just (smart-)phoning it in.

In one Apple iPhone advertisement, the famous actor is asked by their agent what they think of a script. They didn't read it, so they ask the LLM-assistant to sum it up in couple sentences... and then they tell their agent it sounds good.

caspper69

5 months ago

I think my quip about the toy flew over a lot of heads, so I appreciate that someone got it.

The reality is that most applications and websites don’t expose enough context about the what of what you’re actually doing for AIs to be able to meaningfully infer from natural language the steps required to complete a given task.

We humans are very good at filling in the blanks based on if we’re working in Photoshop or VS Code or Excel. We infer a lot of context from the specific files we’re working on or the particular client or even the files’ organization within the file system, or even what month or day it is.

I am skeptical that models will be able to replicate a complex workflow when there’s very little in the way of labels and UI controls even visible.

I know a weekly spreadsheet from a monthly and quarterly, etc. I know the minutiae about which options to use to generate the specific source reports, etc.

Workflows can be quite complex, no matter your role.

I mean I can just see it now: gift receipts being sent to the recipient before their birthday, internal draft proposals prematurely sent to clients, mixing up clients or commingling their data, overwriting or losing data; this whole thing just screams disaster. And I’m not even thinking about people involved with safety, or finance, or legal/regultory, or medical. Law enforcement?

This kind of thing can be done properly with well defined interfaces, common standards, and reasonable and prudent guardrails.

But it won’t be. It’ll be YOLOed on a paper thin training budget and it’ll be like your own little personal chaos monkey on ketamine.

Terr_

5 months ago

> I am skeptical that models will be able to replicate a complex workflow when there’s very little in the way of labels and UI controls even visible.

Also, at least from the perspective of internal business software, a significant part of it is trying to get people to know what they're doing. There's a domain-model that's being taught at the same time, and it's institutionally-important that they are cognizant and aware of what they're agreeing to. Together this tends to lead to an arrangement of multiple screens, confirmation boxes, etc.

Many individuals instinctively dislike this, and it'll be their one of their first choices for "let my LLM assistant do it."

> I mean I can just see it now

Before these LLMs, I felt like Idiocracy had become politically prescient, but now it feels like I actually see a technology that could enable it.

caspper69

5 months ago

Brawndo is coming- it's got electrolytes and IT'S WHAT PLANTS CRAVE!

Life imitates art indeed.

I am sympathetic to wanting to automate complex workflows. Hell, I'm sympathetic to wanting to automate simple workflows. In fact, I bitch about the stupidity of the things I do at least once a week (no, you see, I take the numbers that show on this monitor, and I type them into a box on that monitor; why no cut & paste? faster to re-type the numbers; sigh).

But people provide context. Sure, an AI might tell you utility costs were up last quarter, but they won't know it was because of a water leak that went unnoticed and tripled the bill. Or it will tell you that wages were up, but not that it was because Bill from Operations had hernia surgery and we had to bring on a temp for 2 months. And it certainly won't tell you that Jim's back on the sauce, so we should probably begin putting out feelers for a new salesman.

So much of what business does is tracking metrics, yes, but the numbers never tell the whole story. There's always a backstory. Things that just can't be captured in raw data and hence can't be summarized by an AI. And AIs can't keep the ship sailing. Every small business has the guy/girl that does all the little things for everyone that absolutely holds the whole damn thing together. I'm not a BigCorp guy, but I imagine most departments are similar.

How about customer feedback? How can a model distill valuable (actionable) meaning from disparate communication mediums other than superficial high-level conclusions?

Expectations are just not realistic right now. There's going to be a lot of disappointment.

llm_trw

5 months ago

>So let me get this straight- we are going to train AI models to perform screen recognition of some kind (so it can ascertain layout and detect the "important" ui elements), and additionally ask that AI to OCR all text on the screen so it has some hope of being able to follow some natural language instructions (OCR being a task which, as a HN thread a day or two ago pointed out, AI is exceedingly bad at), and then we're going to be able to tell this non-deterministic prediction engine what we want to do with our software, and it's just going to do it?

AI is amazing at OCR, we've had tesseract ocr for 40 years and if you read the fine manual it has essentially a 0% error rate per character.

OCR on VLMs is terrible.

For some reason consistent x-heights between 10 to 30 pixels with guaranteed mono-column layout is not something venture capitalists get excited about, and as a result I'm not the founder of a unicorn.

caspper69

5 months ago

Ok, I will need to work on my reading comprehension skills.

That being said, I thought the purpose of OCR was to take text from a non-digital source and make it digital.

Why should we have to OCR something that exists already in a perfectly interchangeable digital format already?

mdaniel

5 months ago

> Why should we have to OCR something that exists already in a perfectly interchangeable digital format already?

I'm with you in spirit, but in this specific context I think it's because the alternative would require the ~~LLM~~ Agent to be an HTML parser, or be bright enough to write themselves a Scrapy crawler. I suspect folks decided it's cheaper (by some metric) to just use the normal browser machinery to render 45MB worth of HTML, JS, CSS, Cloudflare Spooge, etc into a PNG and then rip the actual content out of that

I was also going to offer as a counterexample: PDF

caspper69

5 months ago

Not everyone does their work in a web browser.

And even still, you don’t have to parse raw markup to grab properties from DOM elements. That could be handled by a browser plugin coupled with some some user guided training.

PDF is another beast entirely. I think there’s already a whole thread about that going on now. I’m going to zip my lips. I’m still waiting on Adobe to return my call from two years ago inquiring about the licensing costs of their parsing library for a small shop. Good thing I wasn’t relying on them to get that project done, and thank goodness for oss.

utf_8x

5 months ago

So is "AI Agents" something the community has settled on or is this a Google-ism? I remember people arguing about this some time ago with no definitive answer.

klabb3

5 months ago

I fancy the old fashioned term ”middleware” myself. But given it is, in fact, the current year, I suspect we’re going to have to accept ”agents” for the time being.

deepsquirrelnet

5 months ago

In my experience, autonomous tools are not as successful as ones that are built to postulate about and get confirmation of the user’s intent. I think there’s a lot of promise for agents that are built to be controlled by skilled operators.

Autonomy is just more sexy, but in my opinion, it’s a poor design direction for a lot of applications.

sbmthakur

5 months ago

I wonder how we will train Customer Support to tackle issues faced by LLMs. LLMs can already do basic Customer Support. But stuff like understanding bugs and deciding if they should escalate things to engineers feels like a hard thing for an LLM.

ceejayoz

5 months ago

> deciding if they should escalate things to engineers feels like a hard thing for an LLM

Especially since most attempts will have a "under no circumstances should you voluntarily involve a human" in the prompt.

dkkergoog

5 months ago

[dead]

ashu1461

5 months ago

I think the importance to things like user interface, good design are still going to remain just their applications will change to the AI interaction layer / control layer which are mentioned in the blog.

datadrivenangel

5 months ago

SaaS will become the wordpress plugin equivalent for Agent platforms.

dkkergoog

5 months ago

[dead]

BSOhealth

5 months ago

UX is already working on this. AI as a first-class persona that can be deliberately designed for and accommodated. APIs and protocols are way too strict. Think HTML and black Times New Roman on white backgrounds from the old days. Clear information (text) and activation options (hyperlink) are all it needs.

nickdothutton

5 months ago

Ah yes, we are back to the 90s where we are going to have agents taking care of everything for us. All we are missing is Andersen Consulting to sell this to the CEO.

asdev

5 months ago

who has productionized an agent in a setting where there is a low margin for error? I would love to know

reportgunner

5 months ago

I figure if that guy exists he's swimming in his money vault full of coins instead of reading things online.

nonchalantsui

5 months ago

Great doc. I wonder when we’ll be getting an OS that dedicates itself to Agents.

charliebwrites

5 months ago

I’ll bet you a bag of peanuts that some SaaS company names their next AI Product “AgentOS”

2 bags of peanuts if the actual product isn’t an OS and barely passes as AI

nonchalantsui

5 months ago

There are already so many things called AgentOS, and none of them are an OS! So you would be right on the mone-- peanuts.

turnsout

5 months ago

We need a simple open-source protocol which includes authentication and ability for agents to make payments. Essentially what you want is the ability for an agent to take a core action (as the article mentions, like adding a record to a CRM).

I fundamentally believe that human-oriented web apps are not the answer, and neither is REST. We need something purpose-built.

The challenge is, it has to be SIMPLE enough for people to easily implement in one day. And it needs to be open source to avoid the obvious problems with it being a for-profit enterprise.

Something1234

5 months ago

This is the same dumb problem as always. Are you who you say you are and are you allowed to do such and such action?

There’s existing solutions but everything is its own special snowflake. Oauth is a lie, sso sometimes works. But sso doesn’t provide a differentiation between my employee and their broken script.

turnsout

5 months ago

Public key encryption solves this entirely