noman-land
5 hours ago
For anyone who hasn't tried local models because they think it's too complicated or their computer can't handle it, download a single llamafile and try it out in just moments.
https://future.mozilla.org/builders/news_insights/introducin...
https://github.com/Mozilla-Ocho/llamafile
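Once a llamafile is running it also exposes an OpenAI-compatible API on localhost (port 8080 by default), so you can script against it too. A rough sketch in Python; the model name is just a placeholder since the server only has the one model baked in:

  # Minimal sketch: chat with a running llamafile over its OpenAI-compatible
  # endpoint (served on http://localhost:8080/v1 by default).
  import requests

  resp = requests.post(
      "http://localhost:8080/v1/chat/completions",
      json={
          "model": "local",  # placeholder; the server uses its baked-in model
          "messages": [{"role": "user", "content": "Hey, how are you?"}],
      },
      timeout=120,
  )
  print(resp.json()["choices"][0]["message"]["content"])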
They even have whisperfiles now, which is the same thing but for whisper.cpp, aka real-time voice transcription.
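The whisperfile works much the same way, wrapping whisper.cpp's server. A hedged sketch of posting audio to it; the port, route, and field names below are assumptions taken from whisper.cpp's server example, so check the whisperfile README for the exact interface:

  # Hedged sketch: transcribe a local audio file via a running whisperfile.
  # The /inference route, port, and "file" field follow whisper.cpp's server
  # example; treat them as assumptions and verify against the README.
  import requests

  with open("podcast.wav", "rb") as audio:  # hypothetical local audio file
      resp = requests.post(
          "http://localhost:8080/inference",
          files={"file": audio},
          data={"response_format": "json"},
      )
  print(resp.json().get("text"))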
You can also take this a step further and use this exact setup for a local-only co-pilot style code autocomplete and chat using Twinny. I use this every day. It's free, private, and offline.
https://github.com/twinnydotdev/twinny
Local LLMs are the only future worth living in.
vunderba
3 hours ago
If you're gonna go with a VS Code extension and you're aiming for privacy, then I would at least recommend using the open source fork VS Codium.
unethical_ban
3 hours ago
It is true that VS Code has some non-optional telemetry, and if VS Codium works for people, that is great. However, VS Code's telemetry consists of non-personal metrics, and some of the most popular extensions are only available with VS Code, not with Codium.
wkat4242
3 hours ago
> and some of the most popular extensions are only available with VSCode, not with Codium
Which is an artificial restriction from MS that's really easily bypassed.
Personally I don't care whether the telemetry is identifiable. I just don't want it.
noman-land
3 hours ago
How is it bypassed?
wkat4242
2 hours ago
There's a whitelist identifier that you can add bundle IDs to in order to get access to the more sensitive APIs. Then you can download the extension file and install it manually. I don't have the exact process right now, but just Google it :)
qwezxcrty
an hour ago
From the documentation (https://code.visualstudio.com/docs/getstarted/telemetry) it seems there is a supported way to completely turn off telemetry. Is there something else in VSCode that doesn't respect this setting?
ComputerGuru
7 minutes ago
Do you know if whisperfile is akin to whisper or the much better whisperx? Does it do diarization?
kaoD
3 hours ago
Well this was my experience...
User: Hey, how are you?
Llama: [object Object]
It's funny but I don't think I did anything wrong?
AlienRobot
2 hours ago
2000: Javascript is webpages.
2010: Javascript is webservers.
2020: Javascript is desktop applications.
2024: Javascript is AI.
evbogue
2 hours ago
From this data we must conclude that within our lifetimes all matter in the universe will eventually be reprogrammed in JavaScript.
mnky9800n
2 hours ago
I'm not sure I want to live in that reality.
mortenjorck
an hour ago
If the simulation hypothesis is real, perhaps it would follow that all the dark matter and dark energy in the universe is really just extra cycles being burned on layers of interpreters and JIT compilation of a loosely-typed scripting language.
AlienRobot
an hour ago
It's fine, it will be Typescript.
ryukoposting
an hour ago
Not only are they the only future worth living in, but incentives are also aligned with client-side AI. For governments and government contractors, plumbing confidential information through a network isn't an option, let alone spewing it across the internet. It's a non-starter, regardless of the productivity bumps stuff like Copilot can provide. The only solution is to put AI compute on a cleared individual's work computer.
wkat4242
3 hours ago
Yeah I set up a local server with a strong GPU but even without that it's ok, just a lot slower.
The biggest benefits for me are the uncensored models. I'm pretty kinky, so the regular models tend to shut me out way too much; they all enforce this prudish Victorian mentality that seems to be prevalent in the US but not where I live. Censored models are just unusable to me, which includes all the hosted models. It's just so annoying. And of course there's the privacy.
It should really be possible for the user to decide what kind of restrictions they want, not the vendor. I understand they don't want to offer violent stuff but 18+ topics should be squarely up to me.
Lately I've been using grimjim's uncensored llama3.1 which works pretty well.
wkat4242
an hour ago
@the_gorilla: I don't consider BDSM to be 'degenerate' or violent; it's all incredibly consensual and careful.
It's just that the LLMs trigger immediately on minor words and shut down completely.
threecheese
an hour ago
Any tips you can give for like-minded folks? Besides grimjim (checking it out).
the_gorilla
2 hours ago
If you get to act out degenerate fantasies with a chatbot, I also want to be able to get "violent stuff" (which is again just words).
zelphirkalt
3 hours ago
Many setups rely on Nvidia GPUs, Intel hardware, Windows, or other things I would rather not use, or are not very clear about how to set things up.
What are some recommendations for running models locally, on decent CPUs and getting good valuable output from them? Is that llama stuff portable across CPUs and hardware vendors? And what do people use it for?
noman-land
3 hours ago
llamafile will run on all architectures because it is compiled with Cosmopolitan.
https://github.com/jart/cosmopolitan
"Cosmopolitan Libc makes C a build-once run-anywhere language, like Java, except it doesn't need an interpreter or virtual machine. Instead, it reconfigures stock GCC and Clang to output a POSIX-approved polyglot format that runs natively on Linux + Mac + Windows + FreeBSD + OpenBSD + NetBSD + BIOS with the best possible performance and the tiniest footprint imaginable."
I use it just fine on a Mac M1. The only bottleneck is how much RAM you have.
I use whisper for podcast transcription. I use llama for code completion, general Q&A, and code assistance. You can use the llava models to ingest images and describe them.
threecheese
3 hours ago
Have you tried a Llamafile? Not sure what platform you are using. From their readme:
> … by combining llama.cpp with Cosmopolitan Libc into one framework that collapses all the complexity of LLMs down to a single-file executable (called a "llamafile") that runs locally on most computers, with no installation.
Low cost to experiment IMO. I am personally using macOS with an M1 chip and 64 GB of memory and it works perfectly, but the idea behind this project is to democratize access to generative AI and so it is at least possible that you will be able to use it.
narrator
3 hours ago
With 64GB can you run the 70B size llama models well?
threecheese
an hour ago
I should have qualified the meaning of “works perfectly” :) No 70b for me, but I am able to experiment with many quantized models (and I am using a Llama successfully, latency isn’t terrible)
credit_guy
2 hours ago
No, you can't. I have 128 GB and a 70B llamafile is unusable.
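Rough numbers for why, if it helps (the bits-per-weight figures are ballpark values for common llama.cpp quants, not exact):

  # Back-of-the-envelope RAM needed just for the weights of a 70B model at a
  # few llama.cpp quantization levels (approximate bits per weight).
  def weights_gb(params_b, bits_per_weight):
      return params_b * 1e9 * bits_per_weight / 8 / 1e9

  for quant, bpw in [("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q2_K", 2.6)]:
      print(f"70B @ {quant}: ~{weights_gb(70, bpw):.0f} GB")

  # Roughly 74, 42, and 23 GB respectively -- a Q4 70B technically fits in
  # 64 GB of RAM, but CPU-only inference is memory-bandwidth bound, which is
  # why it still feels unusable even when it loads.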
distances
2 hours ago
I'm using Ollama with an AMD GPU (7800, 16GB) on Linux. Works out of the box. Another question is then if I get much value out of these local models.
wkat4242
3 hours ago
Not really. I run ollama on an AMD Radeon Pro and it works great.
For tooling to train models it's a bit more difficult but inference works great on AMD.
My CPU is an AMD Ryzen and the OS Linux. No problem.
I use OpenWebUI as frontend and it's great. I use it for everything that people use GPT for.
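If you'd rather script against it than go through OpenWebUI, Ollama also answers on a plain REST API. A minimal sketch; the model name assumes you've already pulled llama3.1:

  # Minimal sketch: chat with a local Ollama server over its REST API
  # (default port 11434). Assumes the model was pulled with "ollama pull".
  import requests

  resp = requests.post(
      "http://localhost:11434/api/chat",
      json={
          "model": "llama3.1",
          "messages": [{"role": "user", "content": "Hey, how are you?"}],
          "stream": False,
      },
  )
  print(resp.json()["message"]["content"])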
privacyis1mp
3 hours ago
I built the Fluid app with exactly that in mind. You can run local AI on a Mac without really knowing what an LLM or Ollama is. Plug & play.
Sorry for the blatant ad, though I do hope it's useful for some ppl reading this thread: https://getfluid.app
twh270
3 hours ago
I'm interested, but I can't find any documentation for it. Can I give it local content (documents, spreadsheets, code, etc.) and ask questions?
privacyis1mp
2 hours ago
> Can I give it local content (documents, spreadsheets, code, etc.)
It's coming roughly in December (maybe sooner).
The roadmap is as follows:
- October - private remote AI (when you need smarter AI than your machine can handle, but don't want your data to be logged or stored anywhere)
- November - web search capabilities (so the AI will be capable of doing web search out of the box)
- December - PDF, docs, and code embedding
- 2025 - tighter macOS integration with context awareness
twh270
an hour ago
Oh awesome, thank you! I will check back in December.
heyoni
4 hours ago
Isn’t there also some Firefox AI integration that’s being tested by one dev out there? I forgot the name and wonder if it got any traction.