Mac-to-Android texting with the Messages app is broken for me, as of macOS and iOS 26.5
I'm just curious to know if other users have experienced this, before I go to extreme troubleshooting steps.
I can send directly from the iPhone, but forwarding from the Mac is broken for messages destined for Android devices.
I spent a couple hours on the phone with Apple Support and did extensive troubleshooting, and they finally told me I should reinstall macOS. Seems pretty extreme, to me.
If you try to text an Android number from your Mac and the "Send" button is grayed out or the message fails, it usually means a setting called Text Message Forwarding is turned off on your iPhone.
Any chance you'd be willing to talk further about your setup? I have 2 x 3090s in a local machine, and I'm still left with questions about how best to use stuff locally.
You can only run heavily quantized models on any of the RTX 30/40/50-series GPUs (with 32GB or less VRAM), and you probably want MoE versions like Qwen 35b for this to run at a speed somewhat comparable to Claude. It’s still not there, to be honest, but it’s getting there. Personally I mess around with llama.cpp on an M5 Max with 128GB. It’s a decent setup to try various medium-sized things, and it runs LLMs surprisingly well without quantization, at least the MoE models.
Two 3090s is 48GB, so it's possible to run the 6-bit quantization comfortably, which is fine. It doesn't start to get notably dumber until lower than that. It won't be as fast as a hosted model, but dual 3090s will be comfortably fast for interactive use with the MoE version and not terrible to use with the dense model. I run the dense model at 8 bits on my dual Radeon V620 desktop machine, which I think would be slower than two 3090s, or at least not notably faster.
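For anyone who wants to check the arithmetic behind that, here is a rough back-of-the-envelope sketch. It assumes a 35B-parameter model (taken from the "Qwen 35b" mention upthread) and counts weights only: it ignores KV cache, activations, and the per-block overhead that real quantization formats add, so treat the numbers as a lower bound. The function name is made up for illustration.

```python
def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params * bits / 8 gives bytes; the 1e9 in "billions" cancels the 1e9 in "GB".
    return params_billions * bits_per_weight / 8


# Two RTX 3090s at 24 GB each.
TOTAL_VRAM_GB = 2 * 24

# Assumption: a 35B-parameter model, weights only.
for bits in (16, 8, 6, 4):
    size = weight_size_gb(35, bits)
    print(f"{bits:>2}-bit: ~{size:.1f} GB of weights, "
          f"fits in {TOTAL_VRAM_GB} GB: {size < TOTAL_VRAM_GB}")
```

Under those assumptions, 6-bit weights come to roughly 26 GB, which leaves a fair amount of the 48 GB for context.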
No, I've just seen benchmarks showing most models start degrading around 4-5 bits. That's not to say they become useless, just that down to about 6 bits (with careful hybrid quantizations like Unsloth's, where some of the layers aren't quantized or are quantized at higher bit depths) the quality isn't measurably degraded, but below that there are measurable differences in performance.
People report good results from DeepSeek V4 Flash at 2 bits (the DwarfStar 4 folks are doing it, and I've tried it on my Strix Halo, but it's too slow to be usable, so I haven't bothered to figure out if it's actually smart enough to use for anything).
Anyway, it's obvious models have to degrade in terms of knowledge, at any quantization, even though it may not show up clearly on benchmarks until lower. If you halve the size of the data available, it necessarily loses information about the world.
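A toy illustration of that point, in case it helps: round a list of numbers to a uniform grid with fewer and fewer bits and measure how far they land from the originals. This is only a sketch. Real schemes (llama.cpp's quants, Unsloth's hybrid ones) work per block with scales and mixed precision, and the "weights" here are just made-up values, so the absolute numbers mean nothing. Only the trend is the point.

```python
import math


def quantize_roundtrip(values, bits):
    """Snap each value to the nearest of 2**bits evenly spaced levels."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((v - lo) / step) * step for v in values]


def mean_abs_error(values, bits):
    """Average distance between the original and round-tripped values."""
    restored = quantize_roundtrip(values, bits)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


# Made-up stand-in for a weight tensor: 1000 values in [-1, 1].
weights = [math.sin(i * 0.37) for i in range(1000)]

errors = {bits: mean_abs_error(weights, bits) for bits in (8, 6, 4, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit: mean abs error {err:.5f}")
```

The error is never zero, even at 8 bits; it just gets large enough to matter somewhere further down.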
One of the things I'm wondering about is what I'm missing for $LLM to create files on the local FS like Claude and Codex do. What I see instead is stuff just printing to stdout, rather than files on the filesystem.
You're missing an agent. The model uses tool calls to interact with the filesystem, run commands on the system, optionally search (you need a search MCP server, like Brave or Exa, and an API key), etc.
I usually use the Zed Agent built into the Zed editor for self-hosted models, but you could use Pi, OpenCode, Hermes, Claude Code, etc. There are many, many agents.
The model just predicts text; Claude Code etc. parse the output and do the actual file creation (or run shell commands that do it). If you have Claude Code installed, look in ~/.claude/projects/... and you can see the transcripts of your actual sessions, or install Mini-SWE-Agent and play with that to get a feel for what's going on.
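To make that split concrete, here is a minimal sketch of the harness side. Everything in it is hypothetical: `fake_model` stands in for a real LLM call, and the JSON shape with `tool` and `args` is invented for illustration. It is not the format Claude Code, Codex, or any other agent actually uses. The point is only that the model emits text and a separate program decides to touch the filesystem.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: it only ever returns text."""
    return json.dumps({
        "tool": "write_file",
        "args": {"path": "hello.txt", "content": "hello from the model\n"},
    })


def run_tool_call(model_output: str, workdir: Path) -> Optional[Path]:
    """Parse the model's text and, if it is a write_file call, do the write."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        # Plain prose: nothing to execute, so it just goes to stdout.
        print(model_output)
        return None
    if not isinstance(call, dict) or call.get("tool") != "write_file":
        return None
    root = workdir.resolve()
    target = (root / call["args"]["path"]).resolve()
    # Keep the model inside its sandbox directory.
    if not target.is_relative_to(root):
        raise ValueError("refusing to write outside the working directory")
    target.write_text(call["args"]["content"])
    return target


workdir = Path(tempfile.mkdtemp())
written = run_tool_call(fake_model("create hello.txt"), workdir)
print(written.read_text(), end="")
```

Without the `run_tool_call` half, you get exactly what was described above: JSON or prose printed to stdout and no file on disk.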
The data I've seen is stuff like the KL divergence comparisons that Unsloth does, which show something, but not clearly whether there's an observable or significant difference in task performance.
How is that machine for local inference? It's a serious consideration for me, but getting to hear more from folks that already have it would be helpful.
They even block Claude Code if you've modified it via tweakcc. When they blocked OpenCode, I ported a feature I wanted to Claude Code so I could continue using that feature. After a couple of days, they started blocking it with the same message that OpenCode gets. I'm going to go down to the $20 plan and shift most of my work to OpenAI/ChatGPT because of this. The harness features matter more to me than model differences in the current generation.
OpenCode as well. Folks have been getting banned for abusing the OAuth login method to get around paying for API tokens or whatever. Anthropic seems to prefer people pay them.
A 200-dollar-a-month customer isn't trying to get around paying for tokens; they're trying to use the tooling they prefer. OpenCode is better in a lot of ways.
Tokens get counted and put against usage limits anyway. Unless they're trying to collect analytics that are CC-exclusive, they should allow paying customers to consume up to the usage limits in whatever way they want to use the models.
Anthropic is offering a steep discount in their plans. I highly doubt they want you using it in a harness where you can trivially switch away when someone else releases a better model.
A $200/mo Max subscriber using OpenCode and not wanting to use API keys with pay-per-token pricing is very clearly trying to get around paying for tokens.
There is no monthly limit, it (currently) is a weekly and 5-hourly limit. If they allow anyone to use any tool with their subscription service, you could have a system (like OpenClaw) which involves 0 human interaction and is constantly consuming 100% of your token limit, then waiting until limits reset to do it all over again. It seems fairly clear that Anthropic is probably losing money on such usage patterns.
Once again: you can use API keys and pricing to get UNLIMITED usage whenever you want. If you are choosing to pay for a subscription instead, it is because Anthropic is offering those subscriptions at a much better value-per-token. They are not offering such a subscription out of the goodness of their heart.
That's... not how that works. Might as well say Anthropic has a 63 day limit (cuz that's 9 weeks).
The point of the first half of my comment is that you cannot chew through your tokens in 15 days, because although the billing cycle is monthly, the limits are not.
I wonder if it has to do with Grok somehow. They had a suspiciously high reputation until they just binarily didn't, after Anthropic said they did something.
I would enjoy hooking up Claude to KDE with voice control and audio feedback, but I am 100% on board with it being 100% the user's decision to go for that folly.
I mean, that would be a fun experiment on a VM, but I would not trust it directly on my workstation, not least because of privacy. It might do a mass rm -rf.