akulbe's comments

Which means that the bubble pop will be that much worse too.

Do subscriptions make the ads go away? If not, it's hard to see much value proposition in them.

Framework.


Mac-to-Android texting with the Messages app is broken for me, as of macOS and iOS 26.5

I'm just curious to know if other users have experienced this, before I go to extreme troubleshooting steps.

I can send directly from the iPhone, but forwarding from the Mac is broken for messages destined for Android devices.

I spent a couple hours on the phone with Apple Support and did extensive troubleshooting, and they finally told me I should reinstall macOS. Seems pretty extreme, to me.


If you try to text an Android number from your Mac and the "Send" button is grayed out or the message fails, it usually means a setting called Text Message Forwarding is turned off on your iPhone.


Checked that. It's not off.


Any chance you'd be willing to talk further about your setup? I have 2 x 3090s in a local machine, and I'm still left with questions about how best to use stuff locally.


You can only run heavily quantized models on any RTX 30/40/50-series GPU (with 32GB or less VRAM), and you probably want MoE versions like Qwen 35b for this to run at a speed somewhat comparable to Claude. It's still not there, to be honest, but it's getting there. Personally I mess around with llama.cpp on an M5 Max with 128GB. It's a decent setup for trying various medium-sized things, and it runs LLMs surprisingly well without quantization, at least the MoE models.


Two 3090s is 48GB, so it's possible to run the 6-bit quantization comfortably, which is fine. It doesn't start to get notably dumber until lower than that. It won't be as fast as a hosted model, but dual 3090s will be comfortably fast for interactive use with the MoE version and not terrible to use with the dense model. I run the dense model at 8 bits on my dual Radeon V620 desktop machine, which I think would be slower than two 3090s, or at least not notably faster.
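
To put rough numbers on the VRAM point above, here is a back-of-the-envelope sketch in Python. The 35B parameter count is borrowed from the "Qwen 35b" mention upthread, and the 6GB allowance for KV cache and runtime overhead is purely an assumed placeholder; real usage depends on context length and the runtime, so treat this as an estimate, not a measurement.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params (in billions) * bits per weight / 8 bits per byte
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_gb: float = 6.0) -> bool:
    """True if weights plus an ASSUMED overhead allowance fit in VRAM."""
    return weight_memory_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


if __name__ == "__main__":
    # Hypothetical 35B-parameter model on 48 GB (two 24 GB cards).
    for bits in (16, 8, 6, 4):
        size = weight_memory_gb(35, bits)
        print(f"{bits}-bit: {size:.2f} GB weights, fits in 48 GB: {fits_in_vram(35, bits, 48)}")
```

By this estimate a 35B model at 6 bits needs about 26GB for weights, which leaves headroom on 48GB, while the unquantized 16-bit weights alone (about 70GB) would not fit.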


Have you done comparisons with 4-bit and seen a noticeable difference for coding tasks?


No, I've just seen benchmarks showing most models start degrading around 4-5 bits. That's not to say they become useless, just that down to about 6 bits (with careful hybrid quantizations like Unsloth's, where some of the layers aren't quantized or are quantized at higher bit depths) the quality isn't measurably degraded, but below that there are measurable differences in performance.

People report good results from DeepSeek V4 Flash at 2 bits (the DwarfStar 4 folks are doing it, and I've tried it on my Strix Halo, but it's too slow to be usable, so I haven't bothered to figure out if it's actually smart enough to use for anything).

Anyway, it's obvious models have to degrade in terms of knowledge, at any quantization, even though it may not show up clearly on benchmarks until lower. If you halve the size of the data available, it necessarily loses information about the world.
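
A toy illustration of the information-loss point, for anyone who wants to see it in numbers. This is plain uniform round-to-nearest quantization over made-up Gaussian "weights", which is an assumption for illustration only; it is not what Unsloth or llama.cpp actually do, and weight error is not the same thing as task performance. It only shows that reconstruction error grows as the bit width shrinks.

```python
import random


def quantize(values, bits):
    """Uniformly quantize values to 2**bits levels between min and max."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (2 ** bits - 1)
    if step == 0:
        return list(values)  # all values identical, nothing to quantize
    # Snap each value to the nearest representable level.
    return [lo + round((v - lo) / step) * step for v in values]


def mean_abs_error(values, bits):
    """Average absolute difference between original and quantized values."""
    quantized = quantize(values, bits)
    return sum(abs(a - b) for a, b in zip(values, quantized)) / len(values)


random.seed(0)
# Synthetic stand-in for a weight tensor (assumed, not from a real model).
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

if __name__ == "__main__":
    for bits in (8, 6, 4, 2):
        print(f"{bits} bits: mean abs error = {mean_abs_error(weights, bits):.5f}")
```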


One of the things I'm wondering about is what I'm missing for $LLM to create files on the local FS like Claude and Codex do. What I see instead is stuff just printing to stdout, rather than files on the filesystem.

What am I missing?


You're missing an agent. The model uses tool calls to interact with the filesystem, run commands on the system, optionally search (you need a search MCP server, like Brave or Exa, and an API key), etc.

I usually use the Zed Agent built into the Zed editor for self-hosted models, but you could use Pi, OpenCode, Hermes, Claude Code, etc. There are many, many agents.


The model just predicts text. Tools like Claude Code parse the output and do the actual file creation (or run shell commands that do it). If you have Claude Code installed, look in ~/.claude/projects/... and you can see the transcripts of your actual sessions, or install Mini-SWE-Agent and play with that to get a feel for what's going on.
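
A minimal sketch of that "parse the output and do the file creation" step, in Python. The JSON shape, the `write_file` tool name, and the `run_tool_call` helper are all hypothetical and made up for illustration; real agents such as Claude Code or OpenCode define their own tool schemas and do far more (multiple tools, permissions, feeding results back to the model).

```python
import json
from pathlib import Path


def run_tool_call(model_output: str, workdir: Path) -> str:
    """Parse one JSON tool call emitted by the model and execute it.

    Assumed (hypothetical) format:
    {"tool": "write_file", "path": "...", "content": "..."}
    """
    call = json.loads(model_output)
    if call.get("tool") != "write_file":
        return f"unknown tool: {call.get('tool')}"

    # Resolve the target and refuse anything outside the working directory.
    target = (workdir / call["path"]).resolve()
    if not target.is_relative_to(workdir.resolve()):
        return "refused: path escapes the working directory"

    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(call["content"], encoding="utf-8")
    # This string is what an agent would send back to the model as the tool result.
    return f"wrote {len(call['content'])} characters to {call['path']}"
```

Without something like this in the loop, the same JSON (or code) just lands on stdout, which matches what you're seeing.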


The data I've seen is stuff like the KL divergence comparisons that Unsloth does, which show something, but not clearly whether there's an observable or significant difference in task performance.


How is that machine for local inference? It's a serious consideration for me, but getting to hear more from folks that already have it would be helpful.


It's a great laptop to mess around with LLMs, but it won't replace Claude Opus or even Sonnet.


Follow the money.


Sorry... it's DOA for me. Just installed it, and I can't even open it. Immediately crashes on open. Repeatedly.

https://gist.github.com/akulbe/26b71df4a3fd069dd71824f000c9a...


OP - I don't know if there's anything else I can do to assist, other than the bug report output I put in the gist, but let me know. I'm happy to help.


Thanks for the crash report! It's the first time I've ever received one as a GitHub gist. I expect it to appear in the Xcode Organizer.

Could you please tell me about your desktop environment? Which macOS version are you using?


That's all at the very bottom of the crash report.


Is this a direct shot at things like OpenClaw, or am I reading it wrong?


They even block Claude Code if you've modified it via tweakcc. When they blocked OpenCode, I ported a feature I wanted to Claude Code so I could continue using that feature. After a couple of days, they started blocking it with the same message that OpenCode gets. I'm going to go down to the $20 plan and shift most of my work to OpenAI/ChatGPT because of this. The harness features matter more to me than model differences in the current generation.


OpenCode as well. Folks have been getting banned for abusing the OAuth login method to get around paying for API tokens or whatever. Anthropic seems to prefer people pay them.


it's not that innocent.

a 200 dollar a month customer isn't trying to get around paying for tokens, they're trying to use the tooling they prefer. opencode is better in a lot of ways.

tokens get counted and put against usage limits anyway. unless they're trying to eat analytics that are CC exclusive, they should allow paying customers to consume to the usage limits in however way they want to use the models.


Anthropic is offering a steep discount on their plans. I highly doubt they want you using it in a harness where you can trivially switch away when someone else releases a better model.


Funny, because you CAN switch Claude Code to other providers and models easily.

Anthropic is just a deeply "mis-dev-anthropic" company.


> opencode is better in a lot of ways.

I use opencode every day; can you explain how Claude Code differs and what it lacks?


> they should allow paying customers to consume to the usage limits in however way they want to use the models.

I think I agree, but it's their business to run however they like. They have competition if we don't like it.


A $200/month Max subscriber using OpenCode and not wanting to use API keys with pay-per-token pricing is very clearly trying to get around paying for tokens.


Are there any limits on that user's $200/month? Why shouldn't they be able to use up those limits from other tools?

If OpenClaw chews my $200/month up in 15 days... I don't get more requests for free.


There is no monthly limit; there are (currently) weekly and 5-hour limits. If they allow anyone to use any tool with their subscription service, you could have a system (like OpenClaw) which involves 0 human interaction and is constantly consuming 100% of your token limit, then waiting until limits reset to do it all over again. It seems fairly clear that Anthropic is probably losing money on such usage patterns.

Once again: you can use API keys and pricing to get UNLIMITED usage whenever you want. If you are choosing to pay for a subscription instead, it is because Anthropic is offering those subscriptions at a much better value-per-token. They are not offering such a subscription out of the goodness of their heart.


There are 4 weeks in a month.

4 periods of weekly limits is a monthly limit.


That's... not how that works. Might as well say Anthropic has a 63 day limit (cuz that's 9 weeks).

The point of the first half of my comment is that you cannot chew through your tokens in 15 days, because although the billing cycle is monthly, the limits are not.


4 weeks * 12 months = 48 weeks in a year * 7 days in a week = 336 days per year - close enough :)


For sure, yes. They already added attempts to block opencode, etc.


I wonder if it has to do with Grok somehow. They had a suspiciously high reputation until they just binarily didn't, after Anthropic said they did something.


No, the reason it's happening is because they must be vibe coding! :P


[flagged]


No, because you missed the joke.


I still have my Windows 11 machine, but I haven't booted into it in a couple months now.

The "Windows is going to be an agentic OS" announcement was the last straw.

Linux and Mac it is.


I would enjoy hooking up Claude to KDE with voice control and audio feedback, but I'm 100% on board with it being entirely the user's decision to go for that folly.


I mean, that would be a fun experiment on a VM, but I would not trust it directly on my workstation, not least because of privacy. It might do a mass-mass-rm-rf

https://x.com/tskulbru/status/2015148189897101622


alias mass=sudo

