Hacker News | dokka's comments

I also did this a few months ago using a custom MCP server I built for the Alpaca API, the yfinance MCP server, a Reddit MCP server, and the "sequential thinking" MCP server. I had Claude write a prompt that combined them all: start by checking r/pennystocks for any news, look up the individual ticker symbols with Alpaca and yfinance, check the account balance, and make a trade only if a very particular set of criteria was met. I used Claude Code instead of Desktop so that I could run it as a cron job, and it all works! I mostly built it to see if I could, not for any financial gain. I had it paper trading for a few months and it made a 2% profit on 100k. I really think someone who knows more about trading could do quite well with a setup like this, but it's not for me.
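The "trade only if a very particular set of criteria was met" gate might look something like the pure-Python sketch below. The field names and thresholds here are entirely made up for illustration; the actual pipeline ran Claude Code against Alpaca/yfinance/Reddit MCP servers, not this function.

```python
# Hypothetical sketch of an all-criteria-must-pass trade gate.
# Thresholds and dict keys are invented; they stand in for whatever
# the prompt actually checked via the MCP servers.

def should_trade(ticker_info: dict, account: dict) -> bool:
    """Return True only when every criterion passes."""
    return (
        ticker_info.get("mentioned_on_reddit", False)   # r/pennystocks news hit
        and ticker_info.get("avg_volume", 0) > 500_000  # enough liquidity
        and ticker_info.get("price", 0.0) < 5.00        # penny-stock price range
        and account.get("cash", 0.0) > 1_000            # account balance check
    )

info = {"mentioned_on_reddit": True, "avg_volume": 2_000_000, "price": 1.25}
acct = {"cash": 100_000.0}
print(should_trade(info, acct))  # True: every criterion passes
print(should_trade({}, acct))    # False: missing data fails closed
```

Failing closed on missing data (via `dict.get` defaults) is the important design choice for an unattended cron job: no trade happens unless every lookup succeeded.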


Did you benchmark it against holding the same value in S&P 500 or similar ETF over the same period?


they said their paper account did 2% over a few months, which is not beating the S&P 500, and is probably why they said "someone could make money off this, but not me"


I'm curious because 2% over the last few months while the S&P 500 is tanking might be interesting, but doing worse than the S&P 500 over the same period is less so.

Hell, that's lower than inflation.

It may be no better than flipping coins.


This year? 2% most certainly is outperforming the S&P 500.


eh, depends on the two months in question.


eh, depends on what they mean by 'specific set of criteria'. The vol might be lower than the benchmark's (I doubt it, since they're penny stocks)

Pretty sure OP's Sharpe ratio is higher than the benchmark's for the last two months.


Something like Russell 2k would be a better comparison if GP is trading pennystocks.


I mean, sounds cool, but without backtesting that 2% could be just noise.


It's good at taking the code of large, complex libraries and finding the best way to glue them together. Also, I gave it the code of several open source MudBlazor components and got great examples of how they should be used together to build what I want. Sure, Grok 3 and Sonnet 3.7 can do that, but the GPT-4.5 answer was slightly better.


> Sure, Grok 3 and Sonnet 3.7 can do that, but the GPT 4.5 answer was slightly better.

Sonnet 3.7: $3/million input tokens, $15/million output tokens [0]

GPT-4.5: $75/million input tokens, $150/million output tokens [1]

if it's 10-25x the cost, I would expect more than "slightly better"
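To put the listed prices in concrete terms, here's the arithmetic for one hypothetical job (the 100k-input / 10k-output token mix is a made-up example workload, not from either vendor):

```python
# Cost comparison using the per-million-token prices quoted above.
def cost(input_toks, output_toks, in_price, out_price):
    return input_toks / 1e6 * in_price + output_toks / 1e6 * out_price

sonnet = cost(100_000, 10_000, 3, 15)    # $0.45
gpt45 = cost(100_000, 10_000, 75, 150)   # $9.00
print(sonnet, gpt45, gpt45 / sonnet)     # 20x for this input/output mix
```

The exact multiple depends on the input/output ratio of your workload, which is why the range is "10-25x" rather than one number.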

0: https://www.anthropic.com/news/claude-3-7-sonnet

1: https://openai.com/api/pricing/


It really depends on how much it actually costs for a task though. 10x more of almost nothing isn't important.


there's a $1 widget and a slightly better $10 widget.

if you're only buying 1 widget, you're correct that the price difference doesn't matter a whole lot.

but if you're buying 10 widgets, the total cost of $10 vs $100 starts to matter a bit more.

say you run a factory that makes and sells whatchamacallits, and each whatchamacallit contains 3 widgets as sub-components. that line item on your bill of materials can either be $3, or $30. that's not an insignificant difference at all.

for one-off personal usage, as a toy or a hobby - "slightly better for 10x the price" isn't a huge deal, as you say. for business usage it's a complete non-starter.

if there was a cloud provider that was slightly better than AWS, for 10x the price, would you use it? would you build a company on top of it?


It's unfortunate it is named 4.5: it is next-generation scale, and it's a 1.0 of that next-generation scale.

Sonnet is on its 3rd iteration, i.e. has considerably more post-training, most notably, reasoning via reinforcement learning.


It's not really the beginning (1.0) of anything - more like the end given that OpenAI have said this'll be the last of their non-reasoning models - basically the last scale-up pre-training experiment.

As far as the version number, OpenAI's Chief Research Officer Mark Chen said, on Alex Kantrowitz's YouTube channel, that it "felt" like a 4.5 in terms of the level of improvement over 4.0.


That's a lot of other stuff, and you express disagreement.

I'm sure we both agree it's the first model at this scale, hence the price.

> It's not really the beginning (1.0) of anything

It is an LLM w/o reasoning training.

Thus, the public decision to make 5.0 = 4.5 + reasoning.

> "more like the end...the last scale-up pre-training experiment."

It won't be the last scaled-up pre-training model.

I assume you mean what I expect, and what you go on to articulate: it'll be the last scaled-up-pre-training-without-reasoning-training model released publicly.

As we observe, the value to benchmarks of, in your parlance, scaled-down pretraining, with reasoning training, is roughly the same as scaled-up pre-training without reasoning training.


> Yes it is. It's the first model at this scale.

Is it? Bigger than Grok 3? How do you know - just because it's expensive?


At some point, I have to say to myself: "I do know things."

I'm not even sure what the alternative theory would be: no one stepped up to dispute OpenAI's claim that it is, and X.ai is always eager to slap OpenAI around.

Let's say Grok is also a pretraining scale experiment. And they're scared to announce they're mogging OpenAI on inference cost because (some assertion X, which we give ourselves the charity of not having to state to make an argument).

What's your theory?

Steelmanning my guess: The price is high because OpenAI thinks they can drive people to Model A, 50x the cost of Model B.

Hmm... while publicly proclaiming it's not worth it, even providing benchmarks that Model A gets the same scores 50x cheaper?

That doesn't seem reasonable.


OpenAI have apparently said that GPT 4.5 has a knowledge cutoff date of October 2023, and their System Card for it says "GPT 4.5 is NOT a frontier model" (my emphasis).

It seems this may be an older model that they chose not to release at the time, and are only doing so now due to feeling pressure to release something after recent releases by DeepSeek, Grok, Google and Anthropic. Perhaps they did some post-training to "polish the turd" and give it the better personality that seems to be one of its few improvements.

Hard to say why it's so expensive - because it's big and expensive to serve, or for some marketing/PR reason. It seems that many sources are confirming that the benefits of scaling up pre-training (more data, bigger model) are falling off, so maybe this is what you get when you scale up GPT 4.0 by a factor of 10x - bigger, more expensive, and not significantly better. Cost to serve could also be high because, not intending to release it, they have never put the effort in to optimize it.


See, you get it: if we want to know nothing, we can know nothing.

For all we know, Beelzebub Herself is holding Sam Altman's consciousness captive at the behest of Nadella. The deal is Sam has to go "innie" and jack up OpenAI costs 100x over the next year so it can go under and Microsoft can get it all for free.

Have you seen anything to disprove that? Or even casting doubt on it?


Version numbers for LLMs don't mean anything consistent. They don't even publicly announce at this point which models are built from new base models and which aren't. I'm pretty sure Claude 3.5 was a new set of base models since Claude 3.

What do you mean by "it's a 1.0" and "3rd iteration"? I'm having trouble parsing those in context.


If Claude 3.5 was a base model*, 3.7 is a third iteration** of that model.

GPT-4.5 is a 1.0, or, the first iteration of that model.

* My thought process when writing: "When evaluating this, I should assume the least charitable position for GPT-4.5 having headroom. I should assume Claude 3.5 was a completely new model scale, and it was the same scale as GPT-4.5." (this is rather unlikely, can explain why I think that if you're interested)

** 3.5 is an iteration, 3.6 is an iteration, 3.7 is an iteration.


How do you feed them large code bases usually?


Use an AI-supporting editor like Cursor, or GitHub CoPilot, or perhaps Sonnet 3.7's GitHub integration.


Ah Operator. This synth is so deep. Not only is it a fantastic FM synth, but it does subtractive synthesis well too. Also, it really is impressive how the UI manages to fit all those parameters. I mostly use it for cool synth leads. Here's one of my favorite videos on Operator https://youtu.be/rfeY0_k1ctk?si=s68Lr033cHf34a4M by Robert Henke himself.


It's my go-to VA synth too. I'll reach for it first, before Analog or other VSTs.


Yeah, I can confirm that writing Windows GUI apps is not at all painful for me. I still use Windows Forms on .NET 4.8, my executables are < 1 MB, Visual Studio's form designer is very easy to use, and you can subclass all the .NET UI controls and customize them however you want. There's always been accessibility support, and even support for high DPI.


>I still use Windows Forms in .NET 4.8 and my executables are < 1mb

Do you need to ship any supporting files separately, along with the app?

And is .NET 4.8 or higher already on Windows PCs?


.NET 4.8 is the last .NET to be bundled with Windows. It's a legacy stack, but it exists on every Windows >= 10 so it is a legacy stack that makes deployables easy (just assume it is installed). (.NET 4.8 is the new VB6.)

With .NET 9 right around the corner, the legacy stack only falls further behind.

.NET 5 and later will never be installed out of the box on Windows PCs. The trade-offs for that concession, however, are: cross-platform support, better container support, and easier side-by-side ("portable") installs. .NET 7 and later can do an admirable job of AOT-compiling single-file applications. For a GUI app you probably won't easily get that single file under 40 MB yet today, but it will be truly self-contained and generally won't need anything specific installed at the OS level. Each recent version of .NET has been improving its single-file publishing, and there may be more advances to come.
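The self-contained single-file publish described above is driven by a handful of MSBuild properties. The property names below are real switches; the particular combination and runtime identifier are just one plausible setup, not a recommendation:

```xml
<!-- Illustrative .csproj fragment for a self-contained, single-file publish.
     Swap win-x64 for your target runtime identifier. -->
<PropertyGroup>
  <PublishSingleFile>true</PublishSingleFile>
  <SelfContained>true</SelfContained>
  <RuntimeIdentifier>win-x64</RuntimeIdentifier>
  <PublishTrimmed>true</PublishTrimmed>
</PropertyGroup>
```

Then `dotnet publish -c Release` emits one executable that carries the runtime with it, which is the "assume nothing is installed" deployment story, at the cost of the file size discussed above.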


A nice thing about .NET Framework 4.8 is that they finally finished it! No more update treadmill, no more dicking around with which versions are installed or how to configure your application to use the different versions. Just target it and forget about it.


.NET 4.8 is default in Win10/11 now.


Thanks, guys.


I've tried all the popular yerba mate brands, smoked, flavored, Uruguayan, Argentinian, but I still prefer organic unsmoked yerba mate with stems. I brew 1/2 cup of mate with 2 cups of 150°F water and a splash of lemon juice for 30 minutes, then pour the whole thing through a Chemex coffee filter. It takes a few minutes to filter, but the result is a delicious, very caffeinated, slightly lemony tea.


This is an excellent suggestion, and it's how I've been using Discord since 2019. I will probably never install Discord on my phone or desktop again.


good job! this is every bit as good as GPT-3 for programming tasks! I was able to get it to write a C# extension method without any issues.


I own this tv, and I had it connected to the internet for a short while. Not only does it have ads everywhere, but the menu is SLOW. Taking it off the internet not only got rid of the ads, but also made menu navigation faster.


There have been a few times when I presented a VB6 prototype along with a price and had my prototype purchased on the spot. Sometimes when a client says "We need this software now", they really mean it.


I don't know if this counts, but I built so much software in Access '97. Mostly for small businesses and individuals. I could build a whole inventory management system in a weekend (a simple one, anyways). It was phenomenal. Once I learned Java and SQL (how to correctly use SQL, lol) I quit using it as much. But sometimes I still prototype software in old versions of Access just to model everything out.

