"Perform OCR on this image. Return only the text found in the image as a single continuous string without any newlines, additional text, or commentary. Separate words with single spaces. For any truncated, partially visible, or occluded text, include only the visible portions without attempting to complete or guess the full text. If no text is present, return empty double quotes."
TL;DR: For original object truth rather than image truth, this paper shows VLMs are superior, even though the prompt shows the authors are "holding it wrong".
Yet another paper where the authors don't address what tokens are. It's like publishing "Rolling pin fails at math" or "Calculator fails to turn dough ball into round pizza".
While I can understand where they're coming from in wanting to avoid hallucination when doing a letter-for-letter transcription from an image, most of the time you reach for OCR you want the original copy, despite damage to its representation (paper tears, coffee stains, hands in front of it). Turns out token conjunction probability conjectures come in handy here!
Whether the image of an object, or the object, is "Ground Truth" is an exercise left to the user's goal. Almost all use cases would want what was originally written on the object, not its present occluded representation.
[Background: the question author (Michael) answered their own question]
Why is this not the accepted answer - waiting for a better one? – piiperi
@piiperi To me, it feels a bit arrogant to accept a self-answered question before a couple of days have passed. Granted, on the other hand, in the case of this question, a different answer is quite unlikely. – Michael Karcher
A different answer is quite unlikely, but improvements are always possible :). Case in point, another gem from the comments:
I see your NOP (90h), and raise you an explicit DS prefix (3Eh). It's all about the style points! :-) Any instruction with a memory operand can have a segment prefix. In this case, the DS prefix is implicit/implied, but it can be explicitly specified without changing the meaning of the instruction. Both ways work to pad the extra leftover byte of space, but the explicit DS prefix does not change the instruction's execution speed, whereas the NOP actually takes 1 cycle of time to execute (plus possible decoding). – Cody Gray
This comparison looks like bad content marketing at best. "The MSP430 family certainly includes peripheral options that current Green Array chips do not, such as flash memories and timers." => TI's 430 range peripherals are imo one of the primary use-cases.
MSP430's aren't trying to be energy efficient per operation - they are really bad at processing. What they are good at is running complex circuits/peripherals at a low total energy consumption over a long time.
The fact that they're running at 8 MHz is telling... that's far from the "low power" level. As far as I recall, you generally run using the internal low-frequency RC oscillator for low power, and then just boot up the high-speed crystal oscillator if you really need it.
I can't imagine the utility of low-power processors that "aren't trying to be energy efficient per operation" and are "really bad at processing" - I thought that was the whole point!
These really are completely different types of computers - I believe if you can apply the MSP430 successfully to your application then the GA144 is probably the wrong chip to use.
But what if you need real-time nanosecond reaction times on many separate pins? What if you need to process a 30 MHz signal? While controlling a display and accepting input? All at the same time?
Then you might need the GA144, which can do all those things at the same time without needing to worry about interrupts or waking up from low-power sleep modes or any of the other complex mechanisms computers employ to minimize power loss.
I can't imagine the utility of low-power processors that "aren't trying to be energy efficient per operation"
Consider the Amazon Dash Button.
10 seconds a week running a WiFi radio, TCP/IP, SSL and all that. 604,790 seconds a week waiting for a button to be pressed. Battery powered.
If you can monitor a button on 1 microamp, and run WiFi on 60 milliamps, 50% of your battery capacity will go on sleeping and 50% on waking.
And wake-state power consumption is dominated by the radio module, so the best way to cut down on wake state power consumption is to make the wake as short as possible.
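A quick back-of-the-envelope check of those numbers, as a minimal sketch. It assumes constant sleep and wake currents and ignores battery self-discharge and regulator losses; the function name `weekly_charge_split` is made up for illustration.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def weekly_charge_split(wake_seconds, sleep_current_a, wake_current_a):
    """Return (sleep_charge, wake_charge) in amp-seconds drawn per week."""
    sleep_seconds = SECONDS_PER_WEEK - wake_seconds
    return sleep_seconds * sleep_current_a, wake_seconds * wake_current_a


# 10 s awake per week, 1 microamp asleep, 60 milliamps awake (figures from the comment)
sleep_c, wake_c = weekly_charge_split(10, 1e-6, 60e-3)
total_c = sleep_c + wake_c
print(f"sleep {sleep_c / total_c:.1%}, wake {wake_c / total_c:.1%}")

# Same device, but awake for only 5 s per week
sleep_c5, wake_c5 = weekly_charge_split(5, 1e-6, 60e-3)
print(f"5 s awake uses {(sleep_c5 + wake_c5) / total_c:.0%} of the 10 s budget")
```

Under these assumptions the split comes out very close to half and half, and halving the wake time saves roughly a quarter of the weekly charge, which is why shortening the wake matters as much as lowering sleep current.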
That's the other side of low power devices, and part of the beauty of asynchronous logic: it does nothing better than anything else! Computers like the GA144 'sleep' mid-instruction waiting for a pin (button) to change, consuming only gate leakage for as long as needed.
I think that malanj's point is that the article is disingenuous because it compares the Green Array chip to an MSP430 product under a set of operational conditions specifically designed to maximize the MSP430 power consumption while excluding the many advantages the MSP430 has in features. The summary of the whitepaper even starts with the line, "there are apples and oranges in this comparison."
I'd recommend finding 5 more people with a problem like that, who're all prepared to sign a Letter of Intent (LOI) that they'll pay you $200/month for it. It's often easy to find a single person like that, but if you have 5, you might be on to something.
Craziest thing - it works! Just detected a hotdog (off a photo). Machine learning has really come far, that this can be done for a joke app is really cool.
My understanding is that if you trade through a platform like this, that's monetizing through order flow, you're getting screwed by HFTs. I.e. AFAIK Robinhood doesn't need to be selling the data to HFTs (which they don't do), for you to get screwed by them.
Anyone here in the space who can quantify the "hidden" cost you incur because of more sophisticated traders trading against you?
It feels like working with a broker who's explicitly charging you might actually be cheaper if you take this "cost" into account?
All the major retail brokers sell their order flow, so you generally eat this cost no matter who you use. In addition, the hidden cost is fractions of a penny on the dollar (so still better than the $5-7 a regular broker charges per trade anyways).
What's the long play then? Surely Robinhood isn't absorbing the transaction fees as a loss leader just to increase its user base? If that's the play, any other company can emulate that. The incentive seems to be that this may cause a price war with existing brokerages.
The long play is that it actually doesn't cost significant money per trade at an institutional level. The only reason the other major brokers charge you money is because it covers their brick and mortar retail costs (e.g. Scottrade) and it adds to the bottom line (e.g. Etrade). So Robinhood is actually just giving this surplus to the consumer instead of keeping it for themselves, and building a user base / brand in the process. Eventually, once they have enough users, they will start to upsell you on other services (e.g. checking/savings, margin trading, options trading, check writing etc).
So the majority of brokers have brick and mortar operations that can't afford to compete on price. That leaves only the online brokers. Then the next step is to build a brand that attracts customer trust/loyalty, which is what Robinhood seems to be doing, and to build a really nice experience, which they do with their slick apps. Once customers (generally young people who are first-time investors) are invested in the Robinhood platform, then Robinhood can upsell them on additional features/services (margin, options, API access etc). So sure, Etrade could compete on price (I doubt they would want to, since it would immediately hit the bottom line vs a new company not depending on broker fees), but if you look at it from a long-term perspective, eliminating broker fees is just step 1.
The long play seems to be to deliver retail stock trading at the lowest possible price. There's a lot of money to be made even if they don't try to maximize profits.
Plus, I bet they are banking on the existing brokerages not responding until things are too late for them. Most brokerages offer free trades and other perks, but limited to "high-value" customers. They won't drop trading fees until they absolutely have to, but by that time, their low-value customers will all be gone.
There aren't actually any transaction fees to absorb. Wholesale brokers (think Citadel) pay retail brokers (think Robinhood) for the privilege of executing their orders.
Because, for the most part, retail investors are not well-informed investors and are willing to cross the bid-ask spread. To simplify dramatically, suppose Citadel thinks the fair value of a stock is 25.185, the market is 25.18-25.19, and a Robinhood customer wants to pay 25.19 to buy 1 share of the stock. If they buy from Citadel, Citadel makes $0.0050 on average. Given those basic economics, it makes sense for Citadel to pay, say $0.0018, to make sure that the Robinhood customer buys from them instead of from Knight.
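The arithmetic in that example can be written out as a small sketch. It uses the illustrative figures from the comment (they are not real quotes or real payment rates), and `market_maker_edge` is a hypothetical helper name.

```python
from decimal import Decimal


def market_maker_edge(fair_value, fill_price, payment_per_share):
    """Return (gross, net) expected edge per share when selling to a retail buyer.

    gross: what the wholesaler expects to make by selling above its fair value.
    net:   gross minus the payment for order flow handed to the retail broker.
    """
    gross = fill_price - fair_value
    return gross, gross - payment_per_share


# Fair value 25.185, customer pays the 25.19 ask, wholesaler pays $0.0018 for the order
gross, net = market_maker_edge(Decimal("25.185"), Decimal("25.19"), Decimal("0.0018"))
print(f"gross edge ${gross}, net edge ${net} per share")
```

Decimal is used here only so that sub-penny amounts don't pick up floating-point noise; the point is simply that the wholesaler still keeps a positive expected edge after paying for the order.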
Oh damn, then it's just opening short/long term trading to a new segment of people that wouldn't otherwise do it. That's like 100% casino/poker bonuses during the poker boom.
Day trading is hazardous to your health and your wealth, and more so with the increase in HFTs. But the buy and hold for long-term strategy works very well. Use limit orders and you won't be paying any "hidden" costs.
For example, let's say you put in a Market Order for 1 share of GOOG. Currently the price per share is about $900. It could fill immediately at $900, or it could fill immediately at $930 with that $30 premium. If instead you place a Limit Order for 1 share of GOOG at $900.02, you will only buy 1 share of GOOG for $900.02 or less, which effectively cuts out any "hidden" costs. Limit orders are almost always better.
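A toy model of that difference, as a sketch only. It assumes a single best ask and a one-share order, and ignores partial fills, queue position, and everything else a real matching engine does; both function names are made up for illustration.

```python
from typing import Optional


def fill_market_buy(best_ask: float) -> float:
    """A market buy takes whatever the best ask is at the moment it arrives."""
    return best_ask


def fill_limit_buy(best_ask: float, limit: float) -> Optional[float]:
    """A limit buy fills only if the best ask is at or below the limit price."""
    return best_ask if best_ask <= limit else None


for ask in (900.00, 930.00):
    market_fill = fill_market_buy(ask)
    limit_fill = fill_limit_buy(ask, limit=900.02)
    print(f"ask {ask:.2f}: market fills at {market_fill:.2f}, limit fill: {limit_fill}")
```

When the ask jumps to 930 the limit order simply does not fill, so the price protection is paid for with the risk of missing the trade altogether.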
Fun fact: All Robinhood market orders are actually placed as limit orders 5% above the market price. This is to limit investor risk.
Edit: Now only market buy orders are converted into limit orders. Market sell orders remain as market sell orders. They must have changed their policy in this last year.
Although that article reads like an ad for Interactive Brokers, it offers a bit of insight on the subject.
Namely how aggressive limit orders in an order flow situation will be less likely to fill in the face of desirable price movement, thus incurring an opportunity cost.
You'll actually get better fills from HFTs operating on purchased order flow than you would if you went directly to market. Also, there's nowhere you can go that won't sell your order flow on undirected orders. But if you don't like it, you can always send it directly to a particular exchange, regardless of who your broker is.
Here's a specific anecdote of how I did it in Palo Alto (after just arriving in the US from South Africa):
The first day I was in Palo Alto (and the US), I had absolutely no contacts and was severely jet lagged. I had just moved to the US to establish my startup (https://journeyapps.com) in the US, raise "Silicon Valley VC" and chase the dream ;) tl;dr - JourneyApps is a platform for businesses to quickly develop mobile apps for internal use.
I walked down University Avenue, and spotted Palo Alto Bicycles. I walked in (very nervous) and asked one of the sales people if the manager was in. Jeff (the manager) was there and asked what I wanted. I explained I'd just moved here, and was working on a startup that eliminates paper forms.
He was kind enough to not kick me out, and (because it was closing time), spent some time talking to me about how they sell bicycles and which paper forms he uses. He also explained how much of a pain it is.
I kept delving into the details of his business, which he absolutely loves, so he was keen to keep talking. After forming a good idea of what his world looks like, I asked if he'd be keen to do an experiment with us. We'd make an app that does bicycle sales on a tablet, and bring it to him in a day or two. The experiment would be free, he just needs to tell us what works and what doesn't.
He was really keen, and gave me copies of the forms he uses. Overnight we built an app on our platform that acts like his paper forms. The next day we rolled out in his store, and waited for bicycle sales.
The app worked, and we learnt a heck of a lot about US business culture, even though it was just a "small family owned" bicycle store.
Eventually we raised the mythical Silicon Valley VC money and got our first Fortune 100 customers, but the process stayed remarkably similar:
1) Find someone who's passionate about their business
2) Talk to them with genuine interest and learn about their world
3) Be upfront and open about which problems you think you can help with, and which not
I watched the Emerson video; very neat. Is JourneyApps something that I'd be able to utilize as a third party to create solutions for clients I've had previously and future opportunities?
Or, do you work one-on-one with all of your customers and develop only in-house?
That'll actually be one of the key use-cases for Root I think.
E.g. We've had people who want a card for their spouse, and then whenever one of them spends money, it alerts the other one so that they can sync budgets.
"Perform OCR on this image. Return only the text found in the image as a single continuous string without any newlines, additional text, or commentary. Separate words with single spaces. For any truncated, partially visible, or occluded text, include only the visible portions without attempting to complete or guess the full text. If no text is present, return empty double quotes."
Found in: https://github.com/video-db/ocr-benchmark/blob/main/prompts....