Story of the Flash Fill Feature in Excel

Taniwha · on Sept 18, 2021

Let me tell you another story about fill in Excel ..... many years ago when the Mac was getting its very first graphics cards I worked for 3rd party hardware manufacturer who had decided to build our first graphics card, in particular 24-bit screen were deathly slow .... people would switch to 8-bit or even 1-bit to use things like Excel and word processors.

I started out intending to design fast vector hardware (because autocad was a thing back then), the end though some serious benchmarking of real world usage meant that we zeroed in on simply making solid fills go really gas (Gbytes/sec, faster than any SGI machine of the time) - one of the big surprises was how fast Excel went with just this hardware speedup - turns out that at times excel would clear a window up to 9 times before ever drawing a useful pixel - all those 'important' black pixels I wanted to make hardware for (character rendering and vectors) were just noise in the traces.

Not just MS: Pagemaker? did something stupid that invalidated the font cache many times a page, Quark? put 2 blank spaces at the end of every line that caused ATM to do stupid N*2 clipping behaviour ..... we could make great hardware and the software guys would just piss it away

leeoniya · on Sept 18, 2021

> we could make great hardware and the software guys would just piss it away

my #1 annoyance with web dev at-large.

you load any web page and your CPU spends 90% of its capacity rendering inefficient, shitty, injected ad code. (speaking as someone who tries to squeeze the most out of modern JS JITs and web platform APIs)

tibbar · on Sept 18, 2021

> I went from searching for the hardest problem I can solve to searching for the simplest problem that will have the most impact.

This is a fantastic framework to pursue when, for instance, joining a new organization. You can quickly have an outsized impact just by looking for common pain points. As a corollary, it’s very useful to spend lots of time using your own product and talking to the users!

agumonkey · on Sept 18, 2021

this is something I'd pay to be addressed in most companies / institutions.

Every tool they buy (physical or digital) is always 60% fit, but adding a few bits on top would propel it above 80% and make everybody's days a bliss.

Most applications I've seen are designed for generic tasks, except that workers have a very regular set of tasks, if software could be tailored a bit to this (prefill stuff, save clicks) you cut fatigue and errors buy a huge figure.

This can have a serious social impact, last gig there were thousands of lawsuits waiting in a hall because nobody wants to type them in since the software requires to rewrite everything from scratch even if 80% of the data is identical.

CraigJPerry · on Sept 18, 2021

This is one advantage of the world switching to webapps over desktop apps. It’s trivial for me to create a chrome extension that enriches or modifies the UI of a bought product.

Add an extra menu of common tasks- no problem. Add a datalist dropdown of options to a textbox where the options come from another service dynicamlly? - again no bother but usability of the vendor product went up substantially.

agumonkey · on Sept 18, 2021

Potentially only, where i work, browsers are managed by IT, no way to patch stuff, no devtools. Also depending on the design, adding your code may never work unless you reverse the whole codebase (the app in place is extremely stateful, all a.href are replaced by procedural js calls).

Don't laugh but I had more results scripting mouse/keyboard automation on 90s oracle GUIs [0] than the app I mentioned above.

[0] I could prefill half the stuff and help the rest with powershell, it gave me 70% time reduction on most tasks

CraigJPerry · on Sept 18, 2021

Ahh that sucks royally. I've been in similar and it's soul sucking!

    <flippant-comment-from-a-random-hn-poster-who-doesnt-know-your-circumstances>
      The job market's great right now
    </flippant-comment-from-a-random-hn-poster-who-doesnt-know-your-circumstances>

:-)

Hopefully more usefully as a comment - browsers are super hackable, you might be able to use a javascript bookmarklet to effect some improvements. Something like this https://caiorss.github.io/bookmarklet-maker/ (there are others) makes life easier for creating these.

agumonkey · on Sept 18, 2021

I don't mind the comment above, he's somehow right that html/js is more open to ad-hoc patches than native code. Except when admins forbid you to.

bookmarklets don't allow for injections into the app so it wont work

pbhjpbhj · on Sept 18, 2021

I'd be interested in seeing implementation examples of adding 3rd party data from one (3rd party) web source to another for form completion, if you have them please?

Presumably this has to hack around cross-site restrictions nowadays (it's a couple of years since I did any website JavaScript).

CraigJPerry · on Sept 18, 2021

Here's one i made just now: https://gist.github.com/craigjperry2/83b81b7eaf5cf68ca042a9a...

1. Clone that repo

2. Run my fake vendor app - in this case an html page with an empty text box - npx http-server

3. Add the browser extension - tested in firefox - then refresh the page

Now you should see the title fields from https://jsonplaceholder.typicode.com/posts in a dropdown that's now been added to the input field of the fake vendor app page.

plorntus · on Sept 18, 2021

Extensions can request a higher level of permisions so you typically wont run into that issue.

jfrunyon · on Sept 18, 2021

This is an advantage of open-source (as in source that you can read and modify, not necessarily FOSS), not web, IMO. Webapps are just necessarily open-source (at least the client-side portion).

CraigJPerry · on Sept 18, 2021

Yeah i suppose, if everyone switched to rendering on a canvas element instead of to the DOM then this advantage would shrink away

peterburkimsher · on Sept 18, 2021

Yes, and it's surprisingly easy to find out where to help! The best colleagues and best teams are where there are a lot of "thank you"s.

The same is true in the article:

"I am indebted in particular to one woman, whose name I will never know". "I also thank the Excel product team for shipping this technology without which this work wouldn’t be as famous."

I wish I knew how to reach out to the author (Sumit Gulwani) or other people in the Excel product team, about a feature that would be really hard to code but really useful to me.

Hierarchical menus of Sheets.

Like bookmarks folders, or Playlist Folders in iTunes, clicking on the Sheet Folder would show every row inside, concatenated into a giant Sheet. Clicking on the Sheet Folder would bring up a menu, which I could then navigate to find a sub-sheet. From a programming perspective I know that it's really hard to do hierarchical nested databases. But from a usability perspective, it would make some of our Excel documents with hundreds of Sheets much easier to navigate.

mshariff · on Sept 18, 2021

Do you mean like an index sheet with links so you can then go to the specific page?

We might have something for you…

peterburkimsher · on Sept 18, 2021

An index sheet would be a good start!

Being able to organise it into a hierarchy, like a file system, with folders and subfolders, is what I'm imagining. Selecting a Sheet Folder would show the concatenated data from all sub-sheets, like iTunes Playlist Folders.

Something like this:

http://peterburk.free.fr/ExcelSheetMenus.png

MrHus · on Sept 18, 2021

If you do not know what flash fill is, like me, this page shows a video of the feature:

https://support.microsoft.com/en-us/office/save-time-with-fl...

duskwuff · on Sept 18, 2021

Alternatively, this video:

https://youtu.be/fRjsK-NXVEA

(Key word: "Maruary".)

sumitgulwani · on Sept 19, 2021

Cool find! The 100-second video segment starting at 10:43 in this link that I am sharing would be more fun: <https://www.youtube.com/watch?v=421gU482xFE&t=643s>

sumitgulwani · on Sept 19, 2021

Here's an illustrative demo of the Flash Fill capability. Check out the 3 minute video segment starting at 4:30. <https://www.youtube.com/watch?v=X1YXge3C8RI&t=270s>

willhinsa · on Sept 18, 2021

Seems like a lot more work than doing Data > Text to Columns :)

II2II · on Sept 18, 2021

As someone who has been handling delimited text for decades, Text to Columns is the most natural approach. That being said, different people have different backgrounds and approach problems from different angles. Flash Fill may be more appropriate in their cases.

sumitgulwani · on Sept 19, 2021

Yes, text to Columns would work well for splitting tasks in a very uniform dataset. However, if you want to extract say Lastname from a column containing names some of which may have middle names (i.e., the dataset is not very uniform), then text to columns would not work correctly.

Also, if you would like to do more than substring extraction, say convert "FirstName LastName" into "f.l." or "lastname, firstname", then Text2Columns won't be sufficient by itself. You would need to do more post-processing or would need a more sophisticated scripting capability.

st0le · on Sept 18, 2021

Fun fact: Powershell 5.1 has a `Convert-String`[1] cmdlet that is a Flash Fill implementation for CLI. You can do these inferences in Command Line.

[1]https://docs.microsoft.com/en-us/powershell/module/microsoft...

CraigJPerry · on Sept 18, 2021

I was hoping to browse the source for that cmdlet on github but it’s missing :-(

copperx · on Sept 18, 2021

The code uses program synthesis, which I imagine is unreadable unless you're versed in that field.

pbhjpbhj · on Sept 18, 2021

I'm a very occasional Excel user, I can never seem to get it to do what I want on real spreadsheets -- it always trounces my number formats, or background colours, or repeats a short-periodic series instead of creating a series over the whole range. Seems like it could use an idiot mode with a pop-up box with options like "don't change any styles" and "make a series over the whole range" that also gives tips like "use $ in from of a reference to prevent it changing over the series".

TuringTest · on Sept 18, 2021

That touches an important point of this kind of "automation magic" in autocompletion tools. They are great when they guess your intent correctly, and terrible when they misfire - specially when, as is common with Microsoft's tools, they are applied automatically and you have to take extra action to reverse them.

It annoys me that they research and apply complex algorithms to analyse the available data, detect the most likely ways to fill the document, build alternative programs to solve the problem and rank which is the most useful; only to then choose one of the inferred options and discard all the rest of the analysis work, getting only a (possibly incorrect) guess of the data completion task.

If, once the work is done, the user were allowed to see what the process was to detect the data, and were allowed to choose between the discarded options or give additional clues as to what the original intention was (the "active-learning session with the user to resolve the ambiguity" that the author of the article discards), the technique could be controlled more precisely and be useful in more situations, not just the ones that work the first time. Giving control to the user would make the technique more robust in my opinion instead of being hit-or-miss.

sumitgulwani · on Sept 19, 2021

I absolutely agree with you! The next version of Flash Fill that we are aspiring to build will not only handle a much larger class of transformations, but will hopefully also show you the code, and suggest you corner cases for inspection and soliciting more examples from the user. Allowing the user to browse through multiple alternatives is also a good idea.

What I meant to say was that such a rich interactivity should not be the default experience for simple and common cases. For simple and common cases, the technology should just work automatically without requiring much user interaction, thereby promoting usability and discoverability. However, you are absolutely right that for more sophisticated cases, instead of letting the user fall off the cliff, we should invest in a rich interactive experience where the AI can partner with the user to help complete more sophisticated tasks.

TuringTest · on Sept 20, 2021

Glad to hear that! Yes, I agree with having good defaults and orienting the program toward finding them first; I understood that this part from the article referred to this.

The thing is I'm a firm believer in putting the user at the centre of decision-making in automated processes, especially in AI tools that combine multiple sources of data without a clear of model of how they arrive to their solutions.

Too often these process provide their result as a foregone conclusion, and leave the user helpless if it is not the right one. By providing hints on how the process has arrived at that outcome, the user can form a mental model of how it works and learn how to use it more efficiently, or for which situations it is not suitable.

ZeroGravitas · on Sept 18, 2021

This is very cool and I was unaware of this feature.

Are there any open source libraries that perform similar tasks? I can think of a few uses for this in data cleanup as long as it can handle the input not being super clean.

sumitgulwani · on Sept 19, 2021

Our program synthesis APIs are available publicly, but for non-commercial use only. <https://github.com/microsoft/prose> You want to look for API samples corresponding to Transformation.Text capability. This is a more powerful capability than Flash Fill. If you happen to try this out, we'd be very interested in getting your feedback on whether this is powerful enough to handle your use cases, and if not, we would love to be inspired by your use cases for a future version of this technology. You may reach us at prose-contact@microsoft.com.

Hjfrf · on Sept 18, 2021

Powershell has the same functionality in Convert-FromString

thristian · on Sept 18, 2021

As I understand it, this technology is heavily patented.

nojito · on Sept 18, 2021

I wonder if this is the same tech behind columns from examples in Power Query?

That and flash fill has saved countless hours to my team.

julienfr112 · on Sept 18, 2021

How does it handle middle name, or composite first name and second name ?

sumitgulwani · on Sept 19, 2021

You would have to give multiple examples, at least one for each "different kind" of name (for which you would have programmed a different handling logic if you were programming the task yourself---the tool needs to see at least an instance of each of those logics---it has no understanding that something is a name; it simply treats the input as a sequence of characters).

thedudeabides5 · on Sept 18, 2021

am I the only one that finds this feature distracting and unhelpful? Most of the times the recommendations are wrong, and then it's just more visual noise/keystrokes.

n8cpdx · on Sept 18, 2021

Do you try giving more examples? I find it usually only takes a few corrections to get it right.

You can turn it off if you don't like it. https://superuser.com/questions/766967/how-can-i-turn-off-fl...

sumitgulwani · on Sept 19, 2021

Thank you for your feedback. I would love to get more insights into your use cases---we can use them as inspiration as we continue to improve the technology in this space. Please feel free to send us your use cases where Flash Fill fails at prose-contact@microsoft.com.

You may consider watching this 5-minute video segment starting at 4:30 to get a sense of the current scope of Flash Fill (i.e., when it is expected to work, and when it isn't): <https://www.youtube.com/watch?v=X1YXge3C8RI&t=270s>

copperx · on Sept 18, 2021

You might be alone in that. Flash Fill saves me from doing so much manual work that I consider it a killer feature of Excel. I will never try another spreadsheet software until they get flash fill functionality (and I believe the feature is encumbered by patents, and hard to replicate anyways unless you are versed in program synthesis).

pophenat · on Sept 18, 2021

It's mostly correct in my experience, but maybe it's because I've learned what it can/can't do. And when it does what it is supposed to do, it can be very useful.

bajsejohannes · on Sept 18, 2021

The paper outside a paywall: https://www.microsoft.com/en-us/research/wp-content/uploads/...

willyg123 · on Sept 18, 2021

> She opened up her laptop, *fired up* Excel...

"fired up" in this context needs a place in the CS Hall of Fame

ExtraE · on Sept 18, 2021

…why?