With a new tool, spreadsheet users can construct custom database interfaces (news.mit.edu)
89 points by renafowler on July 24, 2016 | 43 comments


OK... but where are decent user-friendly tools to manipulate, manage and share data and insights? Really, the problem is that to this day data in Corporations is a mess. The moment you step out of the standard warehouse systems you find yourself navigating a mess of spreadsheets, word docs, power points, all showing sometimes the same data in different formats. You duplicate data every time you send an Excel file by email... this data needs to be in sync and when not people get screamed at or fired. That means a lot of work goes into manually maintaining thousands of different files to make sure they tie to the overall picture. It's a mess. The bad part is that nobody is looking into this, everybody is focusing on analytics or reporting, which by now are the easy part.

Then you have ancient tools like HFM or BPC that try to "enhance" spreadsheets and spreadsheets consolidations. They don't work. They are barely reliable and they cost a lot of money for no reason (they are simple SQL databases, they can't even be compared to much more complex software like salesforce or similar).

Then Corporations are now being sold this whole "big data" thing, which is old news for tech audiences but still new to most big companies. Unfortunately, while big data has a lot of potential, it further diverts investment away from good old small data, where there is probably much more ROI to grab, simply because solving this problem is not that difficult and not that expensive if you actually want to fix it.

Then now everybody wants to do Predictive, but Predictive won't beat saving hundreds of employees thousands of working hours by improving the efficiency of how we handle data and data insights. You can literally create a new workforce out of the many hours you would save with better efficiency in this area. Without even considering that on top of efficiency, you get much better, more fact based, decisions driven by the overall increase in transparency which you lose when insights are spread around thousands and thousands of files.

There is some light at the end of the tunnel. Software solutions like Tableau go in the right direction, but they do not provide the much more needed tools to properly manipulate and manage data and, especially, consolidations and integration. The only way to get out of this mess is to control the data flow, and especially to have 1 and only 1 data flow. That means once your insights are approved and locked, every other view should read this data, and there should be no manual intervention any more. If the locked data is wrong, it will be wrong for everybody, which is a good thing compared to having multiple copies of the same data and then people fighting over which one is the right one.


> You duplicate data every time you send an Excel file by email

As much as I hate Sharepoint, it's actually, as of this year, come close to solving this problem. Full fledged Excel (and Word) documents synced between many authors working simultaneously (section-wise edit locks until next explicit save/sync), no one ever emails anything around. It's not all the way to a Git-like solution, and (at least for the moment) tied to IE on Windows, but compared to the old ways, it's a winner.


It looks like what we're doing is similar to what you're looking for -- a user-friendly tool to manipulate and share data and insights [1]. We're targeting Tableau users, btw.

[1] http://easymorph.com


EasyMorph looks nice! How does it compare to Alteryx (http://www.alteryx.com/)?


You're spot on. I can tell you work in Enterprise as well and I feel your pain brother/sister.

1. Everyone is willing to sell you something that is going to manage all of your disparate legacy data, except that somehow nothing ever seems to die in the Enterprise and instead you end up with users of the unmaintainable legacy system AND a second system on top (and third, and fourth) that now also all need to be maintained.

> Analysis and reporting are the easy part.

I don't agree on this point. They're just as hard as ever. Of course you have Excel, Reporting Services, and Power BI. But everyone wants you to formulate the QUESTIONS as well as the ANSWERS. As soon as they discover all of the issues you've found, now it's bury-your-head-in-the-sand-time, because they just want to whittle it down to that top 1% of 1% of problems so that their stats look good; whack it into a daily email, which goes onto a daily KPI sheet, and the underlying issues they originally wanted to know about of course never (or rarely) get fixed. Progress!

2. Let me tell you about Big Data! It's a codeword for:

* We know you've never archived or thrown anything away before, that you have data coming out of your ass.

* We also know you'll never use it. You don't have the funds, inclination, or staff to use it. But one day we're going to build something that will magically do it all for you. So you better keep holding onto it.

* Oh yeah and we have something to sell you to help you keep holding onto it all until that magical day arrives! Please send us a purchase order. Thanks.

I love data. I am sure that some companies like energy companies and telecoms do actually have to deal with massive amounts of data. But Big Data has become a management buzzword that doesn't mean much except retaining everything and spending money.

If you retain everything you never have to make a decision about what to destroy. Management loathe having to make a decision to destroy anything that they might be held liable for, but also love spending money on something that they can use to justify that they're solving a problem.

Thus the success of Big Data.

3. Predictive. I can't comment. Everyone talks about it but I don't see the solutions in the real world. Most of us are stuck in Excel / SQL and just starting to dip our toes into R. When everything else in life is hard, predictive becomes just another harder thing to avoid.

> There is some light at the end of the tunnel.

What you're talking about is Master Data. Of course the well for this has already been poisoned long ago and woe to all who sip from it.

Everyone loves starting an MD project that siphons data in. But the second people have to manage the data inside and start cleansing it and making decisions, it falls in a heap. It rarely gets to the stage where data is written back to source systems; at that point it's game over (LOLOLOL many a vendor I have left in the dust just asking them about this. "That's phase 2," they say, and they've never had a single customer reach it).

--

Back on point about the article though, it looks like an interesting way to structure data and it could probably do as an Excel add-on of some kind. I expected something that would create a database but unless there's more than the video this is just writing queries?

This is all well and fine, I guess, if it adds usability. But nobody's jobs are going anywhere. The article says this tool can create anything SQL-92 can. Good luck... I deal with shit daily that you couldn't even begin to imagine.


Every time somebody mentions "Excel" and "add-on" in the same phrase I have trouble sleeping at night. I have never seen a working add-on in Excel, it's all crap, nasty too. Also, the complexity is not in the creation of the database, but in the syncing and control of all the changes people make to that data. In programmer's terms, imagine everybody downloading the source code and making changes, never uploading those changes to a master, and then fighting each other over who has the right "code". We need a GitHub of data, but for people who don't even understand what data really is.
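
To make the "never upload to a master" problem concrete, here is a minimal Python sketch of checking one emailed copy of a table against a master copy. It is only an illustration: the row layout, the `id` key and the function name `diff_against_master` are hypothetical, and a real "GitHub of data" would also need history, merging and permissions.

```python
def diff_against_master(master, copy, key="id"):
    """Compare a local copy with the master; both are lists of dict rows."""
    master_by_key = {row[key]: row for row in master}
    copy_by_key = {row[key]: row for row in copy}

    # Rows that exist only in the copy, or only in the master.
    added = sorted(k for k in copy_by_key if k not in master_by_key)
    removed = sorted(k for k in master_by_key if k not in copy_by_key)

    # Rows present in both whose contents no longer match.
    changed = sorted(
        k
        for k in master_by_key.keys() & copy_by_key.keys()
        if master_by_key[k] != copy_by_key[k]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical data: the master, and a copy that was edited after being emailed.
master = [
    {"id": 1, "region": "EMEA", "revenue": 100},
    {"id": 2, "region": "APAC", "revenue": 250},
]
emailed_copy = [
    {"id": 1, "region": "EMEA", "revenue": 100},
    {"id": 2, "region": "APAC", "revenue": 275},  # edited locally, never sent back
    {"id": 3, "region": "LATAM", "revenue": 80},  # added locally
]

print(diff_against_master(master, emailed_copy))
```

Even this much only tells you that the copies disagree; deciding which one is right is still the people problem described above.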


Yup, completely agree. I was hired to do that at my current company: dive into their data, old and new. I got insight. I got valuable answers. And when I brought them up they said "uh yeah but we can't change it. It is a political decision. Don't you have some quick win to show that you are useful?"

I found things to do... but it is not the core of the problem. I am lucky my boss is quite high on the hierarchy and may be convinced to attack that core problem. But it will take a looooong time.


Core problems take months or years, depending on who gets involved. But low hanging fruits are everywhere and you should exploit them to gain credit and credibility. Just don't implement solutions that fix a small issue but then introduce even more technical debt (i.e.: coding a custom access database for non technical users).


Thing is, most of the time low hanging fruits are the only thing they solved during the past 10 years. They are now stuck with core problems.

And no, most of them would not take a long time. They just need a political decision, but could be solved in a couple of days.


We had something amazing in this space but it lacked funding. If we don't draw a correlation between funding harder things and getting better innovation, we'll have the same complaint in 20 years again.


Any links?


http://www.fieldbook.com

I haven't used this in production, but messed around with it for a while. Seems to do what this article is suggesting.


Yeah, I reposted that one in a few places. Reason is getting people outta spreadsheets into better DB will need replacement to be just like spreadsheets. Anything too different in UI or workflow will get resistance and workarounds. Fieldbook was clever.


Very cool. The right side of the article provides links to a video presentation of the tool and a SIGMOD paper reporting on the details. It is specifically compared against Access in a study on usability.

Link to the project page: http://people.csail.mit.edu/ebakke/sieuferd/


This sounds a lot like AirTable[1] to me. AirTable has quickly become my favorite Spreadsheets replacement because it has a nice REST API, ways of linking data from multiple tables, data integrity principles, and pretty solid mobile phone data entry, while still maintaining the simplicity of a spreadsheet.

[1] https://airtable.com/


I do like Airtable, and I can see the comparison, but as far as I understand you can't do the complex querying that the article is suggesting?


Am I missing something, or... where is the tool?

Source code or it doesn't exist ;)


"when you have something extremely industry-specific, you have to hire a programmer who spends about a year of work to build a user interface for your particular domain". With tools like Oracle APEX it can be done much faster, so I think this is nothing new.


As long as one needs a programmer the year is a good rough estimate, the hard part is not the building but the gathering of requirements. Take the programmer out of the equation and thus the communication overhead between programmer and domain specialist and you save a lot of development time.


The thing is that having a tool like this does not make the requirement gathering go away.

Dude I can build a house. I've only used a drill once in my life, I've never mixed concrete, or bricks, but I can. I will lay stuff on top of each other and use superglue and whatever it takes.

But nobody will ever want to live in it, and it would not be safe. That's why you get a professional to do it.

A user might somehow be able to magically skip gathering requirements and just dive gung ho into creating a structure for what they want. Great. And you end up with the legacy Excel monstrosity turds that we've been trying to eradicate from the Enterprise for the past two dozen years.


You still need someone to think out the relations in your database and how to retrieve data faster with large data sets. If this work can be done by a domain specialist alone, then it can be done the same way using APEX, FileMaker etc. Without a live example of the new technology it is difficult to say how it differs.


I'm not sure how this is any easier than using Tableau, which can construct most of these types of business analytic queries fairly easily. More complicated queries that pull in data from multiple tables usually require an understanding of the organization of data across these tables, but that is true of this tool as well (at least from the video on the author's website - http://people.csail.mit.edu/ebakke/sieuferd/index.html)


So... Pivot Tables? Excel has done this for ages. Excel 2016 even has in-memory BI.


Access had a database gui wizard since when? 97?


Microsoft Access?


Screenshots[1] look more like Power Pivot.

1. http://people.csail.mit.edu/ebakke/sieuferd/



So... they've reinvented Access?


In my experience (which includes large companies from sectors like automotive and utilities) in most cases there is a large, RDBMS backed application doing something really critical... And then when you talk with actual users you discover a swarm of Excel files that either support or often second guess the corporate system.

And it is not just reporting. In some cases inputting large quantities of tabular data is just done by writing Excel macros that create CSV files to upload to the system (and sometimes, horrifyingly, the data are just retyped...).
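
As a rough idea of what those macros end up doing, here is a minimal Python sketch that validates tabular rows and produces the CSV text for an upload. The column names `part_no` and `quantity` and the function name `rows_to_csv` are made up for illustration; the actual upload format of any given corporate system will differ.

```python
import csv
import io

# Hypothetical upload format: exactly these two columns, in this order.
REQUIRED_COLUMNS = ["part_no", "quantity"]


def rows_to_csv(rows):
    """Validate rows and return (csv_text, rejected_line_numbers)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    rejected = []
    for line_no, row in enumerate(rows, start=1):
        try:
            clean = {
                "part_no": str(row["part_no"]).strip(),
                "quantity": int(row["quantity"]),
            }
        except (KeyError, ValueError, TypeError):
            # Missing column or a quantity that is not a whole number.
            rejected.append(line_no)
            continue
        if not clean["part_no"] or clean["quantity"] < 0:
            rejected.append(line_no)
            continue
        writer.writerow(clean)
    return buffer.getvalue(), rejected


# Rows as they might come out of a spreadsheet, typos included.
rows = [
    {"part_no": " A-100 ", "quantity": "5"},
    {"part_no": "B-200", "quantity": "five"},
    {"quantity": 3},
]
csv_text, rejected = rows_to_csv(rows)
print(csv_text)
print("rejected rows:", rejected)
```

Reporting the rejected row numbers instead of silently dropping them matters here, because the person who typed the data is usually the only one who can fix it.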


It's a scenario I've witnessed a few times too many now. Often an unsupported feature X is becoming critical to the team, so users reproduce large chunks of the corporate system in a separate spreadsheet in order to support the feature themselves, then build upon that.

Without the ability to react quickly that smaller businesses have, I noticed that some form of resistance from IT is usually a big factor: lack of resources, divergent priorities or visions, unclear benefits from hypothetical feature, limited architecture, end-of-life maintenance states that end up lasting years, etc.

Reality changes, systems fall behind, users adapt and spreadsheets are born.

One root cause is that it's comparatively hard to add new data in a RDBMS - it usually requires a full development life cycle. On the other hand, spreadsheets are free, provide immediate benefits, allow users to define and refine requirements as they go, and can be abandoned at the drop of a hat (low risk). Quite the conundrum!

From the article:

> At present, Bakke’s tool enables query construction on an existing database, but it doesn’t enable the direct entry or modification of data. He expects to begin adding that functionality over the next six months

Querying data is a problem with many good solutions nowadays, the bigger question is where and how to write that new data the users want to play with (other than in a spreadsheet).


> it's comparatively hard to add new data in a RDBMS

I assume you meant to say "to add new schema". Adding new data is easy and in most enterprise environments is happening automatically every day.

As far as "add new schema" goes, I'll claim that the database is just another software component, and if you are pursuing an agile strategy, you should be modifying that schema with every sprint. My client, for whom I do data warehouse consulting, updates the schema every two weeks. Because we are responsive to the business, they never revert to spreadsheet solutions. Of course they are free to create their own custom reports in spreadsheets - we'd never try to take that away. In fact we make it as easy as possible to run business curated queries and dump the data to Excel. Our newest feature for these power users is to allow them to give back to us the Excel reports they built and we will "push and deliver" - put the current data into the raw data sheets and deliver the report - usually to Sharepoint.

Getting good business software solutions in place merely requires that the business hire some good developers. Why more don't do so remains a mystery. My best guess, like all things "corporate" is fear and laziness.
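
For readers wondering what "modifying that schema with every sprint" can look like mechanically, here is a minimal sketch using Python's built-in `sqlite3` module and SQLite's `PRAGMA user_version` as a schema version counter. It is an assumption-laden illustration, not the setup described above: the `orders` table, the `MIGRATIONS` list and the `migrate` function are hypothetical, and a real data warehouse would use its own database and migration tooling.

```python
import sqlite3

# Hypothetical migrations: one entry per sprint, applied strictly in order.
MIGRATIONS = [
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL NOT NULL)",
    "ALTER TABLE orders ADD COLUMN region TEXT",
    "CREATE INDEX idx_orders_region ON orders (region)",
]


def migrate(conn):
    """Apply migrations not yet recorded in PRAGMA user_version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        # PRAGMA does not accept bound parameters; version is a trusted int.
        conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


conn = sqlite3.connect(":memory:")
print(migrate(conn))  # applies all three migrations
print(migrate(conn))  # nothing left to apply, version is unchanged
```

Because the version is stored in the database itself, running `migrate` again is harmless, which is what makes a frequent release rhythm practical.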


> Because we are responsive to the business, they never revert to spreadsheet solutions.

Not that I do not believe you but... How hard have you looked? In my experience, the larger the corporation, the greater the chance that IT just does not realize what the situation is. "Shadow governance by spreadsheet" is something you notice only if you actually work in the same office with the business users for a few days (or when the whole thing has grown so much that they cannot really manage it anymore and then ask you to fold it back into the main system).

Of course, YMMV.


We work for the business, not for IT. That's a key difference. And that's what I mean by "business hire some good developers". As a developer, I find it much more satisfying.


So why can't they use something like Irmin? https://github.com/mirage/irmin


I can relate to that - we used to have two related systems which couldn't talk to each other. So the solution was to print out (landscape, tiny font) 1000's of lines from one system, then sit with a ruler and highlighter pen re-typing it into the other. Day in, day out.

A newer system which replaced some of the functionality and cost many millions of dollars will only export as .xls and only import from .csv (no API offered). Better than typing it in by hand, but it still pains me every day :-/


Glad (and at the same time sad) to see that plenty of others can relate to my experience.


> The tool’s scores put it at the 52nd percentile in the category of business software, which isn’t bad for an academic research project. But the scores for Microsoft’s Access database program are much worse — around the sixth percentile

It is apparently more usable than Access, which is damning with faint praise.


seems to be more geared towards reporting, so reinvented 'oracle forms' probably.


Weird comment, as Oracle Reports is their reporting tool. Forms was great for CRUD-only business applications (actually the majority of business applications), but it went downhill when it moved from client-server to the web using applets... APEX is a good replacement.


Or Filemaker, though the new versions seem more complex.


Or Query By Example (1975). But I'm sure there are many more.

https://en.wikipedia.org/wiki/Query_by_Example


If someone actually did reinvent Access without the concurrency data-loss problems (but still as a single file per database), they'd probably make a lot of money.


really cool, easy start to any project



