
Presumably this is for data that you want graphs of but that doesn't fit in Excel?


So let's say, hypothetically, you run an online store and you want to do some analytics over a portion of your customer base.

The Amazon solution here, with the stuff they've launched today, is (rough sketch below):

a) use Kinesis Firehose to take a stream of log events from your app and dump those into S3 (gives you decent-reliability replicated backups)

b) ETL those using COPY into Amazon Redshift (a column store; very fast full-table scans)

c) point QuickSight at Redshift to draw graphs

so, yeah – but most of the data you want likely doesn't fit in Excel (or rather, most of the work is reducing the data in size until it does).
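A minimal sketch of steps (a) and (b) in Python, purely illustrative: the stream name, bucket path, table, credentials, and IAM role are all made up.

    import json
    import boto3
    import psycopg2

    # (a) app code pushes each log event onto a Firehose delivery stream;
    # Firehose batches the records and lands them in S3 for you
    firehose = boto3.client("firehose", region_name="us-east-1")
    event = {"user_id": 42, "sku": "A-100", "price": 19.99}
    firehose.put_record(
        DeliveryStreamName="shop-events",  # hypothetical stream name
        Record={"Data": (json.dumps(event) + "\n").encode()},
    )

    # (b) a periodic job bulk-loads the S3 objects into Redshift;
    # COPY is the fast path, reading the files in parallel across slices
    conn = psycopg2.connect(host="mycluster.example.redshift.amazonaws.com",
                            port=5439, dbname="analytics",
                            user="loader", password="...")
    with conn, conn.cursor() as cur:
        cur.execute("""
            COPY events
            FROM 's3://my-firehose-bucket/shop-events/'
            IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
            FORMAT AS JSON 'auto';
        """)

Step (c) is point-and-click in the QuickSight console, so there's nothing to script.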


And as I understand BI, it's also for when you want complex graphs that can't easily be modeled in Excel.

With BI solutions you can perform "slices", as they call them, of multi-dimensional data (cubes), and then represent those as graphs that can also be used to drill down on one or more dimensions of said data.

When you have, say, 17 dimensions, these solutions are easier to use than trying to do the same in Excel.
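A rough pandas sketch of what "slice" and "drill down" mean here (column names invented for illustration):

    import pandas as pd

    # a tiny "cube": one fact column (revenue) and three dimensions
    sales = pd.DataFrame({
        "region":  ["EU", "EU", "US", "US"],
        "product": ["A",  "B",  "A",  "B"],
        "quarter": ["Q1", "Q1", "Q1", "Q2"],
        "revenue": [100,  150,  200,  50],
    })

    # slice: fix one dimension (region == "EU")
    eu = sales[sales["region"] == "EU"]

    # roll-up: total revenue per product
    print(eu.groupby("product")["revenue"].sum())

    # drill down: split each product further by quarter
    print(eu.groupby(["product", "quarter"])["revenue"].sum())

With 17 dimensions the idea is the same; only the number of columns grows.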

I have only implemented very simple BI solutions a couple of times, so anyone with more experience can correct me if I'm wrong.


It's all just pivot tables, in the end. Excel solved most data slicing/drill-down problems years ago. The only issues have always been data size limits and access to live, up-to-date data (though Excel has also been able to hit external OLAP services for data for a long time).
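For what it's worth, the pivot-table shape is equally easy to get programmatically; here's a pandas sketch on made-up data, just to show what the operation is:

    import pandas as pd

    df = pd.DataFrame({"region":  ["EU", "EU", "US"],
                       "product": ["A",  "B",  "A"],
                       "revenue": [100,  150,  200]})

    # rows = region, columns = product, cells = summed revenue:
    # the same shape an Excel pivot table produces
    print(df.pivot_table(index="region", columns="product",
                         values="revenue", aggfunc="sum", fill_value=0))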


I'd like to affirm your claim that pivot tables and Excel can match pretty much any BI-related inquiry where data is available. Caveat: the data has to fit in Excel.

Source: experience on Wall Street, in Fortune 500 risk management, and dumping stuff from Salesforce to make it usable beyond what our implementation would report (or what leadership could get it to do).


And the data size limit in Excel is stifling. A maximum of 1,048,576 rows? That can get filled very quickly.


With Power Pivot it's quite a bit better; this guy did 122 million rows:

http://www.microsofttrends.com/2014/02/09/how-much-data-can-...

If that's not enough you can do SSAS Tabular + DirectQuery (although to be fair, that's no longer "pure Excel", but an end user likely doesn't care):

https://msdn.microsoft.com/en-us/library/hh230898.aspx


Excel gets a bad rap. You can do a lot in Excel, including slicing multidimensional data and building pivot tables. You can also pull in external data and connect to external data pipelines.

Excel is point-and-click and accommodates a wide range of programming skill. There are also a lot of plugins, and of course you can work in offline mode. It is also easy to integrate old Excel data, etc. And you don't need a full-time programmer who knows JavaScript or some other language to produce something.


When you're working with large companies that process enormous volumes of transactions, you can't just download things to Excel.


You would be shocked and amazed by the kinds of things that Excel is routinely (mis)used for in business.


That doesn't stop people from trying. I'm sure some manager in some company is angry at his IT staff because he is trying to load 10GB of statistics into Excel and his staff is telling him that he shouldn't do that.


Surely if you're that large then you could invest in some in-house domain-specific solution. There are whole programming languages dedicated to doing statistics on datasets.


Isn't that the whole point of AWS and their competitors? The savings of passing the problem off to domain experts to work out all the details often outweighs the benefits of having a solution that is 100% customized for your business.


For many companies (even large companies) it is cheaper and faster to use a service like this than build an in-house solution.


Not to mention that if you want this built for your company you call up the BI provider and tell them to send some consultants over to build it for you. You might train one person or possibly two at the most to be able to make tech calls with the BI provider. Much easier for a company to say "I want a dashboard that reports X, Y, and Z" and then the BI company scurries off, builds it, brings it back to you, and you hand over some cash for their effort.


Very true. Even with smaller companies, data sets like events inside an app are too much for Excel. Ever since I picked up R, I haven't touched Excel.


Hey, what's the difference between R and a library for Haskell (or even Python)? It's interesting that a language limited to a particular use (statistical analysis) has taken off so much, instead of a library for a general-purpose language.


You helped answer the question yourself. The fact that it was built for statistical analysis means that the syntax and overall flow are REALLY easy to understand and manipulate data with. I imagine someone who is familiar with Python would gravitate towards Pandas or another library, but for the BI that I do every day, R is perfect.

People love saying that BI/Data Science is 80% cleaning data (which may or may not be true), but I've found R to be the best for cleaning up 100k+ rows at a time.
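To make "cleaning" concrete, here is the kind of pass meant, sketched in pandas since Pandas is mentioned upthread; the file and column names are invented:

    import pandas as pd

    df = pd.read_csv("events.csv")            # hypothetical raw export

    df = df.drop_duplicates()                 # remove exact duplicate rows
    df["ts"] = pd.to_datetime(df["ts"], errors="coerce")     # parse timestamps
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    df = df.dropna(subset=["ts", "amount"])   # drop rows that didn't parse
    df["country"] = df["country"].str.strip().str.upper()    # normalize labels

    df.to_csv("events_clean.csv", index=False)

The R equivalent (dplyr/lubridate) reads much the same.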


Since 2010, Excel has had a native add-in called Power Pivot which can handle 100M+ row fact tables and provides a full dimensional modeling experience.

"Too big for Excel" is, quite literally, a problem of the last decade.

Edit: typo small != big



