There are "tall" applications and "wide" applications. Almost all advice you ever read about database design and optimization is for "tall" applications. Basically, it means that your application is only doing one single thing, and everything else is in service of that. Most of the big tech companies you can think of are tall. They have only a handful of really critical, driving concepts in their data model.
Facebook really only has people, posts, and ads.
Netflix really only has accounts and shows.
Amazon (the product) really only has sellers, buyers, and products, with maybe a couple more behind the scenes for logistics.
The reason for this is that tall applications are easy. Much, much easier than wide applications, which are often called "enterprise". Enterprise software is bad because it's hard. This is where the most unexplored territory is. This is where untold riches lie. The existing players in this space are abysmally bad at it (Oracle, etc.). You will be too, if you enter it with a tall mindset.
Advice like "never use joins" and "design around a single table" makes a lot of sense for tall applications. It's awful, terrible, very bad, no-good advice for wide applications. You see this occasionally when these very tall companies attempt to do literally anything other than their core competency: they fail miserably, because they're staffed with people who hold sacrosanct this kind of advice, which does not translate to the vast space of "wide" applications. Just realize this: your advice is for companies doing easy things who are already successful and have run out of low-hanging fruit. Even tall applications that aren't yet victims of their own success do not need to think about butchering their data model in service of performance; only those who are already vastly successful and are trying to squeeze out the last drops of performance do. But those are the people who least need advice. This kind of tall-centered advice, justified with "FAANG is doing it so you should too" and "but what about when you have a billion users?" is poisoning the minds of people who set off to do something more interesting than serve ads to billions of people.
There are optimizations and metrics collected as packages transition between all these layers. There are hundreds of "neat projects" running to special-case different things, all of them useful but adding complexity.
For example, ordering prescriptions off Amazon Pharmacy effectively needs its own website, permissions, and integrations. Probably distinct sorting machines too, with supporting databases for them. Do you need to log repairs on those machines? Probably another table schema.
You want to normalize international addresses? And fine-tune based on delivery status and logs and customer complaints and map data? Believe it or not, that's like 20 more tables. Oh, this country has no clear addresses? Need to send experienced drivers to areas they already know. Need to track that in more tables.
I apologize up front if I completely misunderstand your intent. However ...
> Amazon (the product) really only has sellers, buyers, and products, with maybe a couple more behind the scenes for logistics.
This is a comically bad hot take, entirely divorced from reality. A full decade ago the item catalog (e.g. ASINs, or items to purchase) alone had closer to 1,000 different subsystems/components/RPCs etc. for a single query. I think you'd have to go back to circa 2000 before it could be optimistically described as a couple of databases for the item catalog.
DylanDmitri's sibling comment is a hell of a lot closer to the truth, and I'd hazard it is still orders of magnitude underestimating what it takes to go from viewing an item detail page to completing checkout, let alone picking or delivery. There's a reason the service map diagram, again circa 2010, was called "the deathstar."
> "FAANG is doing it so you should too" and "but what about when you have a billion users?" is poisoning the minds of people
This part I completely agree with. And many individual components in those giant systems are dead simple. I dare say the best ones are simplistic even.
Ex-Amazonian here, and while I agree with the facts you present, I do think the "tall" vs "wide" debate is being misapplied here.
Amazon is extremely and perversely obsessed with, and good at, building decoupled systems at scale, which in essence means lots and lots of individual separate "tall" systems, instead of monolithic "wide" systems.
So IMO, Amazon subscribes to a "forest-of-'tall'-services" philosophy. And even at that meta level, I would say the forests are better off when they grow taller, rather than wider.
> This kind of tall-centered advice, justified with "FAANG is doing it so you should too" and "but what about when you have a billion users?" is poisoning the minds of people
The world runs on success stories, not on technology. I wish “wide” thinking were the default, for both un-delusion and better development in this area. But everyone is amazed by Facebook (not the site, just the money), so they have to imitate it, like those tribes who build jets out of wood.
I agree with the characterization of applications you've laid out and think everyone should consider whether they're working on a "tall" (most users use a narrow band of functionality) or a "wide" (most users use a mostly non-overlapping band of functionality) application.
I also agree with your take that tall applications are generally easier to build engineering-wise.
Where I disagree is that I think in general wide applications are failures in product design, even if profitable for a period of time. I've worked on a ton of wide applications, and each of them eventually became loathed by users and really hard to design features for. I think my advice would be to strive to build a tall application for as long as you can muster, because it means you understand your customers' problems better than anyone else.
> I've worked on a ton of wide applications, and each of them eventually became loathed by users and really hard to design features for.
Yes, I agree that this is the fate of most. But I refuse to believe it's inevitable; rather, I think it comes from systemic flaws in our design thinking. Most of what we learn in a college database course, most of what we read online, and almost all ideas in this space transfer poorly to "wide" design. People don't realize this because those approaches do work well for tall applications, and because they're regarded religiously. This is why I call wide applications so much harder.
> Yes, I agree that this is the fate of most. But I refuse to believe it's inevitable
Yes, exactly. It is not inevitable; I’ve worked on several “enterprise” software suites that did not suffer from this problem. However! They all had a period in their history where they did, and this is why:
Early on in a company's history there will be a number of “big” customers from whom most of the revenue is coming. To keep those customers and the money flowing, bespoke features are often added for them, and these accumulate over time. This is equivalent in character to maintaining several forks of an OSS project. Long term, no forward progress can be made because all the time ends up going to maintenance.
The solution to this sorry state is to transition to an “all features must be general for the product” policy and ruthlessly enforce it. That will also mean freezing customer-specific “branches”, and there will be a temporary hit to revenue. Customers need to be conditioned to “no bespoke features”, and they need to be sold on the long-term benefits and brought along for the ride.
This then enables massive scaling benefits, and the end of all your time in maintenance.
Thanks, I think this is a really interesting way to look at things.
What is the market for "wide" applications though? It seems like any particular business can only really support one or two of them, for some that will be SAP and for others it might be Salesforce (if they don't need much ERP), or (as you mentioned) some giant semi homebrewed Oracle thing.
Usually there is a legacy system which is failing but still runs the business, and a "next gen" system which is not ready yet (and might never be, because it only supports a small number of use cases from the old software and even with an army of BAs it's difficult to spec out all the things the old software is actually doing with any accuracy).
I think you're getting the idea -- both your points kinda highlight that this is something that companies want, but are not really getting.
As for the market, various sources have the "enterprise software market", whatever that means, at somewhere around $100 billion to $300 billion. We also see companies trying over and over to do this kind of thing. The demand is clearly there.
Certainly the mandate "help run the business" is a wide concern, and that's an OK working definition of "enterprise", and what most existing solutions are trying to do. There are hundreds of interconnected concerns, lots of things to coordinate, etc.
There are other wide concerns, though. Almost anything in engineering and science. Take, for example, the question "how can we reduce our greenhouse gas emissions?" which a lot of companies are asking (or being forced to ask). If you wanted to build a SaaS product for helping companies reduce their GHG, you've got a wide problem, because there are a thousand activities that can emit GHG, and any given company is going to be doing dozens of them at once. But each company is different. Each state and country thinks of things differently. You might not even have the same calculations state-to-state.
Hard problems in science and engineering are just naturally cross-disciplinary, meaning your system has to know a lot of things about a lot of subjects. There are just thousands of little complicating differences and factors. If you're trying to solve a problem like this, absolutely do not de-normalize your database.
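To make the "do not de-normalize" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. Every table name, column name, and number in it is invented for illustration and none of it comes from a real GHG product. The idea it shows is only that per-jurisdiction calculation differences live in their own table and are joined in at query time, instead of being copied onto every activity row.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create a tiny, hypothetical normalized schema with sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE activity (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE jurisdiction (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        -- One factor per (activity, jurisdiction): the same activity can be
        -- calculated differently from state to state.
        CREATE TABLE emission_factor (
            activity_id      INTEGER NOT NULL REFERENCES activity(id),
            jurisdiction_id  INTEGER NOT NULL REFERENCES jurisdiction(id),
            kg_co2e_per_unit REAL NOT NULL,
            PRIMARY KEY (activity_id, jurisdiction_id)
        );
        CREATE TABLE company_activity (
            company         TEXT NOT NULL,
            activity_id     INTEGER NOT NULL REFERENCES activity(id),
            jurisdiction_id INTEGER NOT NULL REFERENCES jurisdiction(id),
            units           REAL NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO activity VALUES (?, ?)",
        [(1, "diesel_litres"), (2, "grid_kwh")],
    )
    conn.executemany(
        "INSERT INTO jurisdiction VALUES (?, ?)",
        [(1, "State A"), (2, "State B")],
    )
    # Made-up factors, chosen only so the arithmetic is easy to follow.
    conn.executemany(
        "INSERT INTO emission_factor VALUES (?, ?, ?)",
        [(1, 1, 2.5), (1, 2, 3.0), (2, 1, 0.5), (2, 2, 0.25)],
    )
    conn.executemany(
        "INSERT INTO company_activity VALUES (?, ?, ?, ?)",
        [("Acme", 1, 1, 100.0), ("Acme", 2, 2, 400.0), ("Globex", 2, 1, 200.0)],
    )
    return conn


def emissions_by_company(conn: sqlite3.Connection) -> dict:
    """Join each recorded activity to the factor for its jurisdiction."""
    rows = conn.execute(
        """
        SELECT ca.company, SUM(ca.units * ef.kg_co2e_per_unit)
        FROM company_activity AS ca
        JOIN emission_factor AS ef
          ON ef.activity_id = ca.activity_id
         AND ef.jurisdiction_id = ca.jurisdiction_id
        GROUP BY ca.company
        ORDER BY ca.company
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    print(emissions_by_company(build_db()))
```

If a jurisdiction revises a factor, that is one row changed in `emission_factor`; a single-table design would have to rewrite every activity row that embedded the old value.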
I miss Notes - it was really a better way to organize companies than anything later. Historically valuable data, records of why decisions were made, ephemeral email-like things but for groups, user-programmable if it didn't quite match your needs, robust encryption: it had it all.
Oh, I always just assumed Lotus Notes was a lesser Outlook. Can you give examples, such as how it captured why decisions were made? That sounds ... hard, or was it just "someone wrote it down"?
It was a low/no-code environment; anyone (with enough rights) could knock up a simple app with rules/workflows and share it with the company. It made collecting, distributing and organising information easy if you knew what you were doing. It also created complex monsters, as it was both too easy and too hard to use. I liked it a lot; we moved from Notes to Exchange and Sharepoint back in the day and it was awful for efficiency. We required so many more people to do the same things. Luckily I left shortly after.
In your company you have a lot of smart people other than coders. Notes had a rich set of collaborative intrinsics, so you could whip out workflow applications like an accountant with spreadsheets, with built-in security and auditing and all that. And since you had the ability to craft tools to fit the exact situation, automation of processes went fast and was done by people familiar with the business side of the process. We did have a Notes team that would build apps for teams that couldn’t, but we also had a rich ecosystem of business-line apps that were so much better than spreadsheet apps or Access apps.
Facebook was built by Zuckerberg without any specialized domain knowledge, and people would like to go that route because it seems easier: to make a Twitter/FB/Instagram clone you don't really have to know anything about insurance or handling industrial waste. After that it is basically people joining because other people have joined.
Nowadays there are a bunch of regulations on handling user data that one cannot ignore, but when these companies started that was not an issue.
My point is that the market for "wide" applications is huge, but it is much more fragmented. Of course SAP and Salesforce are taking a cut of that by having "one app for everything".
To get contracts you have to have specialized knowledge in a specific area, so that your SaaS app provides more value than configuring some crappy version in SAP. So you cannot just make an app in your basement and watch people sign up; you have to put in a lot of legwork getting customers. That is why it is not really a "hot" area for startups: there is a lot of good money there, but not unicorn money. Most likely you won't be able to have 2 or 3 different specialist niche products to diversify your investment; you have to commit to one niche, which also makes it uninteresting for a lot of entrepreneurs, who would most likely like to jump to something more profitable when possible.
> What is the market for "wide" applications though?
Just my experience, but essentially these target industries, not necessarily consumers or singular entities. Hence the term "enterprise". As someone who worked on a fairly reasonable ERP for academic purposes, even just calculating a GPA is extremely complicated in the backend:
* There are multiple schemes for calculating GPAs
* Each scheme needs to support multiple grading types (A-F, pass/fail, etc)
* Each scheme needs to support multiple rounding rules
* Displays of GPAs will need to be scaled properly based on the output context
* GPA values will need to be normalized for use in calculations in other parts of the system
* State legislatures mandate state-specific usages of GPAs which must be honored for legal compliance
* All GPA calculations must have historical context in case the rules change so that old transcripts can be revived correctly
* Institutions themselves will have custom rules (maybe across schools or departments) for calculations which must be incorporated into everything else
* This pretty much has to work every time
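A few of the requirements above (multiple schemes, pass/fail grades that do not count, configurable rounding, and effective-dated rules so old transcripts stay reproducible) can be sketched in Python as follows. This is an illustrative sketch only, not the ERP's actual design; `GpaScheme`, `scheme_for`, `gpa`, the grade-point values, and the dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


@dataclass
class GpaScheme:
    """One version of a (hypothetical) GPA rule set."""
    name: str
    effective_from: date   # first day this version of the rules applies
    points: dict           # letter grade -> Decimal grade points
    excluded: frozenset    # grades that do not count toward GPA (e.g. "P")
    rounding: str          # a decimal rounding mode such as ROUND_HALF_UP
    places: int            # number of decimal places to keep


def scheme_for(schemes, on):
    """Pick the rule version in force on a date, so that an old
    transcript is recomputed with the rules of its own time."""
    in_force = [s for s in schemes if s.effective_from <= on]
    if not in_force:
        raise ValueError(f"no GPA scheme in force on {on}")
    return max(in_force, key=lambda s: s.effective_from)


def gpa(scheme, courses):
    """Credit-weighted GPA over (grade, credits) pairs."""
    counted = [(g, c) for g, c in courses if g not in scheme.excluded]
    quantum = Decimal(1).scaleb(-scheme.places)  # places=2 -> Decimal("0.01")
    total_credits = sum(c for _, c in counted)
    if total_credits == 0:
        return Decimal(0).quantize(quantum)
    total_points = sum(scheme.points[g] * c for g, c in counted)
    return (total_points / total_credits).quantize(quantum, rounding=scheme.rounding)


# Hypothetical data: the same point scale, but the rounding rule changed in 2020.
POINTS = {"A": Decimal(4), "B": Decimal(3), "C": Decimal(2),
          "D": Decimal(1), "F": Decimal(0)}
SCHEMES = [
    GpaScheme("truncate", date(2010, 1, 1), POINTS, frozenset({"P"}), ROUND_DOWN, 2),
    GpaScheme("half-up", date(2020, 1, 1), POINTS, frozenset({"P"}), ROUND_HALF_UP, 2),
]

if __name__ == "__main__":
    transcript = [("A", 3), ("B", 3), ("A", 3), ("P", 3)]
    print(gpa(scheme_for(SCHEMES, date(2015, 6, 1)), transcript))  # 3.66
    print(gpa(scheme_for(SCHEMES, date(2023, 6, 1)), transcript))  # 3.67
```

The same transcript yields two different GPAs depending on which rules were in force, which is the reason each rule version is kept as data instead of being overwritten.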
I don't know exactly how many tables GPAs themselves took, but overall the system was over 4,000 tables and 10,000+ stored procedures/functions. Also, I worked in the State of Texas, which has its own institution-supported entity performing customizations to this ERP for multiple universities; these are installed separately but required for fully compliant operation.
I would compare this to most modern "tall" applications which would more-than-likely offer you maybe up to 3 different GPA options with some basic data syncing or something. They might offer multiple rounding types if they thought that far. These apps are generally extremely niche and typically work for very basic workloads. They can capture a lot of easy value for entry-level stuff but immediately fail at everything else.
Your initial premise is flawed though. For example, as someone who worked on Facebook's database team, I can tell you that Facebook has thousands upon thousands of tables (distinct logical table definitions, i.e. not accounting for duplication from physical sharding or replication).
Some of these store things for user-facing product entities and associations between them -- you missed the vast majority of product functionality in your "people, posts, and ads" claim. Others are for internal purposes. Some workloads use joins, others do not.
Nothing about Facebook's database design is "tall", nor is it "easy". There are a lot of huge misconceptions out there about what Facebook's database architecture actually looks like!
Advice like "never use joins" and "design around a single table" is usually just bad advice for most applications. It has nothing to do with Facebook, and ditto for Amazon based on the sibling replies from Amazon folks.
Doesn't that also say something like, it's an easier road to success if you find the tall way to market and scale, scale, scale once you find it? What is the "wide" success story to take inspiration from?