whopa · on Oct 6, 2018

I think this is why AWS promotes DynamoDB so much, because it's one of the few datastores that work sanely with non-VPC lambdas.

whopa · on Sept 2, 2018

I wonder why all the predominant RDBMSes are heavyweight connection based, and why this is presumably so hard to change. There's nothing in SQL that requires this to be the case, at least conceptually.

wgjordan · on Sept 2, 2018

TCP/IP performs a handshake to establish connections so there will always be some per-connection overhead required for this, no matter the service on the other end.

Beyond that, MySQL for example pre-allocates a fixed amount of memory for read, join and sort buffers per connection to efficiently handle different types of queries, though the buffer sizes are all tunable to optimize performance. There is also some OS-scheduler overhead involved in creating and running a separate thread per connection. A thread-cache optimizes the case where lots of short-lived connections are being created. (I don't have direct experience with PostgreSQL but I read that it forks a separate process per connection, which would presumably incur a significantly higher overhead than threads.)

Beyond this, any more specific discussion requires quantifying connection 'weight' to dispel/confirm your superstition that 'all the predominant RDBMSes are heavyweight'. In an apples-to-apples benchmark, it's quite possible that the connection-weight of some RDBMSes might actually be lighter than that of other database engines. I don't know myself, but would be interested if anyone has any such data to share on this.

_skel · on Sept 3, 2018

It seems like pretty much everything was connection-based a few decades ago, when those systems were written. This can lead to some behavior in legacy software that causes big problems in cloud environments, or really any environments where connections are transitory.

The problem really boils down to the tight coupling between the HTTP layer and the rest of the stuff all the way down the stack. If you assume connections are always long-lived and stateful, you can make performance and memory optimizations by reserving memory buffers and threads for the sole use of a single connection, and using threadlocals for storing session data.

But if each connection gets its own thread (or thread pool), idle connections lock up resources and can prevent new connections from being opened, because all of the threads are currently allocated to existing connections. And setting up every new connection is a bit expensive and wasteful if it's only going to be used for one request.

I think the industry trend toward transitory connections, and even transitory application instances a la Kubernetes pods, is a good thing for a lot of reasons. If nothing else, the knowledge that connections can be short-lived leads to more fault-tolerant software because the business logic layer cannot depend on assumptions about the HTTP layer. The big downside is that a lot of older stuff isn't really suitable for that kind of world, serverless or not, and it can be really hard to refactor.

pvg · on Sept 2, 2018

Part of it is legacy but also, interactions with an RDBMS are often stateful so it's useful to have a session (in which such things as transactions can live, etc). I'd guess there isn't much reason to change this because there are standard and effective workarounds that just haven't yet made their way to things like lambda.

lostapathy · on Sept 2, 2018

Can you elaborate on what “standard and effective workarounds” you are referring to here?

pvg · on Sept 3, 2018

Connection pooling is the most obvious one which is so common it tends to be transparent and built into many DB access libraries. Then there are all sorts of proxies/load balancers/multiplexors. If you think about it, the bits of code that handle web requests in your typical web app/framework/whatnot are very much like lambdas and when you write those, you generally don't have to worry about the cost of DB connections because it's a well-solved problem.
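To make the pooling idea concrete, here is a minimal sketch of what a DB access library does behind the scenes: open a fixed number of connections once, lend them out per request, and take them back instead of closing them. The `ConnectionPool` class and `handle_request` function are hypothetical names for illustration, and sqlite3 stands in for a networked RDBMS only so the example runs anywhere; real pools also handle health checks, reconnects and timeouts.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Tiny fixed-size pool: connections are opened once and then reused."""

    def __init__(self, database, size=2):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # check_same_thread=False lets a pooled connection be handed
            # to whichever thread borrows it next.
            self._idle.put(sqlite3.connect(database, check_same_thread=False))

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks (up to `timeout` seconds) if every connection is in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool instead of closing it.
            self._idle.put(conn)


pool = ConnectionPool(":memory:", size=2)


def handle_request(n):
    """Stands in for a web handler or lambda-style function."""
    with pool.connection() as conn:
        return conn.execute("SELECT ? + 1", (n,)).fetchone()[0]
```

The handler never pays the connection setup cost itself; it only borrows. That works when the process outlives a single request, which is the assumption that short-lived function instances tend to break.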

k__ · on Sept 2, 2018

Saves overhead.

Crappy application-servers pump queries like nobody's business, so you need to shave every byte off them you can.

whopa · on Aug 26, 2018

I had to do similar a month ago, around $20k as well. It was from my account at BofA to an account at a different bank. I did it all online, and it cost $10 and cleared the next day, so within 24 hours. I did this all online without talking to any person, though I did have to 2FA into my account.

I get that some banks might not be as competent yet, but signs look like they'll all get there in the end.

whopa · on Feb 24, 2018

As a sibling comment has mentioned, there are high-skills immigration paths in the US, they just aren't very well known, and have a very high bar to clear. It's not great, but it does exist. I personally know 2 Indians who got their green card in a matter of months.

That said, the Canadian system, while definitely better than the US system, still isn't that great, and not something to model after. Specifically, it gives too much weight to credentialing, which is unfair in its own right, and still very much subject to gaming.

I personally believe in complete free movement of labor (which totally existed before, the current state of things is relatively young), but that seems politically infeasible around the world right now. Brexit is an example of even taking a step back from it. Maybe this will someday happen in my lifetime.

whopa · on Feb 4, 2018

People keep saying that Node is async native, but the GC is global and synchronous, and that property will bite you on availability even at moderate traffic, i.e. sooner than you'd expect.

whopa · on Feb 4, 2018

> I feel that if you make the best choice you can given the information you have, you've nothing to regret.

You did miss something. Trying to be a gymnast wasn't her choice, her mom made that choice for her when she was 3 years old. This is the story of a failure to be elite in the sports world, and coming to terms with a dream that was externally foisted onto her at a formative age that didn't get realized.

I don't think this situation applies to the scenarios most HN readers deal with. Your advice is a lot better for this crowd.

ams6110 · on Feb 4, 2018

This is a big problem in youth sports. Forcing kids to specialize at a very early age. A far better approach is to expose your kids to a variety of sports and let them figure out which ones they like. I do think kids should do sports, the activity is good both physically and mentally, they learn that effort produces results and they learn how to win and lose and move on. And it opens them up to social situations that they will otherwise be completely cut out of.

But unless your kid is one of the rare ones who is really athletically gifted, setting up dreams of scholarships and professional careers is just setting them up for failure. You should not even be thinking about that until about middle school age, and only on advice of people who will give you an unbiased evaluation. Johnny may be the best football player on the team and still be absolutely unremarkable for scholarship or professional consideration. Most parents, unless they were elite themselves, don't know enough to judge and certainly are not objective.

mcguire · on Feb 4, 2018

This would probably be best for the kids, but...

"...not even be thinking about that until about middle school age..."

your kids will never be professional football, baseball, basketball (or soccer, probably) players, or Olympic gymnasts, or classical musicians because they will be competing against people who have been practising since they were three. No matter how much natural gift they have.

ams6110 · on Feb 4, 2018

Disagree. Professional athletes cannot be made by practice alone. The natural gift, coordination, balance, athletic ability, competitive drive, and other inborn traits are what matters most. There are professional athletes who didn't even seriously play their pro sport until high school, and uncounted thousands of kids who were pushed into something in preschool and never earned a dime from it.

lsc · on Feb 5, 2018

So, certainly the 'gift' is necessary to compete at a high level... but so is practice. You need both. In fact, my understanding is that one of the major 'gifts' that most elite athletes have is that they recover faster than the rest of us, so they can productively practice things that require muscle growth more than you or I could. That "gift" becomes largely worthless if they don't practice.

In a field you might understand better, I have a reasonably high IQ, which helps a lot when it comes to tests like the GRE and the MAT. I scored in the 95th percentile on the GRE verbal reasoning test, and the 45th percentile on the GRE math test (at age 37, with no college experience) because I have not practiced math. I mean, I'm practicing now,[1] and getting better, but I'm never going to be as good at math as I would have been if I had taken it seriously from a young age. This is especially stark for me, because I work in the computer industry and am surrounded by people who studied math from an early age, for whom it's simple and natural. Nearly everyone I work with did calculus in high school, and found it easy.

For that matter, I think I could bring up my verbal reasoning with practice, as well; I read a lot, and my intuition for what feels right in a sentence is usually right, but I fail grammar tests that require me to name the error.

[1]I'm on Khan academy now, and I'm super amused at the badges I get. I'm doing it in a linear way, rather than skipping ahead, so I'm 60% through "the world of math" challenge. The badges I got this week were all from programs created by what I think are prestigious schools... at the 8th grade level.

skybrian · on Feb 4, 2018

You know what? That's good to know. The lesson is that any competition that requires people to train since they were 3 has too many competitors and should be spurned. This is a market signal to find something else to do.

One of the reasons we have lots of ways of competing is so that there are more (incomparable) ways to be good at something.

Specifically, this article tells us that you don't want your daughter to try to become an Olympic gymnast.

sincerely · on Feb 5, 2018

To the best of my knowledge, research is indicating that early specialization in sports only contributes to a very modest gain in skill later in life at the cost of much higher dropout rates, decreased measured enjoyment, and increased injury rates among participants.

My personal experience as a near-Olympic class athlete (swimming) backs this up - the people that were on club teams at age 9 were not the same people that were later swimming at a level where they could qualify for Olympic Trials.

nradov · on Feb 5, 2018

False. There are many professional athletes playing today who started much later than age 3.

mcguire · on Feb 5, 2018

In gymnastics?

Certainly, they don't let you play football much until you're 6-9 and then not seriously until you're 12-14. Likewise, basketball and baseball. But very few professionals or top amateurs (for those sports where that is the top) started much later than the earliest they could.

walshemj · on Feb 4, 2018

Well, in some cultures kids (male kids especially) are pushed to become lawyers, doctors or programmers even if they have no aptitude or interest, which is a similar thing.

whopa · on Jan 14, 2018

This is an informative and well written article, but seems incomplete in this day and age. In public cloud environments, network attached storage is far more prevalent, so the swap story may be different there (I honestly don't know though). Since the author works at Facebook, he probably lacks experience in this regard.

Anderkent · on Jan 14, 2018

Every cloud provider I've worked with (okay, so AWS :P) gives you ephemeral local storage. Obviously you don't swap onto a network drive.

akerl_ · on Jan 14, 2018

Even on AWS they're phasing out local storage on new instance types: https://ec2instances.info/ (search for EBS only, but it's the majority of new instance families)

whopa · on Jan 14, 2018

Modern AWS instance types are EBS-only.

eikenberry · on Jan 14, 2018

With the exception of the High IO types (I2/I3). They still get it and the newer instances get NVMe SSDs. In other words they are making it a feature of certain types that would benefit from it.

_msw_ · on Jan 14, 2018

For example, F1 instances have NVMe local instance storage.

https://aws.amazon.com/ec2/instance-types/f1/

Anderkent · on Jan 14, 2018

Huh. You're right; it seems for the newer instance types only c1.medium and m1.small get swap mounts. That seems like a mistake by AWS, but I guess you can use an M3 instead of an M5.

merb · on Jan 14, 2018

Well, the default Kubernetes install (kubeadm) will actually fail when swap is enabled. (Even worse, you can force it to ignore that, but the kubelet would then fail to start while swap is enabled.)
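If you want to know whether a node will trip over this before running the installer, you can check for active swap yourself. A rough sketch, assuming a Linux host where `/proc/swaps` lists active swap areas under a one-line header; the function names are made up for illustration, and this is not how kubeadm itself performs its check.

```python
def active_swap_devices(proc_swaps_text):
    """Return the swap areas listed in /proc/swaps-style text.

    The first line is a column header; each later non-empty line
    describes one active swap area, with its path in the first column.
    """
    lines = proc_swaps_text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


def swap_enabled(path="/proc/swaps"):
    """True if the host currently has at least one active swap area."""
    with open(path) as f:
        return bool(active_swap_devices(f.read()))


# Example input in the shape /proc/swaps uses (values are illustrative).
sample = (
    "Filename\t\t\t\tType\t\tSize\tUsed\tPriority\n"
    "/dev/sda2                               partition\t2097148\t0\t-2\n"
)
devices = active_swap_devices(sample)
```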

whopa · on Nov 1, 2017

If you're in AWS land you can use Kinesis firehose to S3, which is perfect for Spark. Way more straightforward than any Kafka solution.

whopa · on Nov 1, 2017

At small scale just go with Kinesis. The base semantics are pretty much the same between the two, and Kafka is terribly complex to run. The hosted Kafka solutions are too expensive for small scale.

Kinesis has a real auth story too, plus you can trigger Lambda functions off streams.
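For a sense of what the Lambda side looks like, here is a minimal sketch of a Python handler triggered by a Kinesis stream. Lambda delivers records in batches under `event["Records"]`, with each payload base64-encoded in `record["kinesis"]["data"]`. The JSON payload format and the sample event contents are assumptions for illustration; your producers may write something else.

```python
import base64
import json


def handler(event, context):
    """Decode every Kinesis record in the batch and return the parsed payloads."""
    payloads = []
    for record in event["Records"]:
        raw = base64.b64decode(record["kinesis"]["data"])
        # Assumes producers wrote JSON; swap in your own decoding otherwise.
        payloads.append(json.loads(raw))
    return payloads


def make_sample_event(payload):
    """Build a hand-made event with one record, for local testing only."""
    data = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
    return {"Records": [{"kinesis": {"data": data}}]}


sample_event = make_sample_event({"user": "a", "clicks": 3})
result = handler(sample_event, None)
```

A real event carries more fields per record (partition key, sequence number and so on); the sketch only touches the one it needs.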

crcastle · on Nov 1, 2017

Disclosure: I work for Heroku. Heroku launched a cheaper managed Kafka 1.5 months ago. It starts at $100/month (pro-rated to second). That's ~$3.33/day. Great if you want to play, learn, or test out a proof-of-concept.

It's multi-tenant, but interaction is nearly identical to interacting with a dedicated kafka cluster -- i.e. you can use any regular kafka client library.

Check out docs[1] and launch blog post[2]. Happy to answer any questions here or through email (contact info in profile).

[1]: https://devcenter.heroku.com/articles/multi-tenant-kafka-on-...

[2]: https://blog.heroku.com/kafka-on-heroku-new-plans

jurre · on Nov 1, 2017

> Kafka is terribly complex to run

I read this quite often, but we run a relatively small kafka cluster on GCP and it's pretty much hassle-free. We also run some ad-hoc clusters in kubernetes from time to time which also works well.

What exactly have you found complex about running Kafka?

nemothekid · on Nov 1, 2017

>What exactly have you found complex about running Kafka?

I run a small 2-node Kafka cluster that processes up to 10 million messages/hr - not large at all - and it's been very stable for almost a year now. However, what was complex was:

* Setup. We wanted it managed by mesos/marathon, and having to figure out BROKER_IDs took a couple hours of trial and error.

* Operations. Adding queues and checking on consumers isn't amazing to do from the command line.

* Monitoring. It took a while before I settled on a decent monitoring solution that could give insight into kafka's own consumer paradigm. Even still there are more metrics I would like to have about our cluster that I don't care to put the time in to retrieve.

crcastle · on Nov 1, 2017

Another thing I found "complex" was the Java/Scala knowledge requirement. I wanted Kafka-like functionality for a Node.js project, but my limited Java and Scala knowledge made me concerned about my ability to deal with any problems I might run into.

In other words, I could probably get everything up and running (especially with the various Kafka-in-Docker projects I found), but what happens if (when) something goes wrong?

eklavya · on Nov 2, 2017

What do you mean by "Java/Scala knowledge requirement"? I don't know much C/C++ but I use Postgres just fine. There is a lot of software in the ecosystem written in a lot of languages; if I had to know them all, I wouldn't make much progress.

jurre · on Nov 3, 2017

I've never had to dive into any Java or Scala to maintain our Kafka cluster

pjmorris · on Nov 1, 2017

> Monitoring. It took a while before I settled on a decent monitoring solution that could give insight into kafka's own consumer paradigm.

Would you be willing to write a bit (or point to a post with) more about this? What do you find useful?

nemothekid · on Nov 1, 2017

Like I mentioned, our Kafka setup is relatively small - we moved from RabbitMQ to Kafka because of the sheer size (as in byte size) of the messages we needed to process (~10 million/hr), where each message could be 512-1024kb, which caused RabbitMQ to blow up unpredictably.

Secondly, due to the difference in speed between the consumer and producer, we typically have an offset lag of around 10MM, and it's important for us to monitor this lag because if it gets too high, it means we are falling behind (our consumers scale up and down through the day to mitigate this).
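The lag in question is just the gap between the newest offset in each partition and the offset the consumer group has committed. A minimal sketch of that arithmetic follows; the offsets and the alert threshold are made-up numbers, and nothing here calls a Kafka client (in practice the two dictionaries would come from your client library or a tool like Kafka Manager).

```python
def consumer_lag(end_offsets, committed_offsets):
    """Return (per-partition lag, total lag).

    Both arguments map partition number -> offset. A partition with no
    committed offset is treated as fully unread (committed = 0), which
    is a simplifying assumption.
    """
    per_partition = {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }
    return per_partition, sum(per_partition.values())


# Hypothetical threshold: alert once total lag exceeds 15 million messages.
LAG_ALERT_THRESHOLD = 15_000_000

# Illustrative offsets for a two-partition topic.
end_offsets = {0: 52_000_000, 1: 48_500_000}
committed = {0: 47_000_000, 1: 43_500_000}

per_partition, total = consumer_lag(end_offsets, committed)
falling_behind = total > LAG_ALERT_THRESHOLD
```

Tracking the per-partition numbers as well as the total matters, because a single stalled partition can hide behind a healthy-looking aggregate.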

Next, we use Go, which is not an official language supported by the project but has a library written by Shopify called Sarama. Sarama's consumer support had been in beta mode for a while, and in the past had caused some issues where not every partition of a topic was being consumed.

Lastly, at the time we thought creating new topics would be a semi-regular event, and that we might have dozens of them (this didn't pan out), but having a simple overview of the health of all of our topics and consumers was thought to be good too.

We found Yahoo's Kafka Manager[1], which has ended up being really useful for us in standing up and managing the cluster without resorting to the command line. It's been great, but it wasn't exactly super obvious for me to find at the time.

Currently the only metrics I don't have are plottable things like processed/incoming msg/sec (per topic), lag over time and disk usage. I'm sure these are now easily ingested into grafana, I just haven't had the time to do it.

All of this information is great to have, but requires some setup, tuning, and elbow grease that is probably batteries included in a managed service. At the same time however, this is something you get almost out of the box with RabbitMQ's management plugin.

[1] https://github.com/yahoo/kafka-manager

jurre · on Nov 1, 2017

Yes, I do agree with these (except mesos is not a requirement for us). Is any of this significantly better for hosted Kafka or kinesis though? I have no experience with either

manigandham · on Nov 2, 2017

Yes, not having to worry about any of that is primary reason for managed services.

qaq · on Nov 1, 2017

What is the point of 2 node cluster?

nemothekid · on Nov 1, 2017

Topic sharding. The messages were pretty large and at the time we set this up we were on a DO-like platform where the only way to get more disk space was to buy a larger instance. We didn't need the extra CPU power, but needed extra disk space, and it was cheaper to opt for two nodes instead of upgrading to n+2.

CSDude · on Nov 1, 2017

Running Kafka is just fine; the issues arise when a node fails, or when you need to add data and re-partition a topic. It is not that hard once you know what to do, though. Kinesis is simpler, but it is expensive as shit.

optimuspaul · on Nov 1, 2017

At small scale Kinesis is far less expensive. There is definitely a point where Kinesis becomes more expensive, especially if you consider the operational and human costs involved.

knicholes · on Nov 1, 2017

I get shit for free daily!

joaodlf · on Nov 1, 2017

I don't understand how Kafka is a complex project to run. It's dead simple to install alongside kafka manager and we have dedicated no time to it since installation - it just runs and does its job.

lima · on Nov 1, 2017

> Kafka is terribly complex to run

That does not match my experience at all. Of all the distributed message queues I've tried, Kafka has been - by far - the easiest to operate.

It works well out-of-the box and even setting it up with ZooKeeper is relatively simple.

tomsthumb · on Nov 1, 2017

Kafka can do client side certificates, no? That would be a real auth story.

lima · on Nov 1, 2017

It can, and yes, it works well.

whopa · on July 23, 2017

> they aren't biased by the immaculate freeways and roads in California -- Pittsburgh has a much wider variety of challenging driving situations, weather, and conditions than California so it is a great test bed for developing autonomous vehicles

Careful, don't compare Pittsburgh to an entire state. Pennsylvania doesn't have any real mountains, whereas California does. Google tests their self driving vehicles in the Lake Tahoe area, which in the winter can be much more challenging than anywhere within 500 miles of Pittsburgh.

Navigating serious grade changes, both uphill and downhill, presents more of a challenge for trucks too, even for humans right now. The only places to really test that in the US are pretty much west of Denver.

sliken · on July 23, 2017

This sounds like someone who has never driven in Tahoe and Pittsburgh. I've done quite a bit in both.

I can tell you that 80 and 50, which head over the Sierras, are often closed for what would be considered relatively minor snow in Pittsburgh. Sometimes it's not even the road conditions, just lack of visibility (fog or snow). Significant ice is also fairly rare just because of the level of maintenance, I suspect somewhat fueled by poor California drivers and ski resorts that push for excessive road maintenance.

On the other hand, Pittsburgh gets plenty of snow, plenty of storms, and I can assure you they don't close the highways unless it's a storm of the century so bad that you'll not even be able to find cars, let alone drive them. Additionally, the Pittsburgh area roads have significant elevation changes and are often narrow and poorly maintained. Take for instance the top 10 steepest roads in the USA. Pittsburgh has #2 and it snows there. SF has #9 and #10, but it rarely snows there. Another puzzling factor is that Pittsburgh uses a ton of salt, yet has temperature variations that often lead to snow melting from salt, then refreezing in sheets of ice or "black" ice. I've definitely skidded WAY further in Pittsburgh on ice than I have around Tahoe. For a year or so I was crossing 3 7200 foot passes each weekend around Tahoe, and in the last 20 years I've often been up around there for various reasons.

Even the average Pittsburgh driver seems to deal with snow MUCH better than the ones I find up around Tahoe area when it snows. Even though the Pittsburgh driver is likely in a 10 year old front wheel drive econobox instead of a newish AWD SUV.

Hydraulix989 · on July 23, 2017

As an SF transplant, my only thoughts after seeing California drivers struggling with the artificial snow and hills at Tahoe were "these people have never been to Pittsburgh."

Also, eastern Pennsylvania has the Appalachian mountains, and last time I checked, they were "real."

nostrademons · on July 23, 2017

Also a Bay Area transplant here, initially from Boston.

I'd agree that Californian snow-driving skills are a joke, and that Tahoe isn't actually that difficult compared to New England winters.

However, there really are terrain types that CA has (and Google tests its self-driving cars on) that just don't exist back east. There's nothing like CA 1 on the east coast, with windy 15mph switchbacks and a sheer several hundred foot drop into the ocean if you miss a curve. Nor do they regularly have to deal with the road being closed because of rockslides, or Tesla drivers who pass you illegally.

whopa · on July 23, 2017

https://en.wikipedia.org/wiki/Mount_Davis_(Pennsylvania) - 3,213 ft - highest point in PA

https://en.wikipedia.org/wiki/Tejon_Pass - 4,160 ft

Tejon Pass is on I-5 in northern LA County, and it is a huge trucking route. The grade is very steep between the Central Valley and the top of the pass, and is fairly challenging for trucks. There is no equivalent to those conditions in PA.

Also, if your impression of Tahoe is only the heavily touristed parts, your view is incomplete. The mountain roads in the Sierras have no equivalent east of the Mississippi.

ntsplnkv2 · on July 23, 2017

looks like someone has never been to Pennsylvania.

Tejon Pass' major difficulty is the grade, and that's about it. PA may not have a highway that matches that grade, but many come close, and there are far more tight curves, typically far worse road conditions, and bad weather season is far more common than in N LA.

whopa · on July 23, 2017

My point about Tejon Pass is that it's a steep grade mountain pass with heavy traffic including lots of trucks. There's no equivalent in PA.

Highway trucking will be the first significant deployment of autonomous vehicles. One of the big challenges is Mountain West interstates.

I do agree that the NE US is a proper testbed for bad weather city driving, since no West Coast cities have winter weather that bad. I'm just objecting to the claim that somehow Pittsburgh captures all the challenging road conditions that autonomous vehicles will encounter.

thesmallestcat · on July 23, 2017

Have you ever driven on 70 or 80 through PA? Because "steep grade mountain pass with heavy traffic, including lots of trucks" is an apt description of either route.

nostrademons · on July 23, 2017

I'm also thinking of the Virgin River Gorge on I-15 in Southern Utah, which isn't just steep and winding, but also extremely narrow and windy, and yet supports interstate speeds. I don't know of anything similar in the Eastern U.S; upstate NY and Appalachia have plenty of gorges, but most are local roads and aren't the main interstate thoroughfare between them.

sliken · on July 23, 2017

Definitely easy for a non-Pittsburgher to equate difficulty with elevation. That ignores elevation changes, weather, sharp turns, tunnels, and bridges. Also a strange fondness for salt, which generally makes things worse. Find a Pittsburgher who thinks driving in Tahoe is hard, even off the beaten path.

Personally I try my best to get as far off the beaten path as I can. Tahoe and surrounding areas is a cake walk compared to Pittsburgh.