It’s also about (damn) time. I know it shouldn’t, but the python community’s biz...

StewardMcOy · on Jan 28, 2022

While I happily use Python 3 now, the transition was very difficult, and I was grumpy about it at the time.

I work with code that uses a lot of files in a lot of different text encodings. Some are XML, some plain text, and some binary. Coming from other languages, Python 2's Unicode support was very difficult to work with, and my team was excited to move to Python 3, until we actually had to do the work.

Long story short, Python 3's str/bytes separation was a nightmare, especially when having to deal with third-party libraries that were expecting one or the other. This was especially true for libraries that came from Python 2, and were still expecting strs as function parameters when they should have switched to bytes. There was so much encode() and decode()ing going on that we occasionally caught ourselves getting them backwards in code reviews. We took a step back to see if we were just architecting things poorly, but no, we were doing things the best we could for our problem domain.

In the end, we traded one set of Python 2 text encoding footguns for a mostly different set of Python 3 text encoding footguns. It's not like Python 2 was great. Having a single str type instead of the str/bytes separation is worse in theory. But because Python 3 didn't design the separation of those types well enough, and still has a ton of text encoding footguns, it wasn't better enough to really justify all the work that went into the conversion.

nostoc · on Jan 28, 2022

Transitioning was hard, especially with all the libraries in various states of support between 2 and 3. I had a similar experience.

But I disagree with you about the separation of bytes and string and the current state of the language. I write a lot of python that deals with bytes and text encoding, and now that all the libraries have caught up with 3, the situation is way better than it ever was. Encoding, decoding, and bytes manipulation are way less prone to errors.

StewardMcOy · on Jan 28, 2022

I'm glad it's improved. I moved jobs a couple years ago, and since then, have only used Python for a few small personal projects. When I left, there were still some very rough edges around some very popular (and some not so popular) libraries.

And, as I said, I do prefer the str/bytes split over the unicode/str split, but I also wasn't doing a lot of raw byte manipulation. I agree that it was harder to do with the old str than bytes. I was mostly doing string operations on the grapheme cluster level, and then writing everything out as UTF-8, so I didn't see as much of the benefit.

zestyping · on Jan 28, 2022

> Python 3 didn't design the separation of those types well enough

I'm curious to know what you mean. How would you improve the design?

StewardMcOy · on Jan 28, 2022

That was probably an inartful way of saying it on my part. Echoing what I said in another comment, I think the API for converting between bytes and str types is bad. .encode() and .decode() are bad names for what these functions do, and led to occasional incorrect code in both Python 2 and 3. I would have preferred something akin to:

    b = b"some bytes"
    s = u"some str"
    foo = b.to_string(encoding)
    bar = s.to_bytes(encoding)

or

    foo = str(b, encoding)
    bar = bytes(s, encoding)
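For reference, the constructor spelling shown just above already works in Python 3, while the to_string()/to_bytes() methods are hypothetical. A minimal sketch, with made-up sample values:

```python
# Sample text is invented for illustration.
raw = "naïve café".encode("utf-8")   # bytes

text = str(raw, "utf-8")             # same result as raw.decode("utf-8")
back = bytes(text, "utf-8")          # same result as text.encode("utf-8")

# Caveat: str() on bytes WITHOUT an encoding does not decode anything,
# it returns the repr of the bytes object.
unexpected = str(b"abc")             # "b'abc'"
```

The one-argument form is its own small footgun, which may be part of why encode()/decode() remain the more common spelling.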

I also mentioned that I think there shouldn't be any mechanism to read or write files to/from a string without specifying an encoding. locale.getpreferredencoding() is a mistake.

But one thing I didn't mention that I think would improve the design would be to make it harder to treat str as the bag-of-bytes type it was before. Swift took some steps down this path, and honestly, it made it much less pleasant to work with strings, but a bit safer. I would hope that a better design could be found.

This may be controversial, but I don't think you should be able to just subscript a str type. Instead of s[1], you should be writing s.codepoints[1] or s.graphemeclusters[1], etc. depending on what you actually want. If str is truly a string type and not a bag of bytes, it should deal with the extra complexities that being a string brings to the table.
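To make the subscripting point concrete: Python 3's str indexes by code point, so a single user-perceived character can span more than one index. A small sketch using only the standard library (the sample string is an assumption chosen for illustration):

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT: renders as one character, stored as two code points.
decomposed = "e\u0301"
# NFC normalization folds this particular pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

decomposed_len = len(decomposed)   # 2
first_unit = decomposed[0]         # "e", the accent is left behind
composed_len = len(composed)       # 1
```

Normalization only helps when a precomposed form exists. As far as I know the standard library has no grapheme cluster iterator, so indexing by cluster needs a third-party package.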

zestyping · on Jan 29, 2022

Oh, I see what you mean. Yeah, I have gotten tripped up wondering whether I should be calling encode() or decode() many times. Using the str() or bytes() constructor would be more intuitive for sure.

Interesting thought on subscripting strings. It would be hard to make it fly after all these years of habits, but I see your reasoning.

supergarfield · on Jan 28, 2022

What do you think the current text encoding footguns are?

In a different direction, I don't know what your problem domain is/was, but in general when I'm dealing with UTF8, I don't need to convert back to bytes very often. Was the need for conversion mostly due to the libraries that still expected strings instead of bytes?

StewardMcOy · on Jan 28, 2022

It's been quite a few years since we went through the conversion, and at this point, working in Python 3 is natural to me, so I may not be able to recall all the footguns. I can say a lot of the difficulty was due to libraries, both third-party and standard, and that hasn't improved very much. I don't want to single anyone out here, because it's pervasive. In Python 2, str was the bag of bytes type. I think a lot of libraries didn't want to change to accepting bytes types instead, because it broke API compatibility, but it caused a lot of issues.

I should also say that we were working with files in tons of encodings, not just UTF-8. We had UTF-16 and UTF-32, both little and big endian, with and without BOMs, but we also had S-JIS and a bunch of legacy 8-bit encodings. Often we wouldn't know what encoding a file was in, so we'd have to use the chardet library, along with some home-grown heuristics to guess.
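As a rough illustration of the "with and without BOMs" part of that workflow, here is a minimal sketch of BOM sniffing with the standard codecs module. The helper names sniff_bom and read_text are hypothetical, and the fallback argument stands in for whatever chardet or a home-grown heuristic would guess:

```python
import codecs

# Order matters: the UTF-32 LE BOM begins with the same bytes as the
# UTF-16 LE BOM, so the longer marks are checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes):
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, codec in _BOMS:
        if data.startswith(bom):
            return codec
    return None


def read_text(data: bytes, fallback: str) -> str:
    """Decode using the BOM if present, otherwise the caller's best guess."""
    codec = sniff_bom(data) or fallback
    return data.decode(codec)


sample = codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")
decoded = read_text(sample, fallback="utf-8")
```

The "utf-16", "utf-32" and "utf-8-sig" codecs consume the BOM themselves, so the decoded text does not start with a stray U+FEFF.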

Off the top of my head, the two biggest footguns are:

- There should be no way to read or write the contents of a file into a str without specifying an encoding. locale.getpreferredencoding() is a mistake. File operations should be on bytes only, or require an explicit encoding.

- .encode() and .decode() are very poorly named for what they do, and it wasn't that uncommon that someone would get them backwards. Sometimes, exceptions aren't even thrown for getting them wrong, you just get incorrect data.
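Both footguns are easy to reproduce. A minimal sketch, assuming a throwaway temp file and a one-character sample string:

```python
import os
import tempfile

original = "é"

# Footgun 2: decoding with the wrong codec raises nothing, it just
# yields mojibake.
garbled = original.encode("utf-8").decode("latin-1")   # "Ã©"

# In Python 3, calling the wrong *method* does fail loudly, because
# str has no decode() and bytes has no encode().
str_has_decode = hasattr("", "decode")       # False
bytes_has_encode = hasattr(b"", "encode")    # False

# Footgun 1: open() in text mode falls back to a locale-dependent default
# when encoding is omitted. Passing it explicitly sidesteps that.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(original)
    with open(path, "rb") as f:
        on_disk = f.read()                   # raw bytes, no decoding involved
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()
```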

Both of which were still issues with Python 2. There's a valid architectural argument to be had between the Python 2 way, where str was a bag of bytes, and the unicode type was for decoded bytes, and the Python 3 way, where the bytes type is your bag of bytes and str holds your decoded string. I favor Python 3's way of doing it, but it's almost six of one, half a dozen of the other. The advantages of one over the other are slight, and given how many library functions relied on the old behavior, it was probably a mistake to change it like that, rather than continuing the Python 2 way, and fixing issues like those above that caused problems.

aeturnum · on Jan 28, 2022

I haven't done a ton of work with Python recently, but the problems I remember encountering came from the fact that python doesn't try to have encodings in any other part of the basic type system. So like, if you have an int or a float, you can pass those to any interface that takes a 'number-y' value and it will mostly work like you expect. That's also how strings worked in P2 - you could pass them around and things would accept the values (though you might get gibberish out the other side). Now, in P3, things will blow up (which is helpful for finding where you went wrong ofc - I understand the utility), but it means that your code handling things-that-might-be-strings-or-bytes often needs to have a different structure than the rest of your code.

I think the P3 string/byte ecosystem was made substantially weaker by P3 deciding not to lean more into types (something I have complained about on here before!). Like...they are the only values where the stdlib is extremely specific about you passing a value that has the exact right type, but the standard tools for tracking that are pretty poor.
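One way the "lean into types" idea can look today is a pair of small boundary helpers with type hints, so the str-or-bytes ambiguity is resolved in one place. This is only a sketch: ensure_text and ensure_bytes are hypothetical names, not standard library functions.

```python
from typing import Union


def ensure_text(value: Union[str, bytes], encoding: str = "utf-8") -> str:
    """Return value as str, decoding only if it arrived as bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def ensure_bytes(value: Union[str, bytes], encoding: str = "utf-8") -> bytes:
    """Return value as bytes, encoding only if it arrived as str."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


from_bytes = ensure_text(b"caf\xc3\xa9")   # decoded from UTF-8 bytes
from_text = ensure_text("café")            # passed through unchanged
```

A static checker can then flag callers that pass the wrong type to the functions behind the boundary, which is roughly the tracking the comment says is missing.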

kortex · on Jan 29, 2022

> but it means that your code handling things-that-might-be-strings-or-bytes often needs to have a different structure than the rest of your code.

Isn't that the point? String and bytes are different beasties. You can often encode strings to bytes and just about anything accepting bytes will accept it, but the converse is not true. Bytes are more permissive in that any sequence of any 0x00-0xff is acceptable, but str implies utf8 (not always guaranteed, I've seen some shit), meaning e.g. you can dump it to json without any escaping/base encoding.
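A quick sketch of that asymmetry, using only the standard library (the sample values are made up):

```python
import json

# Any sequence of byte values is a valid bytes object...
blob = bytes(range(256))

# ...but not every byte sequence is valid text in a given encoding.
try:
    blob.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# A str can go straight into JSON and come back unchanged.
as_json = json.dumps("héllo")
restored = json.loads(as_json)

# bytes cannot be serialized without choosing a representation first.
try:
    json.dumps(blob)
    bytes_rejected = False
except TypeError:
    bytes_rejected = True
```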

pdimitar · on Jan 28, 2022

Sounds like you should have moved on from Python to something else altogether?

Reading your comment feels like "I don't like Python for being Python", more or less. Apologies if I misread.

StewardMcOy · on Jan 29, 2022

I actually like Python a lot. Although I'm no longer using it professionally, it's my first choice for personal projects.

I'm also an advocate for using the right tool for the job, and in this case, Python may not have been the right tool for this job, but this was only one component in a much larger system. Sometimes you have to be suboptimal locally to be optimal globally.

And it's not like Python couldn't handle it, it was just that it had some design decisions that made things a bit harder than it would have been in some other languages. We got it working pretty well in Python 2, then the Python 3 transition happened, and it was a lot of work to get everything working as it had been, for only a small benefit to our team, but we got it working in 3 as well, and to my knowledge, it's still humming along.

Alex3917 · on Jan 28, 2022

> the python community’s bizarre behavior during the 2.X -> 3.X move honestly made me think less of the language.

There weren't that many people who were outright against Python 3, other than Zed Shaw for a few years. It just took until Python 3.4 for the language to really be usable in production, and after that it took a couple more years for every library to be updated. But the community has been pretty unified for 5+ years at this point.

lanstin · on Jan 28, 2022

If by unified you mean no longer arguing, sure I switched from Python 2 to Go lang. (I will concede if I am doing exploratory stuff, I will start new repos in Python 3, e.g. for boto quick exploration.) I am still bitter about the fine Python 2 code I supported and extended that is now all go or Java.

ssully · on Jan 28, 2022

This is my experience as well. There were plenty of people online vocally against Python3, but once we hit 3.4 I never worked with anyone IRL that was against it. The mindset was almost always "this is the way forward", so that's how we moved with our projects.

3pt14159 · on Jan 28, 2022

I used it in production just fine before that. What didn't work for you? The unicode support was shoddy, but lots of other stuff worked fine.

The problem as I remember it was that there were some critical tools that just were not making the switch. The last one I could remember having to have a separate runtime for was graph-tool, but there were just a ton of them in the scientific computing community. Also, I don't think making print a function was worth it. So many people had muscle memory on that that personally I think it was worth just special casing it.

Alex3917 · on Jan 28, 2022

> What didn't work for you?

Weren't there a lot of major performance regressions in the first couple versions?

paganel · on Jan 28, 2022

There definitely were, at least in the early versions. It would have been fine (I guess) if those early versions/iterations had been short-lived, so to speak, and for a stable and definitely faster version (compared to 2.7) to be made available shortly after that, but afaik that wasn't the case.

If it matters I've been writing Python professionally for more than 15 years and I started my career by seeing the (less famous) Zope2 to Zope3 botched migration. When Python3 was first announced I had hoped that the devs behind the project had learned from that related experience, guess I was wrong.

loeg · on Jan 28, 2022

> But the community has been pretty unified for 5+ years at this point.

Sure, but they accomplished this mostly by displacing anyone who disagreed. That's fine, and Python 3 in a vacuum is a better language than 2. But I'd never make the mistake of choosing Python for a serious project going forward.

CJefferson · on Jan 28, 2022

The "active community" is unified, I know a bunch of people with some personal piles of Python 2 who I don't expect to upgrade until they have no other choice. I also don't see why they should -- their code is running fine.

mikewarot · on Jan 28, 2022

They forced incompatible changes on their users, all in the name of a Unicode fetish. They burned their credibility. The users acted rationally, it's the developers of Python who were abusive.

Spivak · on Jan 28, 2022

I think it’s less of a fetish and more of a “we made a huge mistake in the initial design of the language and confused text and the encoding of that text, making a giant footgun that was hard to avoid.”

The migration was difficult because you had to actually think about this stuff instead of it working by accident and living as a landmine if you encountered non UTF-8 text.

The fact that you actually have to decode your bytes and specify an encoding makes it really obvious when you’ve just assumed the world is UTF-8 without really backing it up.
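In practice the "you have to think about it" part shows up as a TypeError the moment bytes and text are mixed. A minimal sketch with an invented header value:

```python
header = b"Content-Length: "

# Python 3 refuses to combine bytes and str implicitly.
try:
    header + "42"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The fix forces the encoding assumption into the open.
line = header.decode("ascii") + "42"
```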

ant6n · on Jan 28, 2022

What I don't get is, if they were going to break the language anyway, why they didn't fix classes. Most of python is pretty beautiful, but classes are so ugly, even basics like the syntax for instance vs static vars, self/this, setting up constructors, calling inherited constructors. Like come on, how is it that they are uglier than Java in this regard?

franga2000 · on Jan 29, 2022

I really don't know what you're talking about. I've written a lot of object-oriented code in Python and Java and there isn't a single thing I like better in Java (that doesn't come from the typing system). I don't know about 3.0, but it's been great for years. Take a look at some Django code.

mattgreenrocks · on Jan 28, 2022

super() is a little cleaner to call now at least.
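For anyone who hasn't compared the two, a minimal sketch of the difference (class names are placeholders):

```python
class Base:
    def __init__(self, name):
        self.name = name


class Child(Base):
    def __init__(self, name, extra):
        # Python 3 zero-argument form.
        super().__init__(name)
        # The Python 2 spelling, which still works in Python 3, was:
        #     super(Child, self).__init__(name)
        self.extra = extra


child = Child("a", 1)
```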

The print() change drives me nuts: many scripts use the old form, and Python forces you to change it despite recognizing the old functionality. They could've simply allowed print expr to work as is and accepted it as a wart.

The async stuff feels a bit like a bridge too far for me WRT the language. Simplicity would be standardizing on a function color (as it were).

I don't envy their position of preserving backwards compat while evolving the language.

dragonwriter · on Jan 28, 2022

> They could've simply allowed print expr to work as is and accepted it as a wart.

Old print isn't an expr at all, that was a major point of the change.
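A small sketch of what being an ordinary function buys: print can be aliased, passed around, and configured with keyword arguments. The buffer and the alias name emit are only there to make the output inspectable.

```python
import io

buffer = io.StringIO()

# Keyword arguments replace the old statement's special syntax
# (for example the Python 2 form: print >>buffer, "a", "b").
print("a", "b", sep="-", end="!\n", file=buffer)

# As a function, print can be bound to another name or passed as a callback.
emit = print
for item in ("x", "y"):
    emit(item, file=buffer)

output = buffer.getvalue()
```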

olliej · on Jan 28, 2022

I think the person you're replying to was saying that there was no reason to remove the statement form. The only thing that removing the statement form of print did was break existing code. You could easily allow print to act as a function by removing it as a reserved word in expression contexts.

They made print a function out of a misplaced sense of language "purity", where they made a random ideal and broke code to support it.

Let's compare to JS: 20 years of new features and old code still works. Keywords were added, and old code still works. Even code that uses those keywords still works. It took more effort from the language implementers, but that's as it should be: it took a bit more effort for a few teams (who make the engines), instead of a large amount of effort for tens of thousands of groups.

loeg · on Jan 28, 2022

They made substantial changes to classes in Python3.

janto · on Jan 28, 2022

If the reactions seem bizarre, maybe you're missing something.

For me it was more than "just the tech". Python 3.x was not the Python 3000 I was promised for a long time: many of the additions did not warrant breaking changes (as demonstrated by the backports) and many of Python's deeper warts still remain unaddressed, even today.

I think the deeper issue is that at the rate that 2.x got popular, the original motivation, principles and process behind the language's development got diluted. The orchestration of the push for 3.x adoption was not technically motivated in a convincing manner and signaled a change in the process. I stopped trusting the process.

closeparen · on Jan 28, 2022

"Bizarre behavior" like not immediately throwing away hundreds of thousands of hours of work product because you said so?

rcarmo · on Jan 28, 2022

I saw the drama and ignored it. Transition was seamless for 99% of the stuff I used, and the 1% I left behind I don't miss.

roywashere · on Jan 28, 2022

My company started life in about 2014 and for python has only ever been on 3. And python 3 is great. Having written python 2 before and also perl, I really appreciate that UTF-8 just works.

And I hope that python learned from 2 -> 3 and that in the future it will be better

DannyBee · on Jan 28, 2022

The fact that you think this way shows why that community is so broken about things like this. You think the user community behaved badly? It was the developers.

For the vast majority of companies, python 3.x is not better enough for the cost. That is what happened. You may think it's awesome. It may be technically better, etc. But the cost of moving was much more for most companies than the productivity gains. We've even measured it before. That is entirely on the python folks for not providing something people want enough to move to. All CSAT surveys of python i've seen at my company, and that my counterparts had ever seen, said literally the same thing. So it seems like something the community could have discovered pretty easily in a rigorous way if they wanted. I used to spend a lot of time on python-dev. It's a group of good people and engineers for sure. There was always good rigor around technical reasons to do something, and around performance benchmarking. But i don't remember seeing a lot of productivity research or CSAT or .... It's fair to point out that most OSS communities don't do that, but that's a bad thing, not a good one, and one of the things that often leads to community trouble.

One of the other major factors, besides the productivity issue, that makes it not better enough is the migration cost. Software of this kind needs to be built to be migrated from and to. If not, it's broken and you deserve all the animosity you will surely get from your users. It doesn't matter if you screwed up the language, etc. Tough crap.

In python, for lots of folk with substantial stacks, the amount that could be auto-migrated by the tooling provided was low - 40-50%. Yes, simple stuff could be auto-migrated, but not most real stuff.

We actually ended up building our own tooling to get closer to 70% because of how expensive the overall thing was. It processed many millions of lines of python code, automatically kept track of which things would be best to rewrite/finish migrating first (because it would unblock other things), and continuously tried to auto-migrate any non-migrated changed code in a loop to maximize automation. The total engineer cost of moving was enormous. We are talking well over a thousand SWE years.

For what, exactly? As I said, we already measured (because plenty of new projects were python-3 only) and determined that all the improvements and hullabaloo did not meaningfully improve productivity for people. Not enough. It was small, at best. It also didn't reduce the rate of production bugs, etc.

So all this work, for no meaningful value.

Because not enough were willing to do it, the result was to force it. Damn the users, full steam ahead. The users are just lazy.

I talked with lots of counterparts at other companies - most felt the same way. The result for us is lots of teams gave up python forever. The only reason it's growing at all is because of ML.

IMHO, python2->3 is an object lesson in doing it wrong and then victim blaming.

ClumsyPilot · on Jan 28, 2022

"We actually ended up building our own tooling to get closer to 70%. It processed maany millions of lines of python code, automatically kept track of which things would be best to rewrite/finish migrating first (because it would unblock other things), and continuously tried to auto-migrate any non-migrated changed code in a loop to maximize automation. The engineer cost of moving was enormous."

This was definitely an opportunity for commercial software. I am not sure why companies worth hundreds of billions can't sort this out and expect a small nonprofit to provide this.

DannyBee · on Jan 28, 2022

FWIW - i don't disagree, and I don't expect anything in the end, it's open source. People can do what they want. I was more pointing out the opinion given of how the user community sucked is very broken.

What the "larger user" (however you want to frame it) community would have preferred (IMHO) is much better migration tooling, even if it came at a cost of improvements.

The dev community chose to spend time on improvements instead. That's their choice, but it doesn't make the community wrong to be upset about it.

woah · on Jan 28, 2022

> We are talking well over a thousand SWE years.

Why didn't you just take over support of python 2.7?

DannyBee · on Jan 28, 2022

We did, internally, for a bit, but since they encouraged every library to drop support, and most did, it was not a tractable long term answer.

nvusuvu · on Jan 28, 2022

Inertia seems to best describe the hesitancy to change. The language improvements stand on their own I would think.

DannyBee · on Jan 28, 2022

That's the problem - they literally do not. Technically better is not a good enough reason to change something. It has to provide some actual greater user value along axes that matter to users, and make the cost worth it. What you are seeing is that python3, well, doesn't. The options in that case are reduce cost or improve along axes that matter enough that cost is worth it.

samhw · on Jan 28, 2022

> Inertia seems to best describe the hesitancy to change.

I'd have gone for 'rigor mortis', but hey, we can't agree on everything ... like, say, a very minor change to string encoding with a 12-year managed rollout.

orangecat · on Jan 28, 2022

> like, say, a very minor change to string encoding with a 12-year managed rollout

That's exactly the point: it required a massive amount of effort for relatively minor benefits.

shoo · on Jan 28, 2022

I did a bit of contract work for a place with over a million lines of custom python 2.x scripts that ran inside of a python interpreter embedded in a proprietary product which made it difficult to do things like write and execute unit tests for the code. I think they were still writing new python 2.x code in 2018.

A lot of the scripts were supporting processes that had a finite lifespan during the initial stages when the company infrastructure was being built out, rather than ongoing operational processes that needed to be performed indefinitely, so hopefully they'll be able to set a lot of that codebase on fire instead of maintaining or porting it to python 3.

fault1 · on Jan 28, 2022

> Tech is supposed to just be tech, but when the community behaves this badly about adopting improvements how can that not influence your decision to invest in that tech?

What would you say was bad about it? And who were the bad people specifically? The people who were using python 2 or python 3?

For what it's worth, python3 >= 3.0 && python3 <= 3.2 were hideously broken in their unicode support. Arguably they had worse/unusable unicode relative to python 2.6 or 2.7.

So there was a huge failure to launch type of problem, especially given how long python3 had been in development.

It most definitely left a very sour taste in many people's mouth that didn't start dissipating till 3.5 or 3.6 when enough "killer" features had accumulated.

Even then, for a lot of usages, python 2.7 'just works'.