I was pretty sceptical of Copilot when it was announced, but after having used it for a while I think it's more that it's been kind of sold as something that it's not.
What it does offer for me, in practice, is basically "clever autocomplete". Say I wrote a little bit of code:
bounds.max.x = bounds.min.x + width
Then copilot will suggest that the next line is:
bounds.max.y = bounds.min.y + height
That's reasonably smart, and it adapts pretty well to whatever code you are currently writing. It makes writing those little bits of slightly repetitive code less annoying, and I like that – it feels like a useful productivity boost in the same way as regular "dumb" autocomplete is.
However I'd say too much of the initial coverage was along the lines of "you can just write the function name and it will fill in the body!". That will work if you want to write `isEven` or `capitalise` or something, which again is quite nice. But I have found that basically any other prompt produces a function that either does something completely wrong, or introduces a subtle and non-obvious bug. And this actually feels kind of dangerous – I've absolutely caught myself a few times just going "it'll probably be fine" and accepting whatever Copilot has spewed out.
I'll remain sceptical of it as a system that sucks up and regurgitates vast lakes of open-source code and sends my every keystroke to Microsoft, among other things. But it's definitely something I'd pay for a local version of.
It is also a lifesaver for complicated fiddly array index code. One of the most cognitively costly things for me, decades into programming, is still anything related to array partitioning, or remapping indices. Copilot pretty quickly just gets what you’re trying to do and autofills a correct line of code. It’s basically never wrong about this sort of thing. And even in the rare case where it’s not correct (usually because it’s mistaken about what I’m trying to do), it’s much faster for me to read the line and figure out what it does than it is to mentally slog through the problem and write the line from scratch.
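For a concrete (hypothetical) example of the kind of fiddly index code meant here – not the commenter's actual code – partitioning a list into chunks and remapping a flat index to grid coordinates are the sort of one-liners Copilot tends to fill in correctly:

```python
# Hypothetical illustration of "array partitioning / index remapping" code.

def partition(items, chunk_size):
    """Split `items` into consecutive chunks of at most `chunk_size` elements."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def flat_to_grid(index, width):
    """Remap a flat index into (row, col) coordinates for a grid with `width` columns."""
    return divmod(index, width)

assert partition([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]
assert flat_to_grid(7, width=3) == (2, 1)
```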
Try reducing the DRY violations and you might have to write less. Almost the entire right side of the first two lines is repeated, and the third line is only that long because it has to recover the results from the map instead of using local variables.
While tools like boxing gloves reduce the damage your face takes it is usually better to just try and stop hitting yourself first.
> Sometimes 2 lines of code is just better than 6 and not worth the extra brain cycles.
I am unsure what could possibly be better about them. Twice the chance to misspell one of the string constants? Twice the time to write the code, which in this case seems to be the big blocker? Twice the code to update in case any of it has to change? Twice the complexity per line of code?
This is pretty much the case. It was a quickly needed report of data from our ERP system, dumped into an Excel sheet to use. I had not used Pandas before, but it seemed like a good fit for me to build this out. The people who needed this info were not very sure about what they needed, so being un-DRY is better. This is only a maybe 2-page Jupyter sheet, so it's not a huge deal.
In any case, DRY is just an optimization theme. And optimizing for understanding/intent is a higher priority for me. Sometimes they intersect, sometimes not.
I disagree that this is premature; it's just best practice. This example is actually a small lesson in how this autocomplete can construct slightly less readable code. Of course that's not really a problem of the tool, as you have to refactor anyway.
Yea, I could definitely improve this code. This was a quick Excel report I built by extracting some data out of our ERP and building the report in Pandas. So those min and max values get surfaced in the output Excel sheet.
It was a bit of an emergency and I was figuring Pandas out at the same time. The un-dry structure is handy for me because I can quickly tweak individual things since as a company we are still figuring out what is needed in the report.
Are you in an environment where the compiler / interpreter can do CSE across string-indexed dictionaries? If not, this code seems like it's gratuitously doing two extra dictionary lookups vs. an implementation using temporary variables. Now you may very well, and rightly, say "who cares" -- but to me, this is where I'd really see something like Copilot contribute. The example you gave saves typing, but leads to "correct" code of equal or lower quality than what you would write anyway with a few more keystrokes. A system that could advise you to consider the alternate version (with the human in the loop to confirm semantic equivalence for the particular dictionary in use, since the compiler / interpreter / advice tool probably cannot) would lead to not just more code per minute, but better code.
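Since the original code isn't shown in the thread, here is a hypothetical sketch of the two versions being compared, with made-up names:

```python
# Hypothetical reconstruction of the pattern under discussion; the actual
# pandas code isn't in the thread, so these keys and values are invented.
results = {"min_x": 0, "min_y": 0, "width": 4, "height": 3}

# Version with repeated dictionary lookups: each results[...] access is a
# separate hash lookup unless the interpreter can eliminate the repetition.
results["max_x"] = results["min_x"] + results["width"]
results["max_y"] = results["min_y"] + results["height"]
area = (results["max_x"] - results["min_x"]) * (results["max_y"] - results["min_y"])

# Equivalent version with temporary variables: a few more keystrokes, but
# each value is read once and the final expression stays short.
min_x, min_y = results["min_x"], results["min_y"]
max_x, max_y = min_x + results["width"], min_y + results["height"]
results["max_x"], results["max_y"] = max_x, max_y
area = (max_x - min_x) * (max_y - min_y)
```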
I'd be VERY surprised if the data from all the interactions with it doesn't result in a product so valuable that people care less and less about Microsoft harvesting their data, because of what they get in exchange.
By 2025, either these models will have hit a wall on diminishing returns and it will take a complete pivot to some other approach to continue to see notable gains, or the products will continue to have improved at compounding rates and access will have become business critical in any industry with a modicum of competition.
> By 2025, either these models will have hit a wall on diminishing returns and it will take a complete pivot to some other approach to continue to see notable gains, or the products will continue to have improved at compounding rates and access will have become business critical in any industry with a modicum of competition.
Is there a single example in AI, or even technology as a whole, where simply continuing to apply one technique has led to compounding growth? Even Moore's Law, according to none other than Jim Keller[1], is more a consequence of thousands of individual innovations that are each quite distinct from others but that build on each other to create this compounding growth we see. There is no similar curve for AI.
In this case, GPT-3 (released in 2020) uses the same architecture as GPT-2 (released in 2019), expanded to have ~100x more parameters. It's not hard to see that compounding this process will rapidly hit diminishing returns in terms of time, power consumption, cost of hardware, etc. Honestly, if Google, Amazon and Microsoft didn't see increased computational cost as a financial benefit for their cloud services, people might be willing to admit that GPT-3 is a diminishing return itself: for 100x the parameters, is GPT-3 over 100x better than GPT-2?
It seems that the big quantum leaps in AI come from new architectures applied in just the right way. CNNs and now transformers (based on multi-head attention) are the ways we've found to scale this thing, but those seem to come around every 25 years or so. Even the classes of problems they solve seem to change discretely and then approach some asymptote.
Copilot will probably improve, but I doubt we will see much compounding. My best guess is that Copilot will steadily improve "arithmetically" as users react to its suggestions, or even that it will just change sporadically in response to this.
There has been some effort to quantify "AI scaling laws": i.e., how much performance increases as we scale up the resources involved. See section 1.2 of https://arxiv.org/pdf/2001.08361.pdf
My main takeaway from that paper is that a 2x increase in training cost improves performance by 5% (100x by 42%). I only skimmed the paper though.
To me this says that model scaling will not get us very much farther: we can probably do one more 100x but not two.
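As a rough sanity check (my own arithmetic, not from the paper), the two figures quoted above are approximately consistent if the per-doubling gain compounds:

```python
import math

# If each doubling of training compute buys roughly a 5% improvement and the
# gains compound, a 100x scale-up is about 6.6 doublings.
per_doubling = 1.05
doublings = math.log2(100)                 # ~6.64 doublings in a 100x increase
total_gain = per_doubling ** doublings
print(f"{doublings:.2f} doublings -> {100 * (total_gain - 1):.0f}% improvement")
# Prints roughly "6.64 doublings -> 38% improvement", in the same ballpark as
# the ~42% figure quoted above.
```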
I talked to someone working on model scaling and they see the same numbers and draw a very different conclusion: my interpretation of their argument is that they view scaling up spending as easy compared to finding new fundamental technical advances.
> my interpretation of their argument is that they view scaling up spending as easy compared to finding new fundamental technical advances.
This paper only discussed transformers. Given the current pace of research, it's not a given that transformers won't be replaced by something else that has better scaling laws.
And indeed, this paper tells you the amount of compute and data you need for a transformer. But Moore's law isn't doing great lately. For the purposes of research, you may be able to train trillion-parameter models - but you will likely not run such a model on your phone.
In order to merit the large cost of both training and running predictions (where, for larger models, the entire model doesn't even fit in a single GPU), models will need to become more parameter-efficient than the vanilla transformer. Otherwise it's just too expensive.
My point is that we may not need to invent some new model separate from transformers/GANs and the work so far, as long as novel ways of combining trained models produce results that can in turn feed into other models.
NVIDIA in particular has had some interesting focus on using AI to generate training data for AI.
"NVIDIA Omniverse Replicator Generates Synthetic Training Data for Robots | NVIDIA Technical Blog"
As someone else already mentioned, the scaling laws paint a different story empirically: we haven't hit diminishing returns at all, and there's no end in sight.
But more anecdotally, the first applied neural network paper, by LeCun in 1989, has pretty much the same format as the GPT paper: a large neural network trained on a large dataset (all relative to the era). https://karpathy.github.io/2022/03/14/lecun1989/
It really just seems that there are a certain number of flops you need before certain capabilities can emerge.
It's also really good at writing test cases. Once you've written a couple test cases manually it can take the "it should" string and auto-complete the body of the test with very high accuracy.
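The comment above presumably describes a JavaScript-style test framework where each case starts with an "it should ..." string; here is a rough Python/pytest analogue of the same workflow, with a hypothetical function under test:

```python
# Rough pytest analogue of the pattern described above; the function under
# test and the test names here are hypothetical.

def clamp(value, low, high):
    return max(low, min(value, high))

# After writing a couple of cases like these by hand...
def test_it_should_return_the_value_when_within_bounds():
    assert clamp(5, 0, 10) == 5

def test_it_should_clamp_values_below_the_lower_bound():
    assert clamp(-3, 0, 10) == 0

# ...typing just the descriptive name of the next case is often enough for
# Copilot to fill in a plausible body:
def test_it_should_clamp_values_above_the_upper_bound():
    assert clamp(42, 0, 10) == 10
```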
I largely agree with this. Occasionally it suggests a few lines that are more or less what I want, but rarely more than that.
One use case that I reach for relatively often is writing a comment to describe the one line of code I want when I can’t remember the specifics of the API I’m working with. E.g. `# reset numpy random seed`, and copilot pretty reliably gives me what I want.
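For example (a minimal illustration of that comment-as-prompt workflow; `np.random.seed` is the completion one would expect here):

```python
import numpy as np

# Typing just the comment below is usually prompt enough; Copilot completes
# it with something like the following line.
# reset numpy random seed
np.random.seed(0)
```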
I agree with your last point - with an added dash of being wary of getting dependent on sophisticated tools that have one vendor.
I'm not skeptical of the idea at all, however. I see it as the rough analogy of mathematicians using proof assistants. This is a coding assistant.
It's generally true that the vast, vast majority of code one writes is boilerplate or otherwise rote. The core logic and control flow that define the meaningful bits of any complex program are nested in the middle of tons of error handling, logging, data munging, incidental book-keeping, and other very prosaic tasks.
It will be more than just handy to have a friendly intelligence assist in these tasks, and it doesn't require much "intelligence" (of the sort required for figuring out what the program should do in the first place and how to architect it).
Once the technology gets firmed up a bit, the productivity multiplier will be palpable and impossible to ignore. Programmers using these tools will complete tasks faster than ones that don't, and the market will shift.
Further extensions of this technology lead to automatic generation of unit tests, automatic refactoring, etc. These all exist as very specialized tools that are explicitly coded, but it's easy to see that adding a bit of ML-driven intuition to them would greatly increase their scope and effectiveness.
This could be useful for languages like Go where simple repetition and loops are preferred over clever language features like object or array destructuring that can add cognitive or performance overhead.
I've been using Copilot for 5 months while building another AI productivity tool. It's changed my habits and I'm becoming a bit dependent on it for autocompletion. It feels so good just hitting TAB and moving on.
I know some developers that aren't embracing it the same way I do, making judgements without even trying it. "This is the future", I tell them, "it makes your life much easier", but there's resistance.
Prompt engineering is quite interesting too, and it may turn into a job skill later. While using Codex, I understood the importance of knowing how to ask a non-human for the right things. It's a bit like talking to Alexa in the early days, in the sense that I couldn't talk to Alexa like a human yet; I had to be specific, clear and intentional. I still see that people who are less experienced with a smart personal assistant struggle to get their commands done.
If you love this technology and would love to try it for Explaining Code in your browser, check out the extension ExplainDev (https://explain.dev). It works on GitHub, StackOverflow and documentation websites.
Disclaimer: I built ExplainDev, with the help of Copilot.
Looks interesting. I signed up and received an email with the chrome store link, but the email didn't include an access code (it mentions it, "...and the access key below.", but the only thing below is the store link). Is this a bug or am I missing something?
EDIT: A second email showed up about 30min later that did contain the code.
Yes, we will land on Firefox very soon, hopefully next month.
Copilot's been very useful. The extension is built on TS and lots of custom CSS. Codex is knowledgeable about browser extension APIs, and it's helped me write most of the CSS utility classes that change sizes, margins and paddings, so that I don't have to bundle the extension with another third-party library.
I've written extensions before and Firefox has a very good polyfill [0] that makes it quite easy to write extensions for all browsers. It does get a bit trickier if you also want to incorporate TypeScript [1] or React however.
Nope, it's the Indian govt. Says so in the reason why it's blocked too. Not sure what archive.is did to piss off our govt and get banned like this. Could have cached a torrent or porn site maybe.
We tried using OpenAI/Davinci for SQL query authoring, but it quickly became obvious that we are still really far from something the business could find value in. The state of the art as described below is nowhere near where we would need it to be:
To be clear, we haven't tried this on actual source code (i.e. procedural concerns), so I feel like this is a slightly different battle.
The biggest challenge I see is that the queries we would need the most assistance with are the same ones that are the rarest to come by in terms of training data. They are also incredibly specific in the edge cases, many times requiring subjective evaluation criteria to produce an acceptable outcome (e.g. a recursive query vs 5k lines of unrolled garbage).
While others here have touched on the idea that Codex has changed their coding habits, what I find interesting is that Codex has changed how I write code altogether. For example, I had to connect a database to an API a little while ago. Obviously I had the option to use an ORM as one would normally. But instead, I just wrote out all the SQL commands and wrapper functions in one big file. Since it was all tedious and predictable, Codex helped me to write it in just a few minutes, and I didn't need to muck around with some complex ORM. These are the tradeoffs I'm personally excited about.
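A minimal sketch of that "plain SQL plus thin wrapper functions in one file" pattern, using sqlite3 purely for illustration; the table and column names are made up, and the commenter's actual database and API were presumably something else:

```python
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)

def insert_user(name, email):
    cur = conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))
    conn.commit()
    return cur.lastrowid

def get_user(user_id):
    return conn.execute(
        "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()

def list_users():
    return conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()

# Each wrapper is tedious and predictable to write -- exactly the kind of
# repetition Copilot autocompletes well once it has seen the first one or two.
user_id = insert_user("Ada", "ada@example.com")
print(get_user(user_id))
```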
Until the Copilot product is accessible to the public on the same terms as the free software on which it is based, it is another example of corporate exploitation of the commons, or, in other words, open theft.
If it separately becomes a long term trend for private companies to use neural net regurgitation to allow them to use free software without complying with the GPL, free software must be completely abandoned.
The logical conclusion of your position is a de facto situation in which only private corporations get to enjoy intellectual property rights. It’s easy to steal from free software developers but if you steal anything from a corporation your life will be ruined.
Copilot is wonderful, when you use it appropriately. It autocompletes and has a lot of good knowledge of the code.
I had an interview that usually lasts an hour, and people don't get to finish all the code. I did it in 15 minutes or so, mostly because I knew what needed to happen and the code was autocompleted quickly as I started writing.
It really helps you focus on what you want to do, and you don't have to think about syntax or the correct variable name.
You still need to be a good developer, but it helps you greatly to do your work.
Yeah, so they came up with a 5-part problem where the parts get progressively more difficult. I heard from the recruiter that previous candidates didn't do more than 2 or 3 of those.
He did notice that my autocomplete was somewhat magical. I fully explained to him what it was; I believe he didn't completely understand and kept talking about some other plugin, which I think is some kind of IntelliSense.
Like I said, even with me being chatty (I go off on tangents often), I was done with all 5 in record time and managed to refactor the last 2, all thanks to Copilot. I don't think at any point Copilot did something I didn't intend, and it complemented me fantastically.
I wonder what the copyright status of things written by Copilot is? Since a human didn't write some of the code produced, does that mean that those portions aren't copyrightable?
Related article "If Software is My Copilot, Who Programmed My Software?"
I don't wanna jinx it because it's certainly a useful tool but I wonder what will happen when Copilot is widely available and CS students start handing in their programming assignments done by Copilot (and only Copilot).
Easy to check, just put in the first few words and if copilot autocomplete is identical to the remaining solution you just caught a cheater.
Or use names in the assignment that Copilot has banned, to prevent it from working. For example, Q_rsqrt is banned because Copilot had a tendency to just copy-paste the original Quake source, including comments, verbatim.
>> Easy to check, just put in the first few words and if copilot autocomplete is identical to the remaining solution you just caught a cheater.
That doesn't work. Copilot is not deterministic. It generates different completions for the same prompt, at random. I don't think you can catch them all, either.
I don't think it's so simple as banning identifiers either. Copilot is capable of adjusting its code to identifiers in the prompt and reusing them correctly. It's quite impressive at that, really.
Just started using it w/ Clojure & Scheme. Nowhere near as useful as with popular languages. My thoughts may change as I use it more, but I'll say it's barely better than w/o right now.