I was pretty sceptical of Copilot when it was announced, but after having used it for a while I think it's more that it's been kind of sold as something that it's not.
What it does offer for me, in practice, is basically "clever autocomplete". Say I wrote a little bit of code:
bounds.max.x = bounds.min.x + width
Then copilot will suggest that the next line is:
bounds.max.y = bounds.min.y + height
That's reasonably smart, and it adapts pretty well to whatever code you are currently writing. It makes writing those little bits of slightly repetitive code less annoying, and I like that – it feels like a useful productivity boost in the same way as regular "dumb" autocomplete is.
However I'd say too much of the initial coverage was along the lines of "you can just write the function name and it will fill in the body!". That will work if you want to write `isEven` or `capitalise` or something, which again is quite nice. But I have found that basically any other prompt produces a function that either does something completely wrong, or introduces a subtle and non-obvious bug. And this actually feels kind of dangerous – I've absolutely caught myself a few times just going "it'll probably be fine" and accepting whatever Copilot has spewed out.
I'll remain sceptical of it as a system that sucks up and regurgitates vast lakes of open-source code and sends my every keystroke to Microsoft, among other things. But it's definitely something I'd pay for a local version of.
It is also a lifesaver for complicated fiddly array index code. One of the most cognitively costly things for me, decades into programming, is still anything related to array partitioning, or remapping indices. Copilot pretty quickly just gets what you’re trying to do and autofills a correct line of code. It’s basically never wrong about this sort of thing. And even in the rare case where it’s not correct (usually because it’s mistaken about what I’m trying to do), it’s much faster for me to read the line and figure out what it does than it is to mentally slog through the problem and write the line from scratch.
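For a concrete (hypothetical) example of the kind of fiddly index code meant here – not the commenter's actual code – partitioning a list into chunks and remapping a flat index to grid coordinates are the sort of one-liners Copilot tends to fill in correctly:

```python
# Hypothetical illustration of "array partitioning / index remapping" code.

def partition(items, chunk_size):
    """Split `items` into consecutive chunks of at most `chunk_size` elements."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def flat_to_grid(index, width):
    """Remap a flat index into (row, col) coordinates for a grid with `width` columns."""
    return divmod(index, width)

assert partition([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]
assert flat_to_grid(7, width=3) == (2, 1)
```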
Try reducing the DRY violations and you might have to write less. Almost the entire right side of the first two lines is repeated, and the third line is only that long because it has to recover the results from the map instead of using local variables.
While tools like boxing gloves reduce the damage your face takes it is usually better to just try and stop hitting yourself first.
> Sometimes 2 lines of code is just better than 6 and not worth the extra brain cycles.
I am unsure what could possibly be better about them. Twice the chance to misspell one of the string constants? Twice the time to write the code, which in this case seems to be the big blocker? Twice the code to update in case any of it has to change? Twice the complexity per line of code?
This is pretty much the case. It was a quickly needed report of data from our ERP system, dumped into an Excel sheet to use. I had not used Pandas before, but it seemed like a good fit for me to build this out. The people who needed this info were not very sure about what they needed, so being un-DRY is better. This is only a maybe 2-page Jupyter sheet, so it's not a huge deal.
In any case, DRY is just an optimization theme. And optimizing for understanding/intent is a higher priority for me. Sometimes they intersect, sometimes not.
I disagree that this is premature; it's just best practice. This example is actually a small lesson in how this autocomplete can construct slightly less readable code. Of course that's not really a problem of the tool, as you have to refactor anyway.
Yea, I could definitely improve this code. This was a quick Excel report I built by extracting some data out of our ERP and building the report in Pandas. So those min and max values get surfaced in the output Excel sheet.
It was a bit of an emergency and I was figuring Pandas out at the same time. The un-dry structure is handy for me because I can quickly tweak individual things since as a company we are still figuring out what is needed in the report.
Are you in an environment where the compiler / interpreter can do CSE across string-indexed dictionaries? If not, this code seems like it's gratuitously doing two extra dictionary lookups vs. an implementation using temporary variables. Now you may very well, and rightly, say "who cares" -- but to me, this is where I'd really see something like Copilot contribute. The example you gave saves typing, but leads to "correct" code of equal or lower quality than what you would write anyway with a few more keystrokes. A system that could advise you to consider the alternate version (with the human in the loop to confirm semantic equivalence for the particular dictionary in use, since the compiler / interpreter / advice tool probably cannot) would lead to not just more code per minute, but better code.
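Since the original code isn't shown in the thread, here is a hypothetical sketch of the two versions being compared, with made-up names:

```python
# Hypothetical reconstruction of the pattern under discussion; the actual
# pandas code isn't in the thread, so these keys and values are invented.
results = {"min_x": 0, "min_y": 0, "width": 4, "height": 3}

# Version with repeated dictionary lookups: each results[...] access is a
# separate hash lookup unless the interpreter can eliminate the repetition.
results["max_x"] = results["min_x"] + results["width"]
results["max_y"] = results["min_y"] + results["height"]
area = (results["max_x"] - results["min_x"]) * (results["max_y"] - results["min_y"])

# Equivalent version with temporary variables: a few more keystrokes, but
# each value is read once and the final expression stays short.
min_x, min_y = results["min_x"], results["min_y"]
max_x, max_y = min_x + results["width"], min_y + results["height"]
results["max_x"], results["max_y"] = max_x, max_y
area = (max_x - min_x) * (max_y - min_y)
```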
I'd be VERY surprised if the data from all the interactions with it doesn't result in a product so valuable that people care less and less about Microsoft harvesting their data, because of what they get in exchange.
By 2025, either these models will have hit a wall on diminishing returns and it will take a complete pivot to some other approach to continue to see notable gains, or the products will continue to have improved at compounding rates and access will have become business critical in any industry with a modicum of competition.
> By 2025, either these models will have hit a wall on diminishing returns and it will take a complete pivot to some other approach to continue to see notable gains, or the products will continue to have improved at compounding rates and access will have become business critical in any industry with a modicum of competition.
Is there a single example in AI, or even technology as a whole, where simply continuing to apply one technique has led to compounding growth? Even Moore's Law, according to none other than Jim Keller[1], is more a consequence of thousands of individual innovations that are each quite distinct from others but that build on each other to create this compounding growth we see. There is no similar curve for AI.
In this case, GPT-3 (released in 2020) uses the same architecture as GPT-2 (released in 2019), expanded to have ~100x more parameters. It's not hard to see that compounding this process will rapidly hit diminishing returns in terms of time, power consumption, cost of hardware, etc. Honestly, if Google, Amazon and Microsoft didn't see increased computational cost as a financial benefit for their cloud services, people might be willing to admit that GPT-3 is a diminishing return itself: for 100x the parameters, is GPT-3 over 100x better than GPT-2?
It seems that the big quantum leaps in AI come from new architectures applied in just the right way. CNNs and now transformers (based on multi-head attention) are the ways we've found to scale this thing, but those seem to come around every 25 years or so. Even the classes of problems they solve seem to change discretely and then approach some asymptote.
Copilot will probably improve, but I doubt we will see much compounding. My best guess is that Copilot will steadily improve "arithmetically" as users react to its suggestions, or even that it will just change sporadically in response to this.
There has been some effort to quantify "AI scaling laws": i.e., how much performance increases as we scale up the resources involved. See section 1.2 of https://arxiv.org/pdf/2001.08361.pdf
My main takeaway from that paper is that a 2x increase in training cost improves performance by 5% (100x by 42%). I only skimmed the paper though.
To me this says that model scaling will not get us very much farther: we can probably do one more 100x but not two.
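As a rough sanity check (my own arithmetic, not from the paper), the two figures quoted above are approximately consistent if the per-doubling gain compounds:

```python
import math

# If each doubling of training compute buys roughly a 5% improvement and the
# gains compound, a 100x scale-up is about 6.6 doublings.
per_doubling = 1.05
doublings = math.log2(100)                 # ~6.64 doublings in a 100x increase
total_gain = per_doubling ** doublings
print(f"{doublings:.2f} doublings -> {100 * (total_gain - 1):.0f}% improvement")
# Prints roughly "6.64 doublings -> 38% improvement", in the same ballpark as
# the ~42% figure quoted above.
```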
I talked to someone working on model scaling and they see the same numbers and draw a very different conclusion: my interpretation of their argument is that they view scaling up spending as easy compared to finding new fundamental technical advances.
> my interpretation of their argument is that they view scaling up spending as easy compared to finding new fundamental technical advances.
This paper only discussed transformers. Given the current pace of research, it's not a given that transformers won't be replaced by something else that has better scaling laws.
And indeed, this paper tells you the amount of compute and data you need for a transformer. But Moore's law isn't doing great lately. For the purposes of research, you may be able to train trillion-parameter models - but you will likely not run such a model on your phone.
In order to merit the large cost of both training and running predictions (where, for larger models, the entire model doesn't even fit in a single GPU), models will need to become more parameter-efficient than the vanilla transformer. Otherwise it's just too expensive.
My point is that we may not need to invent some new model separate from transformers/GANs and the work so far, as long as novel ways of combining trained models produce results that can in turn feed into other models.
NVIDIA in particular has had some interesting focus on using AI to generate training data for AI.
"NVIDIA Omniverse Replicator Generates Synthetic Training Data for Robots | NVIDIA Technical Blog"
As someone else already mentioned, the scaling laws paint a different story empirically: we haven't hit diminishing returns at all, and there's no end in sight.
But more anecdotally, the first applied neural network paper, by LeCun in 1989, has pretty much the same format as the GPT paper: a large neural network trained on a large dataset (all relative to the era). https://karpathy.github.io/2022/03/14/lecun1989/
It really just seems that there are a certain number of flops you need before certain capabilities can emerge.
It's also really good at writing test cases. Once you've written a couple test cases manually it can take the "it should" string and auto-complete the body of the test with very high accuracy.
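The comment above presumably describes a JavaScript-style test framework where each case starts with an "it should ..." string; here is a rough Python/pytest analogue of the same workflow, with a hypothetical function under test:

```python
# Rough pytest analogue of the pattern described above; the function under
# test and the test names here are hypothetical.

def clamp(value, low, high):
    return max(low, min(value, high))

# After writing a couple of cases like these by hand...
def test_it_should_return_the_value_when_within_bounds():
    assert clamp(5, 0, 10) == 5

def test_it_should_clamp_values_below_the_lower_bound():
    assert clamp(-3, 0, 10) == 0

# ...typing just the descriptive name of the next case is often enough for
# Copilot to fill in a plausible body:
def test_it_should_clamp_values_above_the_upper_bound():
    assert clamp(42, 0, 10) == 10
```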
I largely agree with this. Occasionally it suggests a few lines that are more or less what I want, but rarely more than that.
One use case that I reach for relatively often is writing a comment to describe the one line of code I want when I can’t remember the specifics of the API I’m working with. E.g. `# reset numpy random seed`, and copilot pretty reliably gives me what I want.
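For example (a minimal illustration of that comment-as-prompt workflow; `np.random.seed` is the completion one would expect here):

```python
import numpy as np

# Typing just the comment below is usually prompt enough; Copilot completes
# it with something like the following line.
# reset numpy random seed
np.random.seed(0)
```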
I agree with your last point - with an added dash of being wary of getting dependent on sophisticated tools that have one vendor.
I'm not skeptical of the idea at all, however. I see it as the rough analogy of mathematicians using proof assistants. This is a coding assistant.
It's generally true that the vast, vast majority of code one writes is boilerplate or otherwise rote. The core logic and control flow that define the meaningful bits of any complex program are nested in the middle of tons of error handling, logging, data munging, incidental book-keeping, and other very prosaic tasks.
It will be more than just handy to have a friendly intelligence assist in these tasks, and it doesn't require much "intelligence" (of the sort required for figuring out what the program should do in the first place and how to architect it).
Once the technology gets firmed up a bit, the productivity multiplier will be palpable and impossible to ignore. Programmers using these tools will complete tasks faster than ones that don't, and the market will shift.
Further extensions of this technology lead to automatic generation of unit tests, automatic refactoring, etc. These all exist as very specialized tools that are explicitly coded, but it's easy to see that adding a bit of ML-driven intuition to them would greatly increase their scope and effectiveness.
This could be useful for languages like Go where simple repetition and loops are preferred over clever language features like object or array destructuring that can add cognitive or performance overhead.
I've been using Copilot for 5 months while building another AI productivity tool. It's changed my habits and I'm becoming a bit dependent on it for autocompletion. It feels so good just hitting TAB and moving on.
I know some developers that aren't embracing it the same way I do, making judgements without even trying it. "This is the future", I tell them, "it makes your life much easier", but there's resistance.
Prompt engineering is quite interesting too, and it may turn into a job skill later. While using Codex, I understood the importance of knowing how to ask a non-human for the right things. It's a bit like talking to Alexa in the early days, in the sense that I couldn't talk to Alexa like a human yet; I had to be specific, clear and intentional. I still see that people who are less experienced with a smart personal assistant struggle to get their commands done.
If you love this technology and would love to try it for Explaining Code in your browser, check out the extension ExplainDev (https://explain.dev). It works on GitHub, StackOverflow and documentation websites.
Disclaimer: I built ExplainDev, with the help of Copilot.
Looks interesting. I signed up and received an email with the chrome store link, but the email didn't include an access code (it mentions it, "...and the access key below.", but the only thing below is the store link). Is this a bug or am I missing something?
EDIT: A second email showed up about 30min later that did contain the code.
Yes, we will land on Firefox very soon, hopefully next month.
Copilot's been very useful. The extension is built on TS and lots of custom CSS. Codex is knowledgeable about browser extension APIs, and it's helped me write most of the CSS utility classes that change sizes, margins and paddings, so that I don't have to bundle the extension with another third-party library.
I've written extensions before and Firefox has a very good polyfill [0] that makes it quite easy to write extensions for all browsers. It does get a bit trickier if you also want to incorporate TypeScript [1] or React however.
Nope, it's the Indian govt. Says so in the reason why it's blocked too. Not sure what archive.is did to piss off our govt and get banned like this. Could have cached a torrent or porn site maybe.
We tried using OpenAI/Davinci for SQL query authoring, but it quickly became obvious that we are still really far from something the business could find value in. The state of the art as described below is nowhere near where we would need it to be:
To be clear, we haven't tried this on actual source code (i.e. procedural concerns), so I feel like this is a slightly different battle.
The biggest challenge I see is that the queries we would need the most assistance with are the same ones that are the rarest to come by in terms of training data. They are also incredibly specific in the edge cases, many times requiring subjective evaluation criteria to produce an acceptable outcome (e.g. a recursive query vs 5k lines of unrolled garbage).
While others here have touched on the idea that Codex has changed their coding habits, what I find interesting is that Codex has changed how I write code altogether. For example, I had to connect a database to an API a little while ago. Obviously I had the option to use an ORM as one would normally. But instead, I just wrote out all the SQL commands and wrapper functions in one big file. Since it was all tedious and predictable, Codex helped me to write it in just a few minutes, and I didn't need to muck around with some complex ORM. These are the tradeoffs I'm personally excited about.
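A minimal sketch of that "plain SQL plus thin wrapper functions in one file" pattern, using sqlite3 purely for illustration; the table and column names are made up, and the commenter's actual database and API were presumably something else:

```python
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)

def insert_user(name, email):
    cur = conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))
    conn.commit()
    return cur.lastrowid

def get_user(user_id):
    return conn.execute(
        "SELECT id, name, email FROM users WHERE id = ?", (user_id,)
    ).fetchone()

def list_users():
    return conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()

# Each wrapper is tedious and predictable to write -- exactly the kind of
# repetition Copilot autocompletes well once it has seen the first one or two.
user_id = insert_user("Ada", "ada@example.com")
print(get_user(user_id))
```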
Until the Copilot product is accessible to the public on the same terms as the free software on which it is based, it is another example of corporate exploitation of the commons, or, in other words, open theft.
If it separately becomes a long term trend for private companies to use neural net regurgitation to allow them to use free software without complying with the GPL, free software must be completely abandoned.
The logical conclusion of your position is a de facto situation in which only private corporations get to enjoy intellectual property rights. It’s easy to steal from free software developers but if you steal anything from a corporation your life will be ruined.
Copilot is wonderful, when you use it appropriately. It autocompletes and has a lot of good knowledge of the code.
I had an interview that usually lasts an hour, and people don't get to finish all the code. I did it in 15 minutes or so, mostly because I knew what needed to happen and the code was autocompleted quickly as I started writing.
It really helps you focus on what you want to do, and you don't have to think about syntax or the correct variable name.
You still need to be a good developer, but it helps you greatly to do your work.
Yeah, so they came up with a 5-part problem where the parts get progressively more difficult. I heard from the recruiter that previous candidates didn't do more than 2 or 3 of those.
He did notice that my autocomplete was somewhat magical. I fully explained to him what it was; I believe he didn't completely understand and kept talking about some other plugin, which I think is some kind of IntelliSense.
Like I said, even with me being chatty (I go off on tangents often), I was done with all 5 in record time and managed to refactor the last 2, all thanks to Copilot. I don't think at any point Copilot did something I didn't intend, and it complemented me fantastically.
I wonder what the copyright status of things written by Copilot is? Since a human didn't write some of the code produced, does that mean that those portions aren't copyrightable?
Related article "If Software is My Copilot, Who Programmed My Software?"
I don't wanna jinx it because it's certainly a useful tool but I wonder what will happen when Copilot is widely available and CS students start handing in their programming assignments done by Copilot (and only Copilot).
Easy to check, just put in the first few words and if copilot autocomplete is identical to the remaining solution you just caught a cheater.
Or use names in the assignment that Copilot has banned, to prevent it from working. For example, Q_rsqrt is banned because Copilot had a tendency to just copy-paste the original Quake source, including comments, verbatim.
>> Easy to check, just put in the first few words and if copilot autocomplete is identical to the remaining solution you just caught a cheater.
That doesn't work. Copilot is not deterministic. It generates different completions for the same prompt, at random. I don't think you can catch them all, either.
I don't think it's so simple as banning identifiers either. Copilot is capable of adjusting its code to identifiers in the prompt and reusing them correctly. It's quite impressive at that, really.
Just started using it w/ Clojure & Scheme. Nowhere near as useful as with popular languages. My thoughts may change as I use it more, but I'll say it's barely better than w/o right now.