hmmm... For decades, researchers and experts have said that we really don't understand what the heck is going on inside these artificial neural networks...
....Sounds like the decades-long quest to understand how a network of artificial neurons processes input data into useful output (for example: categorizing, deciding yes or no, or controlling the actuators of robots) ...has been cracked. That is, by drawing on linguistic concepts like 'tokens' and 'parsing', the researchers have been able to trace through the steps of what the FFNs are logically doing.
[Regarding other commenters' comments, two points to keep in mind: 1) any trained ANN (in machine learning generally, FFNs included) can be reduced to a single mathematical formula that replicates the processing done by the network. In other words, after training, the result can be expressed as one piece of math that can be reused to replicate the trained net WITHOUT running the actual ANN anymore - the ANN is only needed during training to derive a definite mathematical result for future use. {See most graduate-level textbooks on machine learning for the details.} 2) "sparse" does not equal "unnecessary"; as others have suggested, it sounds like a decision tree rather than a tangle of connections between artificial neurons doing heaven only knows what logically.]
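(A minimal sketch of point 1, with made-up stand-in weights: once trained, the whole net is just a closed-form expression you can evaluate with plain NumPy, no framework required.)

```python
import numpy as np

# Hypothetical stand-ins for weights exported from a trained 2-layer feed-forward net.
W1, b1 = np.random.randn(6, 4), np.zeros(6)
W2, b2 = np.random.randn(1, 6), np.zeros(1)

def trained_net(x):
    """The entire trained model is just this formula: y = W2 * relu(W1 * x + b1) + b2."""
    h = np.maximum(0, W1 @ x + b1)   # hidden layer with ReLU
    return W2 @ h + b2               # output layer

print(trained_net(np.array([0.1, 0.2, 0.3, 0.4])))
```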
Most models have a purely fixed architecture: i.e. if you train three layers of six feed-forward neurons apiece, your model will just fit the training data within that architecture to the extent that gradient descent can force it to fit. There is no mechanism to say "oh, these neurons will never activate, let's prune them", or "we'd get much better loss if we added a layer here".
In the dark times before PyTorch there was an idea called NEAT: "neuroevolution of augmenting topologies", which tried to use a genetic algorithm (i.e. testing a bunch of slightly modified solutions for loss) to discover both optimal weights and optimal network structure. I don't think this gets used all that often, if at all.
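Very roughly, the idea looks like this (a toy sketch, not the real NEAT algorithm, which also tracks innovation numbers and speciation; the fitness function here is a dummy):

```python
import copy
import random

def evaluate_loss(genome):
    # Placeholder: decode the genome into a network and measure its loss on real data.
    return sum(w * w for w in genome["weights"])

def mutate(genome):
    child = copy.deepcopy(genome)
    if random.random() < 0.8:
        i = random.randrange(len(child["weights"]))
        child["weights"][i] += random.gauss(0, 0.1)    # tweak an existing weight
    else:
        child["weights"].append(random.gauss(0, 1.0))  # structural mutation: grow the genome
    return child

population = [{"weights": [random.gauss(0, 1.0)]} for _ in range(20)]
for generation in range(100):
    population.sort(key=evaluate_loss)                 # lower loss = fitter
    survivors = population[:5]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(15)]
```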
I hear stories every few years about how many models have unused neurons that can be pruned or how hyperparameter selection is a pain in the ass, but nothing about automating the pruning and parameter selection in such a way as to efficiently use the whole model. I'm not sure it's necessary anyway.
DL researcher here. I think it's also hard because many of us have noted experimentally (and there's some research on this) that there's a critical early phase of learning that conditions the network a certain way, and adding and/or removing layers later seems to be quite difficult (removing far easier than adding, especially with something like variational dropout [yes, I cite the old deep magicks]: https://arxiv.org/abs/1701.05369)
Yes, unfortunately I have yet to beat that technique in wallclock convergence speed vs just using the larger network from the start. :'((((
Whoever figures out how to clearly and effectively do it consistently faster than a 'full network from the start' version will open up an entirely new subfield of neural network training efficiency. :'))))
There's a huge amount of work on model pruning, especially with an eye towards model reduction for on-device deployments. I've done a bunch of work in this space focused on getting speech synthesis working in real time on phones. It works and can be automated.
There's a lot of nuance though. What has typically worked best are variants on iterative magnitude weight pruning rather than activation pruning.
This can often get rid of 90% of the weights with minimal impact on quality. Structured pruning lets you remove blocks of weights while retaining good SIMD utilization... and so on.
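For anyone curious, the skeleton of the iterative magnitude-pruning loop in PyTorch looks roughly like this (the model and the per-round fraction are placeholders; a real pipeline fine-tunes between rounds and checks quality after each one):

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))  # placeholder model

# Repeat a few times: zero out some of the smallest-magnitude weights, then fine-tune.
layers_to_prune = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
for _ in range(5):
    prune.global_unstructured(
        layers_to_prune,
        pruning_method=prune.L1Unstructured,
        amount=0.3,  # fraction of weights masked in this round
    )
    # ... fine-tune the surviving weights here before the next round ...

# Bake the accumulated masks into the weight tensors once done.
for module, name in layers_to_prune:
    prune.remove(module, name)
```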
Absolutely. Pruning can decrease overall accuracy. The challenge is to prune as aggressively as possible while keeping accuracy as high as possible. It's certainly not an operation that is expected to have no effect.
It should be noted that there are methods for training the network to encourage these prunable neurons, e.g. sparsity penalties.
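(For example, a hypothetical L1 penalty on hidden activations, which nudges many units toward staying silent so they can be pruned later; the model and coefficient here are made up:)

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))  # placeholder
criterion = nn.CrossEntropyLoss()
sparsity_weight = 1e-4  # hypothetical coefficient; tune per task

def loss_with_sparsity(x, y):
    hidden = model[1](model[0](x))   # post-ReLU activations of the hidden layer
    logits = model[2](hidden)
    # Task loss plus a penalty on average activation magnitude.
    return criterion(logits, y) + sparsity_weight * hidden.abs().mean()
```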
If I read the paper correctly, it seems to support the old quip about all AI being decision trees, at least for smaller model sizes.
It also raises an interesting UX question: is there an implicit tradeoff between legibility and power for notations (as with different ways of expressing rotation), or is that a consequence of using graphics hardware to implement AI?
Or maybe one should keep track of "neuron activation frequency" during training. This wouldn't add a lot of extra parameters, since it's per neuron, not per weight.
Then every epoch or so, we'd "reinitialize" the dead neurons.
This is similar to K-Means algorithms that reinitialize cluster centers that have very few assigned points.
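A rough sketch of what that could look like (all names hypothetical; a single hidden layer for brevity):

```python
import torch
import torch.nn as nn

layer = nn.Linear(64, 128)            # placeholder hidden layer
activation_counts = torch.zeros(128)  # one counter per neuron, not per weight

def forward_and_count(x):
    h = torch.relu(layer(x))
    activation_counts.add_((h > 0).any(dim=0).float())  # did each neuron fire in this batch?
    return h

def reinit_dead_neurons(threshold=1):
    """Call once per epoch: re-randomize rows whose neuron (almost) never fired."""
    dead = activation_counts < threshold
    with torch.no_grad():
        layer.weight[dead] = torch.randn(int(dead.sum()), 64) * 0.01
        layer.bias[dead] = 0.0
    activation_counts.zero_()
```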
Perceptrons are, by design, analogous to organic neurones: analogies only, not faithful models. Likewise, the artificial networks are analogies to organic networks.
It's therefore not surprising that the artificial behaves analogously to the organic, but it would be a mistake to assume they reproduce us accurately: GPT-3 is about a thousand times less complex than a human, while being trained on more text than any of us could experience in ten millennia, and only on text. It has no emotions, unless the capacity to simulate affect happened to lead to this as an emergent property by accident (which I can't rule out but don't actually expect to be true).
I find it pretty remarkable how in many visual recognition neural networks (say for the MNIST digits) you see neurons close to the input layer that respond similarly to neurons in the V1 area of the visual cortex.
Sure, they are models. One is math, one is atoms.
But look at a single 'real' neuron: it is just calcium ions and electrical potentials; there is no 'emotion'.
Once you can completely model a 'real' neuron (I know there is still some scale to be reached before achieving this), then link together these exactly modeled 'real' neurons, what is to say the result is not experiencing 'emotions', even though it is silicon?
Humans give themselves too much credit for being special. "I feeeeeel, the computer can't". That is just not understanding yourself.
> But look at a single 'real' neuron: it is just calcium ions and electrical potentials; there is no 'emotion'.
Indeed; this is why I am willing to entertain the possibility that emotions may be present as an emergent property of simulating us. I don't expect it, but I can't rule it out.
You clearly lack understanding here too. Simulating "real" neurons at the scale required to simulate a brain is probably NP-hard. Even if we wanted to try, we don't have any maps of neuron connectivity with nearly the resolution required to do so.
We do for at least one classic, small-scale example: C. elegans. Despite mapping its roughly 300 neurons, the simulation attempts I'm aware of weren't very fruitful.
> with nearly the resolution required to do so.
I agree this may be part of why. Accurate simulation may require replicating subtle behavior outside the neuron body. Further maps or simulation attempts may have since been made, and possibly with better results. Given I don't remember headlines about this, it's likely that any improvements weren't groundbreaking.
I don't know enough about the roles of glia and inter-neuron (not interneuron) behavior to discuss this further beyond wild speculation. Nor does anyone, as far as I know. Gaining that knowledge would probably be necessary to build connectomes with sufficient accuracy for simulation.
I think the bigger problem is we have no idea what the necessary or sufficient requirements are, neither for qualia nor for intelligence. (Not sure why you think it's NP-hard rather than just a lot of computation?)
With intelligence we can at least tell when we've achieved it, to whatever standard we like.
Emotions could probably be a thing where we can map some internal state to some emotional affect display, eventually; but what about any question of emotional qualia?
AFAICT, we don't even have a good grasp of the question yet. We each have access only to the one inside our own head, and minimal capacity even to share what these are like with each other. When did you learn about aphantasia, for example? Is there a word for the equivalent of that for senses other than vision? I can "visualise" sounds and totally override the direction of down coming from my inner ears, but I can't "visualise" smells, and I don't have a non-visually oriented word for the idea, since "visualise" and "imagine" are both clearly visual terms (and "idea" itself more subtly is too).
"Qualia" are really just a reframing of (an aspect of) consciousness, which has been speculated to be purely epiphenomenal. Maybe we're just along for the ride, and our actions merely happen to mirror our decisions - or the other way around, same difference.
Qualia is how to discuss the problem of consciousness without a pointless debate about whether it is the opposite of unconscious, the opposite of subconscious, something that needs a soul, or any of the other things that go wrong if you don't taboo the "c" word.
We don't know the answers (nor, I assert, the correct questions), though "what even is this?" and "what has it?" were already relevant to animal rights questions (qualia might have started with humans, but there's no reason to assume that) well before current AI, and even if we find a convenient proof that current AI definitely can't and that's fine… some of us want to advance the AI to at least as far as brain uploading where the qualia is the goal, though virtual hells like in the novel Surface Detail seem to me to be an extremely plausible dystopian outcome if uploads ever actually happen.
Do you mean biological neurones or perceptrons? There have been publications about both regarding XOR. If the latter, be aware that this was only about single layers and that perceptrons have an unnecessary restriction on the way they can combine inputs.
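(For instance, a tiny two-layer perceptron with weights chosen by hand rather than learned computes XOR just fine; only the single-layer case is restricted:)

```python
import numpy as np

def step(v):
    return (v > 0).astype(int)

def xor_net(x1, x2):
    x = np.array([x1, x2])
    # Hidden layer: one unit fires for OR, one for AND.
    h = step(np.array([[1, 1], [1, 1]]) @ x + np.array([-0.5, -1.5]))
    # Output: OR minus AND, i.e. "exactly one input is on".
    return int(step(np.array([1, -1]) @ h - 0.5))

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```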
No. Sometime last year there was a paper measuring inputs and outputs on a real biological neuron, which discovered it could do XOR logic.
It can do other logic as well; that it could also do XOR was the new breakthrough.
And I'm not saying neurons are just logic gates, it's just that they are something well below NP-hard.
There are analog fluctuations in potential all along the neuron, and they peak and fire at different thresholds. Really, seems like 'weights' if we want to map terminology.
It does not seem like it will be that long before the human neuron can be 'modeled' accurately enough that quibbling over a few percentage points between the 'model' and 'reality' will not leave much room in-between to find a 'soul' or 'spark', or anything anybody wants to say makes humans special.
> GPT-3 is about a thousand times less complex than a human
Also note that it takes the 'brute force' approach to architecture by using a transformer model, basically learning a connection graph from scratch. If you want that to scale to human complexity in function, you're probably going to have to overshoot in size by an order of magnitude.
> GPT-3 is about a thousand times less complex than a human
How did you figure that? How many synapses are dedicated to text processing in a human brain? How much information do those synapses encode compared to the information encoded in GPT-3? How about GPT-4?
Estimates for how much information the brain encodes are several orders of magnitude higher than for the biggest LLMs, to the point where trying to replicate it pushes the boundaries of what is computationally feasible. The brain is also significantly more adaptable and generalized thanks to neuroplasticity.
I see, this is basically the "AI of the gaps" argument. :P
It is NOT "only scale" at this point. But at the current rate, you will see this soon enough (I just happen to see it already). We'll have some very intelligent-seeming, very useful, but relatively uncreative zombies missing a "spark" (or any willpower, or any real source of what is valuable to them, or any sense of what is aesthetically pleasing or preferable or enjoyable or beautiful, or any consciousness for that matter). It will allow us to redefine what it means to be human. But our distinct human-ness will stand out even more at that point.
Agency is a key part of that spark, but we have done all sorts of research into agency, and I think providing goal-based agents within an AI framework that incorporates LLMs as well as other optimizers and solvers will supply the majority of that spark. The process of creativity depends on internal agency, goal setting unmoored from external dictation, and semantic synthesis of abductively reasoned concepts, with an aesthetic sense that feeds into the goal-based optimizer. These are things that can be simulated to the point that, while there may be an uncanny valley somewhere, it’ll be close enough to be hard to distinguish.
But I do wonder if the practical utility of such an entity is worth the amount of effort and capital required to build and sustain it. I suspect it’ll be more a novelty than a practical tool.
I would point to the problem that chatbots fail not at having a “spark” but at things ordinary computer software does well. The other day somebody pointed out in an HN conversation that I had confused the 1984 Super Bowl with the 1986 Super Bowl.
That’s a very human mistake. I’m sure somebody can tell you who played in every Super Bowl and what the score was, but people misremember things frequently and we don’t call it a hallucination (which is a defect in perception).
“Superhuman intelligence” is easy to realize for sports statistics if you do the ontology and data-entry work and put the data in a relational (or similar) database.
The thing is that chatbots get 90% accuracy in cases where you can get 99.99% accuracy (sometimes the data entry is wrong) with conventional technology. There is a kind of faith that we can go to 10^17 or 10^30 parameters or something and at some point perfect performance will “emerge”, but no, I think it is more like it will approach some asymptote, say 95%, and you will try harder and harder and it will be like pushing a bubble around under a rug. It’s a common situation in failing technology projects, quite well documented in
but boy people are seduced by those situations and have a hard time recognizing that they are in them.
In a certain sense chatbots already have superhuman powers of seduction that, I think, come from not having a “self” which makes mirroring easier to attain. People wouldn’t ordinarily be impressed by a computer program that can sort a list of numbers 90% correctly but give it the ability to apologize and many people will think it is really sincere and think it is really promising, it just needs a few trillion more transistors. (See the story Trurl’s Machine in Stanislaw Lem’s excellent Cyberiad except that machine is belligerent and not agreeable)
Now an obvious path is to have the chatbot turn a question into a SQL query and then feed the results into conversation and that’s a great idea and an active research area, but I’d point out the dialogues between Achilles and the Tortoise in
which people mistakenly think are about symbolic A.I., but which are really about the problem of solving problems where the correct solution has a logical aspect. Even though logic isn’t everything, the formulation of most problems (like “Who won the soccer game at Cornell last night?”) is fundamentally logical and leads you straight to paradoxes that can have you forever pushing a bubble under the rug and thinking “just one more” little hack will fix it…
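The shape of that SQL path, very roughly (a sketch only: `ask_llm` is a placeholder for whatever chat-completion API you use, and the schema is invented):

```python
import sqlite3

def ask_llm(prompt):
    """Placeholder for whatever chat-completion API you call."""
    raise NotImplementedError

def answer_with_sql(question, db_path="sports.db"):
    schema = "CREATE TABLE games(season INT, home TEXT, away TEXT, home_score INT, away_score INT);"
    # Step 1: have the model write a query against a known schema.
    sql = ask_llm(f"Schema:\n{schema}\nWrite one SQLite query answering: {question}\nSQL only.")
    # Step 2: the database, not the model, is the source of truth for the facts.
    rows = sqlite3.connect(db_path).execute(sql).fetchall()
    # Step 3: feed the results back into the conversation.
    return ask_llm(f"Question: {question}\nQuery results: {rows}\nAnswer in one sentence.")
```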
LLMs are just one tool in a collection. Intelligence is based on many models, not just the language parts of our brain, and I expect AI to incorporate more models in a systems approach. Why does it matter if LLMs can play chess at a grandmaster level or not? They can delegate the actual chess optimization problem to a chess-optimizing program. While it’s interesting that language alone is as powerful as it is, it’s very myopic to judge the tool alone rather than as part of a toolbox.
Exactly
It is NOT all about LLMs.
There are a lot of other successful models.
From AlphaGo to vision systems to robotics.
LLM is just the latest shiny thing.
At some point they will all be tied together, and at that point it will start to look a lot more like sections of our brain: one for vision, one for language, one for movement, etc.
I think it's already been made clear that the main reason for the "asymptote" is wrong data input. These models attempt to learn from random internet text ... and this turns out to not be all that accurate.
Also, I've observed a model I was training having the same problem as I do myself. If I at any point learn wrong data, which happens of course, then getting that wrong data back out is very hard and requires 10 or 50 times the effort I spent learning the initial data. In fact I strongly suspect I never unlearn bad data, I just additionally learn "if I say X, it's wrong, say Y instead".
Brains suck at exact work such as database work or precise calculations over longer chains. But they excel at approximate work, and that's a very useful skill to have as long as, when you need to, you can fire up pencil and paper and do your precise calculations that way. And paper works fine for database work as well and will remember all of those sports stats for as long as you care (and even after you're dead).
Brains are so powerful because they are universal, they can use auxiliary data stores and co-processors just fine.
So basically we have to give the LLM access to (both read from and add to) a tool that can deal with structured knowledge/state strictly, the same thing we have to do for humans: calculators, databases, clocks/alarms, code executors… That way if we tell it “remember that my birthday is April 5” it can enter it into a calendar tool in such a way that it can quickly retrieve it later to confirm its “LLM guesswork”, or trigger a reminder on that date.
I’ve been experimenting with prompting to get GPT4 to realize it has a “memory” (just a flat file for now) which it can contextually retrieve and write to, coupled with a process that interprets any requests it makes of this “memory” and adds them to the conversation. Limited success so far. End goal is a “life agent” that does things like remind me of things in a human-like way, sums up my emails, etc.
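Roughly the kind of glue process I mean (a sketch; the MEMORY[...] convention and file name are illustrative placeholders, not a GPT-4 feature):

```python
import re

MEMORY_FILE = "memory.txt"   # the flat-file "memory"

def handle_model_output(text):
    """Scan the model's reply for memory requests and act on them."""
    # Anything the model wraps in MEMORY[WRITE:...] gets appended to the file.
    for fact in re.findall(r"MEMORY\[WRITE:(.*?)\]", text):
        with open(MEMORY_FILE, "a") as f:
            f.write(fact.strip() + "\n")
    # If the model asks to read, return the file contents to feed into the next prompt.
    if "MEMORY[READ]" in text:
        with open(MEMORY_FILE) as f:
            return "Stored memory:\n" + f.read()
    return None
```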
~jgroch