Hacker Newsnew | past | comments | ask | show | jobs | submit | kn8a's commentslogin

Are there any alternatives to gradient based learning that could make this less useful? Is there another type of compute unit that is the next evolution of CPU -> GPU -> ?


Based the following constraints that lie at the center of AI and parallelism, I'd say no -- stochastic gradient pursuit using vector processors like GPUs is inescapable in all future AI advances.

1) All AI is based in search (esp. non-convex, where heuristics are insufficient to provide a global convex solution), and thus is inevitably implemented using iteration, driven locally by gradient-pursuit and globally by... ways to efficiently gather information to optimize the loss function that measures how well that info gain is being refined and exploited.

2) Search that is inherently non-convex and inefficient requires as much compute power as possible, i.e. using supercomputers.

3) All supercomputer-based solutions to non-convex problems are implemented iteratively, where results are improved not using closed-form math or complete info, but by incremental optimization of partial results that aggregate with the iterations, like repeated stochastic gradient descent that creates and enhances 'resonant' clusters of 'neurons'.

4) The only form of supercomputing that has proven to scale up at anywhere near indefinitely is data-parallelism (a dataflow-specific form of SIMD) -- where the search space is spread as evenly (and naively) as possible across as many processing elements as possible.

5) Vector processing hardware like GPUs implement data-parallelism as well as any HPC architecture yet devised.

Thus, I believe that AI is stuck with GPUs, or equivalent meshes of vector processors, indefinitely.


Completely agree. I'd only add one point: Quantum computing -- when and if we ever build them with enough error correction to be practical -- can replace GPUs for a few very specific kinds of search where the statistical shotgun approach that quantum computers do so well can be exploited. One canonical example is the quantum FFT (Shor's algorithm) which seems to work quite well for finding the factors of large composite numbers.

Quantum computers are not magic and they are by no means general-purpose, but if your search mechanism matches the kind of search quantum computers are good at, they can greatly exceed the speed of GPUs.


Could something like 'lottery ticket' review approach make this process faster? I might be wrong but I am not sure if that is parallelized.


I'm not sure what you mean by 'lottery ticket' review -- presumably some sort of random global search shortcut, like mutation in genetic algorithms.

If so, then no, using randomness to disrupt search only resets the search to continue at a different (random) spot in the global search space. That kind of approach certainly can escape local minima, but it doesn't diminish the amount of total time/effort required to do (non-convex) global search.

I think the long history of our attempting to use randomness to improve search has shown that all naive approaches inevitably fail unless the number of processors available can be scaled up to 'equal' the number of decisions required to explore the entire search space (as quantum computing seeks to do). But maybe the formal resolution of that question will come only after we prove P != NP.


Sorry I meant the 'lottery ticket' hypothesis. https://arxiv.org/abs/1803.03635v1

The lottery tickets are random but they are basically - as I understand - smaller graphs that have the same strength as a larger graph.


It's a tough question, it's not even back-propagation but even just sometimes the "parameters" of the models, for example [1] shows that models such as ResNeXt already perform better on a very different architecture such as Graphcore, for some sizes of convolutions. Older models, or models that get tuned for existing GPUs, do not perform as well.

It's tough to come up with a new architecture that can have an advantage on current and future models, at least from a peak perf point of view, from a perf/watt for example instead the scaled-up Apple GPUs seem to show new interesting properties. But the Graphcore architecture is quite interesting, being able to act somehow both as a SIMD machine and a task-parallel machine.

[1] https://arxiv.org/pdf/1912.03413v1.pdf


Yes: use actual biology and biological processes. Just like life already does. It doesn't cost millions of dollars and megawatts of power to create and run a dolphin or a crow.


Isn't the problem that we can't hack time? Otherwise we would have a million monkeys on a million typewriters everywhere.


If you have beta access to codex you can try and replicate their results. I have not been able to get the same results out of codex.


Mojang have also cracked down on brands from 2016. "Mojang spokesperson Owen Hill pointed to situations where something like a car company would build or commission an independent developer to build a Minecraft server with specific purpose of advertising a new vehicle." https://venturebeat.com/2016/05/31/microsoft-bans-corporatio...


The Facct (Fairness, Accountability and Transparency) conference ended recently (https://facctconference.org/). it might be a good start to find different research contributions.


I accidentally realised I could stream from Steam the other day which is interesting that they can technically do it.


This was a mind blowing example of math and art from last year - perhaps the live coding of the method was even more impressive than the outcome: https://www.youtube.com/watch?v=8--5LwHRhjk


As soon as I saw the first frame, I knew that was an Inigo Quilez video, very interesting work. His website inspired me to switch to an SDF approach to modeling for 3d printing: https://github.com/deadsy/sdfx.


She did an awesome job. The article implies that the choreography was reviewed by an engineer who would script the moves? Then article says then there is a pipeline that doesn't need scripting? The main question I was wondering is it possible to do this with mocap alone? Surely the closer artists can get to sculpting the tech the more opportunities for creation are possible.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: