In computer science, researchers who had spent decades working on feature engineering for images got completely blown out of the water by deep learning. All those decades of papers would never get another cite. Everyone moved on.
String theory doesn't actually do anything though, so there's no way to say it's better or worse than anything else. It's like someone saying they're working on solving the halting problem or something and eventually they'll get there and they've been working on it for 20 years without any code that actually works. At some point, you just have to give up and do something else. They keep getting grants to work on this stuff though for some reason. Someone needs to interview the people who are giving out string theory grants and ask them why the heck they are still giving these people money.
There might still be some value in it. I think people will use tools like the AdS/CFT correspondence for a while. And it is worth remembering that in terms of actual dollars, fundamental physics research is not that big of a spend. Like, part of the reason why these articles are important is that the pie is so small to start with. So I don't worry too much about the string theory grant money; if anything I would not mind the absolute dollar amount going up if the relative fraction decreased with it.
The one thing that kind of irks me is this Kaku book “The God Equation.” I have only seen one equation that is so universally applicable that it could deserve that title, and even then I would be hesitant about that because it might give people the wrong impression. (It is the transport equation—it keeps showing up everywhere, a bunch of other equations are special cases of it, it is involved in one of the million-dollar Clay Mathematics prizes so there is clearly something hard/intractable about it, and it has a term which refers to creation and destruction. It says that for a box flowing downstream, the time rate of change of stuff in the box is equal to the flow J of stuff through the walls of the box plus the rate Φ of stuff being created/destroyed in the fluid. Or, ∂ρ/∂t + (v · ∇) ρ = - ∇ · J + Φ.)
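(A quick aside of my own, just to show why it turns up so often: a couple of familiar equations drop straight out of it as special cases.)

```latex
% General transport form, as above:
\frac{\partial \rho}{\partial t} + (\mathbf{v}\cdot\nabla)\rho = -\nabla\cdot\mathbf{J} + \Phi
% With v = 0, J = -D \nabla\rho (Fick's law), and \Phi = 0 it reduces to the
% diffusion / heat equation:
\frac{\partial \rho}{\partial t} = D\,\nabla^{2}\rho
% With J = 0 and \Phi = 0 it reduces to pure advection:
\frac{\partial \rho}{\partial t} + (\mathbf{v}\cdot\nabla)\rho = 0
```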
> for images got completely blown out of the water by deep learning
Deep learning might be a good example. The underlying neural networks had been a “dormant science” for decades until their breakthrough.
I wouldn’t be so sure either that image feature engineering won't get another cite. More likely those features will show up in preprocessing again as the boundaries of deep learning get pushed.
The thing we see with image features is that the early layers of a convnet learn almost the same things as the specific engineered features we used to implement, but better - the earlier feature engineering used "clean" human-designed structures, but DL automatically got to the same point and also tweaked the coefficients to be slightly better, and once you can do that, there's no reason to go back to what we manually engineered. For example, why would you explicitly use a Sobel edge detector in preprocessing if you could use (for that same preprocessing) a set of convnet kernels that includes (among other things) edge detector kernels that are very similar to Sobel but slightly better?
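To make that concrete, here's a minimal sketch (my own illustration, using NumPy/SciPy) of the engineered Sobel kernel next to a stand-in for a learned first-layer kernel bank:

```python
# Minimal sketch: a hand-designed Sobel kernel is just a fixed 3x3 convolution;
# a convnet's first layer learns a bank of similar kernels, with coefficients
# tuned for the downstream task instead of fixed by hand.
import numpy as np
from scipy.signal import convolve2d

sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def engineered_edges(image):
    """Classic feature engineering: apply the fixed Sobel kernel."""
    return convolve2d(image, sobel_x, mode="same", boundary="symm")

# A learned first conv layer does the same kind of operation, except the 3x3
# weights start random and are adjusted by gradient descent; after training
# they often end up Sobel-like but with slightly different coefficients.
learned_kernels = np.random.randn(8, 3, 3) * 0.1  # stand-in for trained weights

def learned_edges(image, kernels=learned_kernels):
    return np.stack([convolve2d(image, k, mode="same", boundary="symm")
                     for k in kernels])
```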
So no, I would not expect the currently known engineered image features to show up in preprocessing again - they simply do not add any value whatsoever (which can be experimentally verified) over the "learned preprocessing features". Perhaps someone will engineer a new substantially different type of feature that can't be currently learned from data, and that would get used and cited - but it won't bring citations to the old engineered features.
So if you were to write OCR software today, would you just feed a neural network the whole raw phone photo of a floppy piece of paper? Or would you try to flatten the image first, then extract words and letters, and then synthesize that back into a bigger model (also using a NN) that finally produces your text?
This seems like a quite orthogonal issue to what I was talking about - your comment seems to be about the separation of the pipeline into subtasks, while I was talking about the methods used for particular subtasks. I would not consider "try to flatten the image first" part of feature engineering; I would consider it a preprocessing task that might be done as an engineered step, but might also be done as a machine-learned subtask, or skipped entirely, or integrated into a more general transformation that's believed to inherently correct for non-flat images.
I don't work on OCR much (and when I did, it was for book digitization, which doesn't have these specific challenges), so I don't have an opinion on what the state of the art is for a task like you describe (I'm imagining analysis of receipts or something like that). However, across many domains of ML (including but not limited to computer vision) we are seeing the advantages of end-to-end optimization.
So, for example, the image preprocessing we would want in order to accept raw phone camera images as input includes correction for lighting, angle, and crumpled paper. Obviously, I agree that these things should be done, but I do not necessarily agree that they must be done as separate engineered features.
I don't have an opinion on what's the best option "today in production" for OCR - it's plausible that the engineered-feature way is still the best at the moment, but if we're looking at where the field is moving, then I'd argue that there is a strong tendency towards (a) using numerically optimizable methods for these corrections as opposed to hand-selected heuristics; (b) optimizing these corrections for the final OCR result as opposed to treating them as a separate task with separate metrics to optimize; and eventually (c) integrating them in an end-to-end system instead of a clearly separable stage of "correcting for X". I'm not certain where the state of the art for noisy OCR is today on this, but it's a one-way direction. The key point of my comment above is that once (if!) we get these things to work, I would not expect to go backwards to specific handcrafted features ever. It's plausible that some tasks are better treated as separate and can't be integrated well (perhaps the selection of text segments in your OCR example is like that), but for the features that we have already managed to successfully integrate (which was the topic discussed in the grandparent comment), IMHO there is no reason to ever go back.
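A rough sketch of the contrast between (a)/(b) and (c), with placeholder names (none of these helpers refer to a real OCR library):

```python
# Hypothetical sketch of the two pipeline styles; every name here is a
# placeholder, not a real OCR API.

def staged_ocr(raw_photo, correction_steps, recognizer):
    """(a)/(b): separately engineered or separately optimized corrections,
    each tuned against its own intermediate metric before recognition."""
    image = raw_photo
    for step in correction_steps:   # e.g. flatten page, normalize lighting, segment lines
        image = step(image)
    return recognizer(image)

def end_to_end_ocr(raw_photo, model):
    """(c): a single trained model from raw pixels to text; any 'flattening'
    or 'lighting correction' is whatever the network learns internally,
    optimized only for the final OCR loss."""
    return model(raw_photo)
```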
Perhaps a relevant example of a field that has undergone this transformation is machine translation. Just a few years ago, production pipelines included literally hundreds of separate subsystems for niche subtasks conceptually similar to those you mention regarding OCR; the shift to neural ML made these subsystems redundant, as doing the same thing in an integrated, end-to-end optimizable way gets better results and is simpler to implement, maintain and debug thanks to a more unified architecture.
Similar trends have also happened for face recognition and for speech recognition - I would presume that OCR is structurally similar enough to see the same fate if it hasn't already.
I’m not so sure; I think a _lot_ of people would be interested in advances in non-ML image analysis tech. While ML has been effective in industry recently, it has a number of issues such as intensive training costs, extreme difficulty in fully understanding the behavior of a model (since we can only do experimental verification, and only on behaviors we already know to be interesting), and ethical issues such as unintended gender/racial bias. Just off the top of my head.
I think what you are pointing at is really the most toxic part of tech: marketers and investors have found that tech is a good way to aggregate money, and so they have thrown a lot of funding at tech that can aggregate it for them. However, we haven’t actually proven that that tech is the best solution or a sustainable one. We don’t understand most of what we do with computers very well; we just approximate until it works well enough (for the marketers and investors, of course).
ML and deep learning are very valuable, of course, but their recent market dominance doesn’t indicate that they are the final or most correct solution to the problems they are being used for. It indicates that people want to spend money on it right now.
The halting problem is a poor example as there is already a simple proof showing that such a solution does not exist.
A much better example is P=NP. This is a very rich area of research with many closely related problems that has been worked on for decades and will continue to be worked on for the foreseeable future.
A better analogy might be two kids standing at the bottom of a tree arguing about what the fruit tastes like, but they’re too small to jump or climb to get the fruit and find out.
It’s not wrong to want to find out what the fruit tastes like. Both kids have good ideas about what it might taste like, and there are many such plausible ideas. But the tree is tall, and remains out of their reach, and all they can do is speculate. Meanwhile, there is no shortage of other interesting things they could be doing.
That String Theory continues to adhere to known results keeps it in the candidate pool. If it can make the same predictions as the current framework, then it makes sense to keep working on both, because neither has a known advantage over the other in predicting unknown phenomena. The only way to find out is to continue exploring all such theories. Since human talent and research capacity are limited, it becomes an economic problem. Is it worth having half of the brain power working on two equally plausible theories? Who knows... but what I do know is that diversity in this field is creating new ideas and new mathematical tools, and that has to count for something.
I am just finishing up my MS in Statistics, and at my university they don't have any course (yet) that goes into deep learning. The only time the courses briefly talk about neural nets is in the context of logit regression. And I have always wondered if the time spent proving the Gauss-Markov theorem would be better spent somewhere else.
Do you know of any source (book or otherwise) that continues to build on top of the topics taught in a regression and linear algebra course (such as multiple linear regression, weighted least squares, logistic regression, generalized least squares, principal component regression, singular value decomposition) and slowly moves towards the current state of deep learning?
>And I have always wondered if the time spent proving the Gauss-Markov theorem would be better spent somewhere else.
No - learning how to prove it will serve you well in the future. Deep learning is just figuring out different ways to combine all of this prior art inside a computation graph.
Step N is linear regression, SVD etc... step N+1 is deep learning.
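A minimal sketch of that continuity (toy data, plain NumPy): least-squares regression fit by gradient descent is already the same training loop a deep net uses, just with a single linear layer.

```python
# Ordinary least squares fit by gradient descent is exactly a one-layer
# network with identity activation; deep learning stacks more of the same
# pieces inside a computation graph and reuses the same update rule.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # toy design matrix
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)   # toy targets

w = np.zeros(3)
lr = 0.1
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)     # gradient of mean squared error
    w -= lr * grad                            # same update rule backprop uses

# w is now close to the closed-form OLS solution
# np.linalg.lstsq(X, y, rcond=None)[0]; swap the single linear map for several
# stacked maps with nonlinearities and the same machinery gives a deep net.
```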
I wish I knew what it was like to have the math background, and don't at all mean to suggest one should not bother, but as a person on a team of people who have made a few production NN models over the past year: it is completely possible to conceptually and practically grasp iterated reverse differentiation with convolutions well enough to use deep learning to do novel work, without having the deep background math knowledge. For example, I barely know what a Support Vector Machine is, nor could I do a linear regression without a lot of hand holding. But I can design a passable tensorflow model and improve it.
It would definitely be very hard to do any meaningful research from this position, but I know enough to be useful (er, or dangerous) and can read papers and code to keep up with recent advancements, and try things out like different convolution designs, layer designs, gates, functions, feedback, etc.
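For a sense of the level that works at, here's a minimal Keras-style sketch (arbitrary shapes and layer sizes, not one of our actual models):

```python
# A small image classifier at the "use the framework" level of abstraction;
# the shapes, layer sizes, and class count here are arbitrary placeholders.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # data not shown here
```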
(On the topic of reading: Karpathy[1] and Colah[2]'s posts have a wealth of introductory conceptual information and images in them, and helped our team discussions a lot while we've been learning.)
I think it will vary from person to person - but when I am looking at a new idea (e.g. principal component analysis) that is built on top of old ideas (PCA is based on singular value decomposition), familiarity with the old ideas makes me more comfortable/confident that I'll be able to understand the new idea thoroughly, and hopefully I'll also see the pitfalls of the new idea coming. And I really enjoy the process of fitting a new concept/idea into an existing larger picture.
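As a tiny illustration of the PCA-on-top-of-SVD point (toy data, plain NumPy):

```python
# Principal components are just the right singular vectors of the
# centered data matrix.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))          # toy data: 100 samples, 5 features
Xc = X - X.mean(axis=0)                # center each column

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                        # rows = principal directions
explained_variance = S**2 / (len(X) - 1)
scores = Xc @ Vt.T                     # data projected onto the components
```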
Sure makes sense. There are multiple layers to deep learning:
The math and proofs.
The algorithms.
The software tools/frameworks.
Building NNs with those tools.
All of those have different levels of conceptualization and utility.
As some extreme examples: anyone can build an iPhone image classifier with drag and drop today. That’s one level. Developing an alternative to backprop is another level.
I liked "Hands on Machine Learning" by A. Geron. It's quite practical - you won't find careful progressive proofs, but it is very good at introducing the methods of ML and relating them to real problems. It provides you with a way to get your hands round the techniques - with an MSc in Stats you should then be well placed to understand where the gaps and problems are and to use things wisely.
It isn't quite state of the art, and it is very light in terms of how to evaluate and deliver the solutions. But I thought it was good. My team at work has gone through it selecting one chapter at a time this year in our book club - it's provided a good catch up and share forum for us. Some of the folks in the team are Engineers and so haven't a strong ML background, a couple are Data Scientists and therefore were in the same place as you (+/- some use of deep learning in a few engagements) and a couple of the folks are diehard ML researchers, but focused on different things (time series, NLP, evaluation) so there was a lot of crossover for everyone.
The last time I took a university course on the side of work, I enrolled in a brand new Deep Learning course. We used Ian Goodfellow's book (https://www.deeplearningbook.org/), as well as supplementary research papers.
That's kind of true, but also, stuff like Hessian noise filters is still useful, and those existed and were written about prior to the application of neural networks to images.
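For anyone curious, here's a rough, hand-rolled sketch of that kind of Hessian-based filtering (my own simplification; real pipelines would typically use a library implementation such as a Frangi/vesselness filter):

```python
# Compute per-pixel Hessian eigenvalues of a smoothed image; thresholding
# them is the basic building block of Hessian-based ridge/blob/noise filters.
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_eigvals(image, sigma=2.0):
    smoothed = gaussian_filter(image, sigma)
    # second derivatives via repeated finite-difference gradients
    gy, gx = np.gradient(smoothed)
    gyy, gyx = np.gradient(gy)
    gxy, gxx = np.gradient(gx)
    # closed-form eigenvalues of the 2x2 Hessian at every pixel
    trace = gxx + gyy
    det = gxx * gyy - gxy * gyx
    disc = np.sqrt(np.maximum(trace**2 / 4 - det, 0))
    return trace / 2 - disc, trace / 2 + disc

# Pixels where both eigenvalues are near zero are "flat" (likely noise or
# background); strongly negative eigenvalues indicate bright ridge/blob
# structure, which is what these filters are designed to keep.
```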