Clueless ML researchers claim to read this and that from brains. Do they know or care that muscle and eye artifacts overlap most of the EEG frequency range? Do they realize that skin conductivity changes when people react to events? It's often easier for an ML model to learn from side channels and skip brain waves altogether.
ML can learn to move a cursor on the screen by measuring unconscious muscle tension or tracking eye movements through scalp electrodes.
ML is great for EEG signal analysis, but you do have to know what you are doing.
> It's often easier for an ML model to learn from side channels and skip brain waves altogether.
My favorite example of this is a machine learning model for visually recognizing cancers that seemed to do extremely well... until they realized it had actually learned that cancerous samples were more likely to have a measuring stick in the photo.
In theory, ML or simpler statistical methods like ICA should be able to separate those signals from each other (if they are separable), given that the training data contains measurements from all scenarios (moving eyes, muscles, sweating, reacting to different stimuli, etc.).
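A minimal sketch of that separability idea, using scikit-learn's FastICA on synthetic mixed signals. The sources and mixing matrix below are made up for illustration; a real EEG pipeline would use dedicated tooling such as MNE-Python, many more channels, and proper preprocessing:

```python
# Toy sketch: unmixing synthetic "brain", "eye blink", and "muscle" sources
# with FastICA. Illustrative only -- real EEG artifact removal needs proper
# preprocessing, many channels, and domain knowledge.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 2000)

# Three hypothetical sources: a 10 Hz "alpha-like" rhythm, slow blink-like
# spikes, and broadband "muscle" noise.
alpha = np.sin(2 * np.pi * 10 * t)
blinks = (np.sin(2 * np.pi * 0.5 * t) > 0.95).astype(float)
muscle = rng.normal(size=t.size)
sources = np.c_[alpha, blinks, muscle]

# Mix the sources as if recorded by three scalp electrodes.
mixing = rng.uniform(0.5, 1.5, size=(3, 3))
observed = sources @ mixing.T

# Recover statistically independent components from the mixtures.
ica = FastICA(n_components=3, random_state=0)
recovered = ica.fit_transform(observed)

# Each recovered component should correlate strongly with one true source
# (up to sign and ordering), which is what lets analysts drop artifact
# components before feeding data to a downstream model.
for i in range(3):
    corrs = [abs(np.corrcoef(recovered[:, i], sources[:, j])[0, 1]) for j in range(3)]
    print(f"component {i}: best match source {int(np.argmax(corrs))}, |r|={max(corrs):.2f}")
```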
There are multiple techniques to detect and remove surface EEG artifacts, both physiological and non-physiological. My point is that some ML researchers don't even try, or seem to be unaware.
You say 'claim', but then we have clear cases where these techniques work, allowing paralyzed people to control a mouse or play a video game through scalp EEG sensors.
If you quote more than one word, it becomes clear that GP is not disagreeing with you:
> Clueless ML researchers claim to read this and that from brains. [...] ML is great for EEG signal analysis, but you do have to know what you are doing.
1) If you know what you are doing, it's possible to use ML algorithms to control a cursor etc. from EEG signals coming from the brain.
2) It's also possible for ML algorithms to enable a paralyzed person (from the neck down) to control a mouse or play a video game through scalp EEG sensors using eye and muscle artefacts.
Both are good outcomes, but the latter approach is clueless and would work better with better electrode positioning: track eye movement and muscle tension directly. Paralyzed people only care that it works.
Academic machine learning is a paper mill. Sometimes there is a rare success like the transformer model and everyone converges on that.
I think the comments here are arguing past each other. If a machine learning model can correctly classify $attribute from photos in 70% of cases, that is an academic result.
Should it be used for the ever-popular pre-crime detection scams, or for misclassifying feminine-looking men as homosexuals? No, that is nasty and is forbidden in the EU. At least for companies; the security state of course does whatever it wants without any repercussions.
Should it be used in principle? No, classifying humans by a machine algorithm is dehumanizing.
Should research be done at all? If the subjects are volunteers and it is confined to a university, it should be allowed. But researchers should not be surprised if they are confronted with obvious parallels to phrenology. This literally is phrenology.
I just want to push back on the claim that academic machine learning is a (low-quality, no-novelty) paper mill, and on the devaluing of researchers' efforts.
To be clear, ML research has "paper mill" problems, but we should be careful not to imply that there are only "rare successes".
There are many, many amazing results published at ICLR, NeurIPS, and ICML every year that are important developments, not only research successes but also commercial and open-source success stories. For example, LoRA and DPO are two recent incredible developments, and these are not "rare" - this year's ICLR had many promising results that will in turn be built on to produce the next "transformer"-level development. Without this work there are no transformers.
Even transformers themselves were a contribution whose impact only became valuable through the work of many researchers improving the architecture and finding applications (for example, LLMs were not a given use case of transformers until additional researchers put in the work to develop them).
> Should it be used in principle? No, classifying humans by a machine algorithm is dehumanizing.
I see utility in a machine that can help me notice when I'm having a bad day (or a very bad day, i.e. when having a stroke) or ... help me figure out what the fuck is going on with my totally hypothetical mental illness.
I see plenty to fear when other people deploy AI to “determine” and judge my mood and my non-hypothetical mental illnesses without my knowledge or consent.
Not unreasonably so! I also like my privacy (or what's left of it) ... but when (if? no, unfortunately most likely when, and maybe even "soon") it goes, at least I want to get some benefit out of it for myself.
Arguably the current generation of school-age kids (genZ? genA?) has already lost it. Because even if they don't have a phone, others do. (Though hopefully this trend can be reversed. Classic bullying at school is bad enough; with cyberbullying becoming the "norm", things are definitely not looking great.)
> But researchers should not be surprised if they are confronted with obvious parallels to phrenology. This literally is phrenology
Phrenology was classified a pseudoscience not because it could lead to socially bad outcomes but because it didn't work; it had no statistical/empirical grounding. If it turns out deep learning really can reliably predict things about people's personality from their faces, that doesn't make it a pseudoscience.
Agreed, but I think the parallel with phrenology should be understood more along the lines of "providing a justification for unfair decisions".
I remember a case where Amazon used a resume filtering bot that systematically rejected female candidates, because of bias in the training data. So we might go from "you can't be free because your skull is too small" to "you can't get the job because the computer says so".
> Agreed, but I think the parallel with phrenology should be understood more along the lines of "providing a justification for unfair decisions".
I think we cannot discount the degree to which ideology played a part BOTH in the promotion AND rejection of phrenology. And when the racist ideologies eventually became taboo, anti racist ideology won.
If we consider the actual science, it was probably highly tainted by a desire to show that certain human lineages were superior to others.
But there DOES appear to be a correlation between brain volume and IQ. When controlling for "race", this correlation is typically reported at 0.3-0.4, meaning brain volume accounts for 9-16% of the variance.
However, if we reject "race" as a social construct, and include people of all "races" in our analysis, the correlation goes up to about 0.6, or 36% of the variance [1].
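(For the arithmetic behind those percentages: variance explained is simply the square of the correlation coefficient. A quick check:)

```python
# Variance explained is the square of the correlation coefficient (r^2).
for r in (0.3, 0.4, 0.6):
    print(f"r = {r}: ~{r**2:.0%} of variance explained")
```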
> If it turns out deep learning really can reliably predict things about people's personality from their faces, that doesn't make it a pseudoscience.
What makes it pseudoscience is that it's not theory-driven. These are statistical models that recapitulate distributions in their training data. It's Brian-Wansink-style p-hacking [1] at a massive scale.
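A hypothetical simulation of the multiple-comparisons effect behind that kind of p-hacking: test enough unrelated features against random labels and a few will look "significant" purely by chance (made-up data, assuming SciPy is available):

```python
# Hypothetical simulation of the multiple-comparisons problem behind
# "p-hacking": test many unrelated features against random labels and some
# will look significant at p < 0.05 purely by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_samples, n_features = 200, 100

features = rng.normal(size=(n_samples, n_features))  # pure noise
labels = rng.integers(0, 2, size=n_samples)          # random group assignment

p_values = [
    stats.ttest_ind(features[labels == 0, i], features[labels == 1, i]).pvalue
    for i in range(n_features)
]
print(f"'significant' features at p<0.05: {sum(p < 0.05 for p in p_values)} of {n_features}")
# Expect roughly 5 false positives -- a paper reporting only those would look like a finding.
```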
To add to your point - you definitely can make predictions about aspects of someone’s personality by looking at their face with the expectation that you’ll do better than chance.
The paper calls out inferring political leanings as an example of pseudoscience. Give me an American’s age, gender, and race (which I can roughly identify by looking) and I’ll tell you, better than chance, whether they’re a trump supporter.
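A toy sketch of that point: with made-up group-level support rates, a "classifier" that only looks at coarse demographics and never sees a face still beats 50/50 chance. All numbers below are hypothetical:

```python
# Hypothetical illustration: with demographically structured data, a
# predictor that ignores faces entirely can still beat 50/50 chance.
# The group support rates below are invented for the sketch.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Assume three coarse demographic groups with different (made-up) rates of
# supporting some candidate.
group = rng.integers(0, 3, size=n)
support_rate = np.array([0.35, 0.55, 0.70])[group]
label = rng.random(n) < support_rate

# "Classifier" that only looks at the demographic group: predict the
# majority label within each group.
prediction = support_rate > 0.5

accuracy = (prediction == label).mean()
print(f"accuracy from demographics alone: {accuracy:.2f}")  # well above 0.5
```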
> [problematic papers] including Alam et al. [5], Chandraprabha et al. [6], Hashemi and Hall [7], Kabir et al. [8], Kachur et al. [9], Kosinski et al. [10], Mindoro et al. [11], Parde et al. [12]
Man, the authors did not beat around the bush, they're straight up naming names! Props to them.
A bit of an odd paper. I kept expecting it to reveal specific methodology problems, but they hardly do. Instead they accept that the ML papers are done correctly and find real correlations in the data. Their criticisms are things like "correlation is not causation", although the original papers didn't tend to argue that, or "what even is crime anyway?" (post-modernism). The closest to a solid criticism is one where they argue that facial feature differences were actually proxies for head tilt direction, which is fine, but hardly justifies a paper as grandiloquent as this one is.
Methodological issues are usually good issues, because they are easy to spot. Here the issues are more about the data itself. If you use perfect methodology and put garbage in, you still get garbage out.
Politicians often make the mistake of assuming that if they do something in order to get X, the outcome will be X. Researchers who are more familiar with the methodology than the subject matter often make similar mistakes. The data that is supposed to measure X never actually measures X. It measures something related but subtly different. If you want to make conclusions from the data, you need to understand those differences. You need to understand what the data is exactly and the process how it was collected. Including the details you think are irrelevant but aren't. The last part is particularly problematic for people who are not subject matter experts.
"What even is crime anyway" is not just froo-froo postmodernism. It is an extremely critical question when constructing or evaluating these sorts of systems and the absence of at least some discussion of it in papers building these systems is a clear epistemic problem, especially in a world where the authors of these systems are claiming that ML-based approaches achieve a goal of reducing bias.
Crime is defined as a violation of specific laws of a specific country, province or municipality. What is the epistemic problem you think needs to be solved here?
Agreed, the prose was a bit over the top, but I expect that's the norm in the authors' circles. Much was general, but I did appreciate the simple reasoning on labeling inaccuracy from just scraping the web for pictures of individuals with autism spectrum disorders and mostly getting pictures of those who also had chromosomal disorders.
On a tangent, a (maybe) silly question: Are there any examples of _deep_ insights derived via machine learning? (most of what I see is just superficial patterns, correlations, ...)
Well, depending on how deep you think winning at Go is, there are, yes.
But probably we underestimate the value of our "contemporary contextual richness", i.e. noticing relationships/correspondences that are not yet apparent (not yet known) but turn out to be important, valuable, and easy to comprehend. Those insights are mostly only possible because we spend our lives (mostly pretty successfully) in this extremely complex and ever-changing environment.
AI/ML/LLMs would first need to get up to speed, I guess, to be able to have these insights and provide them at the right time. (Otherwise ... it's probably already in the training data. Or not that deep. Or too deep.)
- These days, everything is called machine learning... AlphaGo is a great AI achievement, but I don't really consider it ML. It's classic AI augmented with NNs, iirc. However, I'm willing to concede that it's (a|my) taxonomy issue.
- However, being on the receiving end of Stockfish and friends (chess engines), I see "just moves, no insights". Even, for example, the insight that pushing the h-pawn is often better than previously perceived came from humans reverse-engineering the engines' results.
Hm, okay, what do you consider ML then? (And what AI, and how much is the overlap?) For me AlphaGo is more ML than AI. (Exactly because of the "moves no insight" that you mention.)
>Machine learning (ML) is a field of study in artificial intelligence concerned with the development and study of statistical algorithms that can learn from data and generalize to unseen data and thus perform tasks without explicit instructions.
So it's about learning via statistics & data. If you don't use data or statistics it's not ML.
Chess engines definitely do not fit there. They descend the game tree and evaluate it with an algorithm like minimax. No statistics. No learning. No training (although the static evaluation function might have been trained/tweaked via NNs). I don't know the details of AlphaGo, but I'm guessing it's similar: the concept of the game is hardcoded (game tree, ...) while the evaluation of a position is done via NN. The training can be done via games against itself.
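For what it's worth, the "no statistics, no learning" point is easy to see in code. Here is a bare-bones minimax sketch over an abstract game tree; the legal_moves/apply_move/evaluate callables are placeholders you would supply for a specific game, and real engines add alpha-beta pruning, move ordering, transposition tables, and so on:

```python
# Bare-bones minimax over an abstract game tree: pure search plus a static
# evaluation at the leaves. No data, no training.
def minimax(state, depth, maximizing, legal_moves, apply_move, evaluate):
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state), None  # score the leaf with the static evaluation

    best_move = None
    if maximizing:
        best_score = float("-inf")
        for move in moves:
            score, _ = minimax(apply_move(state, move), depth - 1, False,
                               legal_moves, apply_move, evaluate)
            if score > best_score:
                best_score, best_move = score, move
    else:
        best_score = float("inf")
        for move in moves:
            score, _ = minimax(apply_move(state, move), depth - 1, True,
                               legal_moves, apply_move, evaluate)
            if score < best_score:
                best_score, best_move = score, move
    return best_score, best_move
```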
I suspect a few comments are missing the more subtle nature of the authors' argument, which is buried in the rather complicated jargon of the field.
The authors are NOT arguing that ML models cannot work in science, that the examples they mention for pseudoscience have fundamental methodological problems, nor even that classifying people based on ML is incorrect because correlation is not causation.
Rather, they are warning the community about a problem with the epistemic value of results from applied ML, and argue that the problem is a cultural one. In other words, how do we _know_ that ML results are valid science? People working in ML itself are well aware of the garbage-in garbage-out problem of these models, but when these methods leave the field of machine learning and are applied in other sciences the experts in those fields do not know enough about these issues. So, because of the great success of ML models in the past, results from ML classifiers are taken as objective grounds to support dubious hypotheses.
Part of their argument is that because extrapolation from data is seen as having little or even no bias, and because experts in most fields are not also ML experts, such publications with dubious hypotheses bypass the self-regulating mechanism of science, whereby bullshit research is called out during review.
And this problem, they also mention, is aggravated by the fact that data-driven methods can produce results faster than classical theory-based methods, hooking into the bigger problem in academia that the number of publications is treated as a proxy for success.
So given the above, I think it is easy to see that this could become dangerous, since today science is considered a reliable tool to inform policymaking and political decisions at large. The authors use phrenology as an example to discuss the ethical implications of the issue because it is the most blatant example of socially corrosive pseudoresearch.
There's a tendency in academia (and frankly society, and particularly in this paper) to say that X is bad and wrong, and to justify that it is bad by demonstrating that it is wrong, but this is unsustainable.
What happens if X is bad but right? If you need X to be wrong for the sake of your system of morality, then you twist and frame and p-hack and don't publish when you get contrary evidence, and in the limit you have destroyed the trustworthiness of your own institution. Quite frankly, this is where we sit today.
The only principled and sustainable position is that X is bad regardless of whether or not it's true. Then at least you're never in a position where you must live by lies.
Some X are only bad because they are wrong. Some X would lead to a complete overhaul of morality if they were right (like the one we are discussing). And some X are bad no matter what.
In this specific case, if it happened that some physical feature of people predicted violent behavior or whatever, we would probably have to push "violence treatment" into the domain of medicine. So the fact that it is wrong is extremely important to how we react to it.
I appreciate this very reasoned response and I think we are on the same page. I am saying that I think people are in general quite unable to think or talk clearly about the truthfulness of statements that, if true, would upend their morality.
The solution to this problem is either moral flexibility (which I do not suggest) OR a moral stance which is not dependent on the material facts.
We hold these truths to be self-evident: that all men are created equal.
We have known for centuries that we are all born with propensities towards this and that, but that we are also able to make choices.
The fact that we can now discover the genes behind a particular propensity does not make the actions you take any more or less of a personal responsibility than they were a century ago.
There are good arguments on either side of the debate but discovering the underlying physical mechanism doesn't make any difference.
Even if we discover, say, a violence gene, we can't usefully test for it because it may be present and inactive or there may be other violence genes as yet undiscovered. It doesn't tell you anything at the individual level.
Genetic fatalism is currently fashionable but it is not a fact.
You are just repeating that X is false. Everybody knows it's false, except some racist people irrationally trying to justify their biases.
That doesn't change the fact that if it were true, the morality around it would have completely different dimensions, and none of what is currently said about it would make any sense.
I’m assuming by “wrong” you mean “false” and not “morally objectionable” therefore I must disagree.
It is an *unsustainable* position to hold that something is bad regardless of whether or not it is true. Basing your morality on potentially false information is a guarantee for conflict.
Likewise, if your morality never changes, especially when you learn and grow from experience, you’re stunting your growth as an individual and causing problems for society as a whole. Imagine a child refusing to update their understanding of morality as they age.
This is quite a long paper, and it's difficult, without spending considerable time on it, to work out what the core argument is beyond the clearly obvious:
"We don't like research that claims ML seems able to classify people according to identity groups from photos".
Now, that is obviously a controversial area with huge implications ethically and politically and the clear potential for abuse of such tools. A well-argued check on the strength of such claims would be welcome.
But throwing a load of terms more contemporarily associated with culture-war diatribes like "physiognomy", "phrenology" and "nazi" etc. into what should presumably be a calm scientific analysis is not a persuasive call to invest the time in working out what the actual argument is within the paper.
I think the paper would be more effective if considerably less verbose, with a more balanced tone and a clearer exposition of its core argument.
> throwing a load of terms more contemporarily associated with culture-war diatribes like "physiognomy", "phrenology" and "nazi"
These are pretty regular terms describing well-known and discredited fields and ideologies and I don't see them being "thrown around" in the paper. Maybe the paper is bad or unclear in other ways but this is a strange demand for "balance". It is not culture war to say 'phrenologist' or 'physiognomy' when talking about these actual things.
Strongly agree. One of the foundational claims, "No inference is theory neutral", is taken as fact based on a single citation from a paper by a philosophy grad student - that doesn't make it wrong necessarily, but it's interesting that none of that author's many CMU computer science colleagues would co-author the paper.
It is very common to have co-authors solely at other institutions. It does not imply that the first author asked for (or even wanted) co-authors at their own university, much less that others rejected the opportunity.
> terms more contemporarily associated with culture-war diatribes like "physiognomy", "phrenology" and "nazi"
I challenge that the first two terms are characteristically associated with culture-war diatribes — they are associated with the claimed ability to determine personal characteristics from the shape of one's head or face, which is what this paper is about. As to the third one, it is used only three times in its literal historical meaning to cite the Nazi regime as a well known example of misuse of pseudoscience by the state.
Scientific writing is bad so much of the time. The bio sciences are particular offenders in my experience.
Good writing should communicate a concept clearly. For some reason, science writers feel the need to swallow and partially regurgitate the dictionary. “Look how clever I am: therefore my conclusions are valid”.
I’ve spent ten minutes reading, and I still can’t figure out whether they’re saying that physiognomy should not be explored or does not work. They might have a good point but by christ are they asking the reader to work to dig it out.
I strongly disagree with your statement. In fact, the two sentences you quoted are quite clear. They aren't written for a lay audience, true, but you don't expect such in Cell. They don't use any jargon at all.
Your own comment is also clear, but you wear your allegiance on your sleeve. The two sentences you quote make it clear that the paper is about people, not technology. I wouldn't even call the paper itself "AI skeptical".
You may well be right that “AI skeptics will wildly cite this paper on social media”. But that is orthogonal to the paper itself.
It is a fairly well written paper. Every field has its jargon, which to anyone who is unfamiliar with it may sound like word salad.
You could say the same thing about most ML papers, for example.
There is a big difference between jargon and presenting things in unnecessarily complicated language in general. Give this abstract to any editor and, I assure you, they will rewrite it.
Even a simple GPT-4o prompt ("Rewrite the following abstract for grammar, clarity, and readability.") gives
> This perspective highlights how epistemically unfounded and ethically harmful paradigms are reintroduced into scientific literature through machine learning (ML) and examines the connections between these two areas of failure. We use the resurgence of physiognomic methods, enabled by ML, as a case study to illustrate the damaging effects of ML-promoted junk science. We summarize and analyze several studies, focusing on how flawed research can lead to social harm. Additionally, we investigate various factors contributing to poor practices in applied ML. Finally, we provide resources on best research practices for developers and practitioners.
and for Claude 3.5 Sonnet:
> This perspective examines how scientifically unfounded and ethically problematic paradigms are reintroduced into scientific literature through machine learning (ML) techniques. We explore the connections between these two aspects of failure. Using the resurgence of physiognomic methods facilitated by ML as a case study, we demonstrate the harmful consequences of ML-legitimized pseudoscience. We provide a summary and analysis of several such studies, focusing on how unsound research can lead to social harm. Additionally, we investigate various factors contributing to poor practices in applied ML. Finally, we offer resources and recommendations for best research practices to guide developers and practitioners in the field.
Both versions are just an iteration; a human editor could do it better.
For context, I authored quite a few papers in quantum information theory. It takes work to present work in the simplest form possible.
For context, you have a consulting business, and spoke at a conference about AI in administering justice, organized under the auspices of the recently ousted illiberal government of Poland.
Here's an abstract to one of your "papers": "The paper presents selected tools, as described by their developers. The list includes Hello Quantum, Hello Qiskit, Particle in a Box, Psi and Delta, QPlayLearn, Virtual Lab by Quantum Flytrap, Quantum Odyssey..."
And so on and so forth - no way to tell what your review is about other than enumeration and platitudes. Yet you criticize the authors of a review paper (a detail revealed by the immediately following sentence, which you curiously decided to omit from your quote) who clearly state their angle up front and at least have something to say. It's just not aligned with your opinions; nothing to do with writing.
You, by the way, demand "results" of a review paper. What were the research results of you playing quantum web games?
I sense personal animosity and almost feel honored that it is behind a throwaway account.
The talk is publicly available (in Polish) at https://www.youtube.com/watch?v=ChEsmwe7YN0. Which part of it do you find objectionable? In particular, a large portion was directly related to biases, limitations, dangers, and general misconceptions. I'm not sure how this counts as lobbying rather than educating. And as anyone who has seen my social media knows, I am open about my liberal views.
As for the paper - it is an interesting pick. First, what is particularly unclear or confusing in the abstract? Second, right now it is one of the most cited papers in quantum education. You are invited to look at my research publications on quantum information rather than, as in this case, a paper on education & software.
To use the authors' own framework, the epistemic failure of this paper is the mistake of identifying the failures of some individuals with a failure of an institution or a technology. The fact that harm can be created by some individuals means nothing more than that. There are pseudoscientists in ML, but to generalise that fact into an argument that there is something rotten at the heart of ML is just foolish. The authors also fail to understand that the idea of a causal theory as fundamental to science is extremely tenuous and has really only applied to physics for most of the history of science. The workings of plants (for example) were understood with an almost complete ignorance of the mechanisms that caused them to behave in particular ways until relatively recently. This didn't impair the value of this observational and contingent knowledge. The comprehension of gravity as a field determining the inverse square law was so brilliant and important that it's blinded us to the reality of so many other fields of knowledge - and their legitimacy.
But the authors do point out intrinsic biases and failures of experimental design in most of the examples they mention:
* Inferring sexual orientation: Linking «self-reported sexual orientation labels» with «[...]scraped their data from social media profiles, claiming that training their classifiers on “self-taken, easily accessible digital facial images increases the ecological validity of our results.”[...]». Social media profile photos are by their very nature socially influenced, with open sexual orientation being an important cue to display.
* Personality psychology: Training and test datasets came from the same pool of «participants [who] self-reported personality characteristics by completing an online questionnaire and then uploaded several photographs». This heavily suggests that the participants were aware when choosing the photos that this was a "personality type" experiment, and may even have made their own awareness of their personality more salient by doing the test first and then uploading the photographs.
* “Abnormality” classification: General critique of lack of transparency as to how the true labels were determined.
* Lie detection: The ability to detect the facial differences between people following two different experimental instructions does not equate to lie detection.
* Criminality detection: At least they used official ID photographs instead of self-selection-biased photos like the first example... but consider this: what conclusions would their same model reach if it used official ID photos of US populations? The confounding factors of class and ethnicity are obvious.
These are examples and the experimental designs were a particular choice by the authors of those examples - they aren't intrinsic to ML or the ML community.
Which is why the authors never claim to be talking about ML or the ML community. They are talking about "the harmful repercussions of ML-laundered junk science" and, in the section that I quoted in my comment above, they "review the details of several representative examples of physiognomic ML".
I didn't read the paper particularly thoroughly, but there is a real threat here. Governments love to "follow the science" when implementing authoritarian policies, and if there is a body of pseudoscientific literature, law enforcement will with high likelihood use it as cover for the time-honoured traditions of eyeballing people and judging them based on how they look.
It is important to resist that dynamic at every level, so it is probably worth supporting the paper's authors in pointing it out. The risk of pseudoscience taking on a racial tinge and leaking out into the real world is always present.