Thoughts On Machine Learning Accuracy (amazon.com)
168 points by deegles on July 27, 2018 | 93 comments


I'm disappointed in the ACLU not only for misinterpreting the results, but also because I usually have a lot of faith in their competency, and they absolutely should have seen this response coming from Amazon.

What'd they think Amazon was going to do, roll over and be like "turns out our facial recognition software is racist, woops!"? Now instead of a meaningful dialog, we've got a line in the sand where on one side, the ACLU, champion of the people's rights, doesn't understand technology, and on the other side, facial recognition software is Bad and Evil. Shit's so polarizing these days it seems there's no room for negotiation, as much as I'd like there to be.


> I'm not only disappointed in the ACLU for misinterpreting the results...

Did they? This blog post describes some implementation choices which could make their false positive rate lower.

It reveals no information about how Rekognition is being deployed by LEAs, and there’s no meaningful regulation or oversight about how that happens.

We can’t tell what the assumptions and implementation choices of local police departments are, because that’s top secret information. Most people acting on the predictions have no idea what they are, let alone the implications.

It’s fine to ding the ACLU study on methods; you can, because they published some.

No-one would argue that a toy model based on members of Congress is representative of the public at large. But it’s more representative than a press release from a company that is trying to normalize and monetize mass surveillance via facial recognition.


The original ACLU title is: Amazon’s Face Recognition Falsely Matched 28 Members of Congress With Mugshots. It was spread on the news with the same conclusion.

It puts the blame directly on Amazon and its technology and doesn't mention any configuration they used.


Why would a title mention API/service configuration? This is not common practice, even in academic applied ML papers.

They used the default settings, and a training set which seems extremely likely to be used in ‘LEA-production’.

Any user, including LEAs, is free to use whichever configuration options they like.

The criticism with the most statistical implications is the confidence level used. The ACLU were clear that they used the defaults. I can take at face value that moving from 80% to 99% confidence on a sample of 500 faces could produce 0 false positives. However, on the faces of the tens/hundreds of thousands of people that might move through the center of town on any given day, the implications for causing ordinary people serious harm are large.

I am personally shocked that Amazon are willing to “recommend” law enforcement actions at any level of confidence, from an unintrospectable machine learning system.


I personally believe that facial recognition software will eventually be employed by law enforcement no matter what we do; I also believe that this is a bad thing and lies on the path to a controlling and pervasive police state.

However, this article states clearly that there are recommendations to LE on how to use Rekognition in a meaningful and correct manner for LE's use case. I am highly confident that Amazon will provide technical documentation and manuals for a contract of that size; of course Amazon will not be able to conduct oversight of LE operations. LE may misuse it, the way they've misused Stingrays, stoplight cameras, clipper chips, and other devices.

You state that having 99% accuracy for identifying an individual will do large-scale harm, and I agree; what about the 1 person out of every 100 that is falsely identified? The other side to this coin is the number of people law enforcement would miss recognizing without this software... This argument comes down to human ability vs. machine learning ability, and human discretion in interpreting results. I think with proper use it'll be a benefit for identifying culprits quickly.

Regardless, it is still pretty evil giving LE any tool that can autonomously monitor people who have not committed any crimes. But who's going to stop them?


The title is akin to saying "We bought <gun maker>'s hunting rifle, pointed it at Congress, pulled the trigger, and it shot 22 members of Congress". It's a tool, you're responsible for how it is used, not the manufacturer.

I would imagine the confidence level would be tweaked by Law Enforcement or anyone else based on the results. If you're monitoring cameras for missing persons or shooting suspects and getting 0 hits, you might lower it because it's better to dismiss false positives than the alternative. Conversely you might increase it if you're overwhelmed by the number of false positives.


Since you're comparing facial recognition to guns and guns are regulated by law and have strict rules for use, perhaps facial recognition should as well?

I think there should be a discussion about laws/regulations about facial recognition use. Even if Amazon removes their service there will always be companies that don't care about the ethics as much. Strict rules instead of public shaming seem like the better approach.


If you screw up with a gun someone dies.

If you screw up with facial recognition a human double-checks and nothing happens.

Scissors are dangerous but we don't regulate those either.


ummm no...

First of all, it is not a given that if you "screw up with a gun someone dies"; that is not the only possible result.

Further, it is laughable to believe that with facial recognition a "human double-checks". We have seen time and time again that new technology entering the realm of criminal justice can be used to wrongfully convict people at an alarming rate. See the hair and bite-mark analysis controversies, or the number of crime labs that have been compromised in recent years.

The fact is that the "humans" who are supposed to be "double-checking" do not in reality. If the computer spits out a match, then "the computer says it is person X" is what the jury will hear; the jury will not question the technology, and the person will end up in prison until someone like the Innocence Project comes along to invalidate either the technology as a whole or the application of the technology.


> If you screw up with facial recognition a human double-checks and nothing happens.

You have too much faith in humanity: a lot can be, and will be, done "because the system says so". Never mind integrators who will create processes that assume recognition is 100% correct: "Yes, I know you're not the wanted criminal, but now that I have tagged you, I have to take you to the station and process you out... Monday morning, because that's when Jim is in."


> The title is akin to saying "We bought <gun maker>'s hunting rifle, pointed it at Congress, pulled the trigger, and it shot 22 members of Congress".

Only if <gun maker> was specifically marketing their hunting rifle to be aimed at members of the public, and for the trigger to be frequently pulled. Happily, gun-makers are far more responsible than that.

> It's a tool, you're responsible for how it is used, not the manufacturer

Not so. Private businesses can refuse to offer services largely as they wish, and AWS frequently do, based on their terms of service.

For example, you are not permitted to run a web-scraper or open mail relay on EC2. You are not permitted to host bestiality on S3. None of these things are illegal. But Amazon have taken the stance that they just don't want them to happen on AWS.

I am of the view that mass facial recognition poses more problems for society than running web-scrapers.

Many civil liberties organizations have serious concerns about the use of facial recognition for law enforcement, including pro-industry organizations such as the EFF[0]. This is not some new reaction; they have had Know Your Customer guidelines about terms for ML for states since 2011[1].

Even competitors who are incentivized to sell facial recognition tech to LEAs are warning that it's a bad idea for society, with serious and likely lethal implications. Here's the CEO of facial recognition startup Kairos[2].

> ...the confidence level would be tweaked by Law Enforcement

This is a large part of the problem. If you don't think so, then we simply have very different visions of the future that we want to live in, and must agree to disagree.

[0] https://www.eff.org/wp/law-enforcement-use-face-recognition

[1] https://www.eff.org/deeplinks/2011/10/it%E2%80%99s-time-know...

[2] https://techcrunch.com/2018/06/25/facial-recognition-softwar...


> Happily, gun-makers are far more responsible than that.

I mean, accurate, but then again they are making guns, so... low bar, I guess.


And when this tool is pointed at a poor black neighborhood, do you think people will question the results just like they did the members of Congress?

To me, this is the issue. We make all sorts of statements when it’s congress, but when it’s real life it’s not enough to make the news.

After all, how many members of Congress have been shot because a police officer thought their cell phone was a gun?

THAT is what the ACLU is trying to point out here.


Taking results of 5% false positives coming from a tool with an 80% confidence level as anything other than "it worked better than expected" is misinterpreting the results.


How many police forces would (without this debate) have ever done more than use the default settings?

BTW, why is the default 80%? Does it deliver 4/5 true positives in the wild? Has anyone done an evaluation? What about the kinds of corruption that are common in the wild - how does it handle odd light, reflections, shadows?


Yes, they did. It's great that they explained their config choices, so that folks with domain knowledge can evaluate reliability of their test, and report back.

For those of us without domain knowledge, however, what the ACLU published is wildly misleading, to the point of being actively disingenuous. What are folks without baseline STEM education, much less AI education, supposed to take from the ACLU article other than "Amazon AI is racist and dangerously ineffective"? Do you think they are opening up the Rekognition docs to evaluate if the default config is appropriate for this test?

(By the way, I wouldn't be surprised if Rekognition is actually racially biased. I also feel a bit icky criticizing the ACLU here — they do great work, and I encourage everyone to donate.)


And yet, if the ACLU never ran their article, we wouldn't be having this conversation right now. I say it was a job well done.


The only important part of this conversation is the part that offers a suggestion for the level of confidence you should use when highly accurate matches are important. But we never needed a conversation for that, it's in the documentation (and in the mind of anyone with basic knowledge of statistics, which should include anyone using this statistical tool for anything important).


But a huge part of the problem is that many users of facial recognition software will not be experts on how to properly use the software. If the ACLU improperly used it, do you really trust law enforcement agencies to do a better job?


The local cop will not just use the AWS APIs by himself, connecting them to the surveillance cameras and mugshot databases.

Hopefully, and I think this scandal will help, the engineering departments/contractors for the government will develop software that is appropriately configured for each case.


I don’t think a cop or juror is going to look at the confidence score and accurately understand the reasonable doubt that a classification is incorrect.


The cops won't just consume the API. Software developed for the specific task will.

What I imagine is this:
1. The cop uploads a picture of the suspect taken from camera surveillance at the crime scene.
2. The software displays the mugshots (and names) of PROBABLE matches from the database, with confidence %, color coding according to confidence level, and a notice of proper use.
3. The cop compares the man's/woman's picture with what's displayed.
4. The investigation continues if necessary.
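
To make the threshold knob concrete, here is a minimal sketch of step 2 against the Rekognition API via boto3. The collection name "mugshots", the idea that faces were indexed with ExternalImageId set to a record identifier, and the file names are all assumptions for illustration, not anything Amazon or this comment specifies:

  import boto3

  rekognition = boto3.client("rekognition")

  def probable_matches(probe_image_path, threshold=99.0, max_faces=5):
      # Return candidate matches above `threshold` for a human to review.
      with open(probe_image_path, "rb") as f:
          response = rekognition.search_faces_by_image(
              CollectionId="mugshots",           # assumed pre-indexed collection
              Image={"Bytes": f.read()},
              FaceMatchThreshold=threshold,      # the knob this whole thread is about
              MaxFaces=max_faces,
          )
      # Surface the similarity score so the reviewer can see how strong the suggestion is.
      return [(m["Face"].get("ExternalImageId", m["Face"]["FaceId"]), m["Similarity"])
              for m in response["FaceMatches"]]

  for record_id, similarity in probable_matches("suspect_frame.jpg"):
      print(f"PROBABLE match {record_id}: similarity {similarity:.1f}% - human review required")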


Face recognition is not being touted as being as accurate as DNA testing or fingerprints, so why pretend it is anything but a fuzzy ranking by similarity? Many forensic techniques generate probabilistic results.


Hah. You clearly haven't worked with any government agencies or their contractors. There will be mistakes, and it won't be just a few. The public will have very little ability to even validate whether these systems are configured properly. It's gonna be a shit show.


What’s an appropriate configuration for your neighborhood?


I would be on Amazon's side if their terms of service clearly stated "do not use this tool for law enforcement". Do the terms of service state that?

If this is used to misidentify someone and then that someone gets shot how will the folks at Amazon feel about that? To me failing to consider that and failing to act appropriately is reckless. It's as reckless as Uber turning off the Lidar and as reckless as Tesla using the road markings as a dominant steering method.

When you have to ask "how many innocent dead will it take?", do we not all think that something might just be going wrong?


> I would be on Amazon's side if their terms of service clearly stated "do not use this tool for law enforcement". Do the terms of service state that?

Why on Earth would they put something like that in their ToS? If anything, the place for that is in documentation ("this product may not be suitable for use in law enforcement"), but then again, since when does Amazon have a clue about the specific requirements of government services, such that it could issue blanket statements like that?

> It's as reckless as Uber turning off the Lidar and as reckless as Tesla using the road markings as a dominant steering method.

Yes, you don't blame the Lidar company for equipping their products with an off switch, nor do you blame whoever designed the road marking detection for Tesla's decision to base their self-driving tech on it.


I think that this illustrates the challenge of this debate. I believe that your perspective is that Amazon are there to make money and provide a component, and then the person using it is responsible for the outcomes. However, I think that the person providing the service is responsible for who they sell it to and what for. The analogy that I would draw here is with guns: if you sell a gun to a child, you should be accountable for that as a misdeed. Note: that's an analogy to try to convey the point I am making; I'm not comparing the misdeeds.


In your example though, is Smith and Wesson responsible for the store selling the gun to a child? (In this specific case yes, because they lobby against common sense gun control)

For law enforcement, I assume they wouldn't be hitting the Amazon API directly, but rather using software someone sold them which as part of its service uses Amazon's API.


> Shits so polarizing these days it seems there's no room for negotiation, as much as I'd like there to be.

Far worse if you're being polarized. Also, the ACLU hasn't released their methodology; are you jumping to conclusions that the ACLU "doesn't understand technology"?


> champion of the people's rights,

The former ACLU, yes, but not the current ACLU: they have repeatedly said they don't want to protect Free Speech anymore.

https://reason.com/archives/2018/07/26/liberty-makes-us-unfr...


For those who are emotionally invested in defending Amazon on this, could you speak a little bit to what your horse is in this race?

I have stock in Amazon, but also caution about law enforcement (and a desire for technology to increase accountability of government to citizens rather than vice-versa). For me, the fear of going even one tiny step closer to China outweighs everything else.

For those who are arguing on behalf of Amazon, what's your emotional calculus here?


Not defending anybody, but I think the point in the article is sound. If my child were missing, I would want every technical resource possible to be available to find her. Having "big brother" be able to identify me quickly in a crowd doesn't seem that bad on balance ... if I'm in public, I expect to be viewed.


I think maybe you've never been pulled over by an officer because you 'fit the description of a suspect'

You, as a bystander, can be caused great inconvenience (where the line of injustice gets crossed is up for debate) through someone else's expectation of safety.


You're right, I haven't (I wonder what percent of the population has?). I find it hard to believe that law enforcement using facial recognition would really make that problem much worse than it already is though.


China is already experimenting with pre-empting crime by attempting to monitor their vast network of cameras using ML systems.


Hmm, I don't really follow that calculus.

I guess the way I look at it is that the odds of my child being abducted are one in 300,000. On the other hand, a significant portion of the world (30%?) lives under oppressive regimes. So there's that.

That's such a huge margin that I have trouble understanding any other way of looking at it.


So do you believe that restricting the use of this tech in the US would prevent oppressive regimes from using it? Or even slow them down?


I certainly believe that [it would slow them down]. Firstly, I think the odds of America becoming a repressive regime, though low, are much higher than the 1/300,000 chance of a particular child being abducted... Remember, the US had a criminal president in the 70s.

Secondly, I think allowing this technology in America makes it more tolerated in other countries. Perhaps facial recognition in public should be banned by international treaty.


Why would that be true when the US has actively propped up bad regimes again and again for realpolitik reasons?


You stated the classic and clichéd argument of safety over privacy.


I meant to distinguish "when I'm in public" explicitly; in cases where I have a reasonable expectation of privacy I would not want to give that up.


A single accuracy rate is not that useful when looking at ML problems. There is always a tradeoff between false positives and false negatives depending on the threshold selected. Amazon is guilty of this in this article, citing a 0% false positive rate when you set the threshold to 99%. However, it doesn't say how many faces it successfully matched; it's easy to have a 0% false positive rate if you just say "no match" for every photo.

The best practice is to show a ROC curve, which shows the tradeoffs of selecting a given threshold: https://en.wikipedia.org/wiki/Receiver_operating_characteris...
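
As a purely synthetic illustration (made-up similarity scores, not Rekognition output), scikit-learn's roc_curve makes the threshold/false-positive tradeoff visible directly:

  import numpy as np
  from sklearn.metrics import roc_curve

  rng = np.random.default_rng(0)
  # 1 = genuine same-person pair, 0 = impostor pair, with made-up similarity scores.
  labels = np.concatenate([np.ones(500), np.zeros(500)])
  scores = np.concatenate([rng.normal(0.92, 0.04, 500),    # genuine pairs score high
                           rng.normal(0.75, 0.08, 500)])   # impostors overlap the low end

  fpr, tpr, thresholds = roc_curve(labels, scores)
  for t in (0.80, 0.99):
      i = np.argmin(np.abs(thresholds - t))
      print(f"threshold ~{t}: false positive rate {fpr[i]:.3f}, true positive rate {tpr[i]:.3f}")
  # Raising the threshold drives false positives toward zero but also drops true
  # matches - which is exactly what a single accuracy number hides.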


After Pearl Harbor, some wanted to get rid of early-warning radar systems. The military created the ROC to screen for operators who might misidentify enemy planes. The radar did fine, the interpretation was wrong.


Interesting, do you have a source? I'm interested in misjudgment and cognitive errors, so I'd like to read about this.



Assuming that the data set that test and training are drawn from is representative of the data that you see in production.

Which it never is.


Amazon, let’s be honest - nobody reads documentation about how to set a knob to get a good result: not police, not the ACLU, nor most developers working for them. They’ll use 80% confidence happily and “sort out things later”.

So cut it out with your justifications about ovens and other nonsense and acknowledge that you are just a corporation releasing a product that tries to make a buck and/or not get left behind some other competitor of yours.


The idea that the best solution to the future challenges of ML is to shame companies into not selling ML products is so inane. Pandora's box was opened the moment the computer was built. And really, the first time a human uses a club to bludgeon another human.


Well, it depends what you are selling - if aircraft manufacturers had stuck to selling hydrogen-filled and aluminium-painted airships, I think that air travel would not be the thing it is today. Amazon's technology is not fit for this purpose; an ethical and responsible stance would be not to sell it for this purpose.


I love this quote at the end of the Google-cached result: "we should not throw away the oven because the temperature could be set wrong and burn the pizza."


When the oven will burn you if used incorrectly, and Amazon gives police departments (who don't know how ovens work) the controls over the oven regardless of their provable technical ability and with no oversight, we should definitely throw away the oven.


Maybe yes, maybe no. But realistically, with the hype out there about AI solving all problems, do we think it's not going to get used?

Seems like a more productive conversation could be had over how to use it, rather than blanket dismissals. The proverbial 400-lb man in his basement can code this up relatively easily with existing tools, to near state-of-the-art results. So the guy that works for the FBI will too. Let's come up with appropriate, targeted safety mechanisms so the oven doesn't light the whole building on fire.
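
For a sense of how low that bar is, here is a hedged sketch of the basement version, assuming the open-source face_recognition package (dlib under the hood) is installed; the image file names are placeholders:

  import face_recognition

  known = face_recognition.load_image_file("mugshot.jpg")
  frame = face_recognition.load_image_file("street_camera_frame.jpg")

  known_encoding = face_recognition.face_encodings(known)[0]
  frame_encodings = face_recognition.face_encodings(frame)   # every face found in the frame

  for encoding in frame_encodings:
      distance = face_recognition.face_distance([known_encoding], encoding)[0]
      # Lower distance = more similar; the library's customary cutoff is around 0.6.
      print(f"distance {distance:.3f} -> {'match' if distance < 0.6 else 'no match'}")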


Not only can you not uninvent a technology that runs on cheap hardware with open-source software and can be reimplemented from scratch in a week from arXiv papers, but this is also a clear case of improper use, where the solution is to set better norms.


Or more precisely, take the flaky oven away from people who misuse it in high stakes situations.


I don’t remember us being asked if we wanted to live in the oven, but here we are anyway.


Good luck getting law enforcement to properly interpret machine learning results. Look at how they still use disproven methods like bite marks to unjustly imprison innocent people.


The product in question is Rekognition, which matches photos. It would be more verifiable by the naked human eye than bite marks?

As long as the results are not taken as judgements, but only as an auxiliary tool, it would not be a concern.

The article indeed argues that any positive match should be vetted by a human.


> The article indeed argues that any positive match should be vetted by a human.

I have said before and I will say again: after the disaster that was the Amazon Fire Phone, I have zero trust in any Amazon.com employee to pick up the phone at any time to sound the alarm, or if they do, I don't expect Amazon to "do the right thing". The reason I mention this is that I don't think this PhD fellow speaks for the sales "engineers" at Amazon.com. The sales people will say what they have to say to make a sale.

Let's face it: we programmers just work at a company; the business actually runs the show. I have zero confidence that the customer will understand that "any positive match should be vetted by a human". What does vetting mean anyway?

No police has ever stopped me for matching the profile of a suspect, ever. However, I know people who say it happens (or at least used to happen) to them at least twice a year.

Is a blanket ban on law enforcement from using facial recognition possible? Or at least restrict it to federal agencies (a ban on all law enforcement that is not a federal agency)?


That's actually a very good point; sales people will try to exaggerate the capabilities of the technology to dangerous levels... Well said.


How about "we should not throw away the medical scanner because it can be set wrong and it burned this child"?

Or, don't use the oven on setting 3 because if you do your house will burn down?


> The ACLU has not published its data set, methodology, or results in detail

This is my biggest gripe with how the ACLU has conducted this. I find it hard to distinguish their "test" from clickbait.


Ironically, neither did Dr Matt Wood, the author of this AWS post.

He attempts to refute one unverifiable Rekognition configuration with another.


The way it spread with such virality in the news (NYT, etc...) makes me think it really was engineered clickbait


Can someone explain this line to me?

>In addition to setting the confidence threshold far too low, the Rekognition results can be significantly skewed by using a facial database that is not appropriately representative and therefore is itself skewed. In this case, ACLU used a facial database of mugshots that may have had a material impact on the accuracy of Rekognition findings.


One of the complaints by the ACLU was that 39% of the false matches of congressmen to mugshots were people of color, even though only 20% of congressmen are PoCs. This is a subtle way of saying that the mugshot dataset is probably not representative of the US population as a whole because it overrepresents PoCs.

Although, it's important to note that machine learning algorithms generally have a more difficult time with darker skin because of lower contrast (see scandals on Google identifying black people as gorillas and cameras not auto-focusing on black people). It is a legitimate concern that black people may be misidentified more often with any machine learning algorithm.


What I find problematic is no one can say that people with darker skin (or from any ethnicity in particular) are harder to identify precisely without offending people and being accused of racism/bias.

But we have no objective proof that everyone is as recognizable as another.


I know a number of researchers in ML who have stated this and been fine.


OK, maybe ACLU stumbled a bit by using a low (?) 80% confidence threshold. But I wonder how much over-fitting occurs at the AMZN-recommended 99%? Is a 99% fit scalable? What's the false-negative rate? I'm not satisfied that either side has made a convincing argument.


Page is 404 right now. Google cache:

https://webcache.googleusercontent.com/search?q=cache:wzZHRC...

This blog shares some brief thoughts on machine learning accuracy and bias.

Let’s start with some comments about a recent ACLU blog in which they run a facial recognition trial. Using Rekognition the ACLU built a face database using 25,000 publicly available arrest photos, and then performed facial similarity searches of that database using public photos of all current members of Congress. They found 28 incorrect matches out of 535, using an 80% confidence level; this is a 5% misidentification (sometimes called ‘false positive’) rate, and a 95% accuracy rate. The ACLU has not published its data set, methodology, or results in detail, so we can only go on what they’ve publicly said. But, here are some thoughts on their claims:

1. The default confidence threshold for Rekognition is 80%, which is good for a broad set of general use cases (such as identifying objects, or celebrities on social media), but it’s not the right one for public safety use cases. The 80% confidence threshold used by the ACLU is far too low to ensure the accurate identification of individuals; we would expect to see false positives at this level of confidence. We recommend 99% for use cases where highly accurate face similarity matches are important (as indicated in our public documentation).

To illustrate the impact of confidence threshold on false positives, we ran a test where we created a face collection using a dataset commonly used in academia, of over 850,000 faces. We then used public photos from US Congress (the Senate and House) to search against this collection in a similar way to the ACLU blog.

When we set the confidence threshold at 99% (as we recommend in our documentation), our misidentification rate dropped to 0% despite the fact that we are comparing against a larger corpus of faces (30x larger than ACLU’s tests). This illustrates our point that developers should pick the appropriate confidence threshold best suited for their application and their tolerance for false positives.

2. In real world public safety and law enforcement scenarios, Amazon Rekognition is almost exclusively used to help narrow the field and allow humans to expeditiously review and consider options using their judgement (and not to make fully autonomous decisions), where it can help find lost children, fight against human trafficking, or prevent crimes. Rekognition is generally only the first step in identifying an individual. In other use cases (such as social media), there isn’t the same need to double check, so confidence thresholds can be lower.

3. In addition to setting the confidence threshold far too low, the Rekognition results can be significantly skewed by using a facial database that is not appropriately representative and therefore is itself skewed. In this case, ACLU used a facial database of mugshots that may have had a material impact on the accuracy of Rekognition findings.

4. The beauty of a cloud-based machine learning application like Rekognition is that it is constantly improving as we continue to improve the algorithm with more data. Our customers immediately get the benefit of those improvements. We continue to focus on our mission of making Rekognition the most accurate and powerful tool for identifying people, objects, and scenes – and that certainly includes ensuring that the results are free of any bias that impacts accuracy. We’ve been able to add a lot of value for customers and the world at large already with Rekognition in the fight against human trafficking, reuniting lost children with their families, reducing fraud for mobile payments, and improving security, and we’re excited about continuing to help our customers and society at large with Rekognition in the future.

5. There is a general misconception that people can match faces to photos better than machines. In fact, the National Institute of Standards and Technology (“NIST”) recently shared a study of facial recognition technologies that are at least two years behind the state of the art used in Rekognition and concluded that even those older technologies can outperform human facial recognition abilities.

A final word about the misinterpreted ACLU results. When there are new technological advances, we all have to be careful to be calm, thoughtful, and reasoned about what’s real and what’s not. There’s a difference between using machine learning to identify a food object and whether a face match should warrant considering any law enforcement action. The latter is serious business and requires much higher confidence levels. We continue to recommend that customers not use less than 99% confidence levels for law enforcement matches, and then to only use the matches as one input across others that make sense for each agency. But, machine learning is a very valuable tool to help law enforcement agencies, and while being concerned it’s applied correctly, we should not throw away the oven because the temperature could be set wrong and burn the pizza.


If you set the confidence threshold to 80% in the test you perform in (1), how many misidentifications do you see, if any?


It would probably be less than 5%, since they used a better dataset. The ACLU got 5% misidentification using the biased mugshot database at 80%.


I'd appreciate someone to correct my data if it's wrong. I too would like to know.


404? Was it deleted?


I am guessing legal and/or PR had a fit and removed it, as any publicity around this sort of thing is very sensitive these days, even though they were pointing out legitimate errors in the ACLU's work.


Someone posted the original contents in the comments below


TLDR:

1) When the ACLU tested mugshot photos vs. Congress member photos, the ACLU set it to use an 80% confidence level threshold, which should result in a 5% false positive rate. There are 535 members of Congress, which from this setting should have resulted in 26.75 misidentifications. The ACLU got 28 misidentifications, which is pretty darn close (a quick check of this arithmetic is sketched below).

2) The ACLU used a comparatively small dataset of photos. Using a different, 30x bigger dataset and a 99% confidence level resulted in no misidentifications.

3) The police are only supposed to be using this system to narrow down choices, and then have a human sorting out possible matches, which this should do very well.

---

My thoughts: The system appears to be functioning exactly according to specifications. The eternal problem of systems is that the specifications do not match what users are actually expecting/needing the system to do. If the ACLU is confused about how it works, the police probably will be as well.
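
A quick check of the arithmetic in point 1 (nothing beyond the numbers already quoted above):

  members = 535
  false_positives = 28
  print(f"observed misidentification rate: {false_positives / members:.1%}")   # ~5.2%
  print(f"misidentifications a flat 5% rate would predict: {members * 0.05}")  # 26.75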


Ah, good thing we’re employing statisticians as beat officers these days! /s

The point is that all of these decisions are unregulated, and complicated systems are being put into the hands of people who have very different professional backgrounds (and incentives) to Applied Machine Learning Engineers.

- Just because it’s possible to use a better or more representative data set, it doesn’t mean LE agencies can or will

- Just because a particular confidence interval is more suitable doesn’t mean it will be used. What if there’s pressure to hit targets? What if a senior cop sits on the sys admin’s desk and tells them that “We’ve really gotta nail this guy. He’s armed, dangerous, and on the run”.

- What if cops are radioed and told that a man who HQ are 95% sure is the armed killer is in the car/room/office ahead? How will the LEO react and behave when approaching the suspect, based on that information? What if 10k faces from CCTV were put through the system to get that match? What if there are 10 other officers drawing their pistols approaching 95% matches for the same subject at the same time?

These are all difficult questions for statisticians and ethicists to answer. We should not be forcing these decisions on LEOs to deal with in real time.

And I, personally, would prefer to live in a free country without facial recognition cameras pointing at me.

There is no democratic consent for any of this.


I agree, but the post is about whether Amazon is to blame or not. And it seems law enforcement should be regulated in their use of facial recognition, not Amazon. Well, Amazon too, if they used it themselves for wrongdoing.

But should you regulate the tech itself? Like ban the development of facial recognition altogether? Or for identifying human faces?

I feel that if you regulate, it should be the proper usage, which would mean law enforcement is to blame. Or maybe you'd need to get an explicit license to be allowed to use facial recognition, and Amazon could make sure that only licensed individuals get access to their tech, for example.

Point being, nothing here indicates that Rekognition has inherent bias in its design.


Nitpick:

> the ACLU set it to use an 80% confidence level threshold, which should result in a 5% false positive rate.

I'm not sure where the "should" in this sentence is coming from. As far as I know, Amazon's documentation doesn't claim any particular relationship between confidence and false positive rate. The 5% number in the article was simply talking about the actual false positive rate that the ACLU observed.

Not to say the conclusions of the article are wrong, though. It certainly seems unreasonable for someone to assume that "80% confidence" would give an FPR of less than 5%.


Possibly some internal metric they have? 80% might be the pairwise similarity for a pair of images, and across a whole dataset this might have a 5% error rate.
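
If that's the mental model, the error rate implied by a similarity cutoff is something you can only estimate empirically from a labelled set of non-matching pairs; a sketch with an entirely hypothetical impostor-score distribution:

  import numpy as np

  rng = np.random.default_rng(1)
  impostor_scores = rng.normal(0.70, 0.07, 10_000)   # made-up similarity scores for known non-matches

  for cutoff in (0.80, 0.99):
      fpr = float(np.mean(impostor_scores >= cutoff))
      print(f"cutoff {cutoff:.2f}: estimated false positive rate {fpr:.2%}")
  # Whether "80%" corresponds to a 5% error rate depends entirely on the score
  # distribution of the population you run it against, not on the number itself.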


Thanks!


> The police are only supposed to be using this system to narrow down choices, and then have a human sorting out possible matches, which this should do very well.

I'd rephrase that as: Amazon recommends (Rekomends?) using a confidence threshold of 99% for law enforcement, and that it only be one factor in policing.

---

We're getting to a brave new world here, but there are a lot of parallels with the current one. A common trick for reducing speeding tickets is to request things like when the radar was last calibrated and the date of certification of its operator.

Soon, your lawyer can challenge ML based convictions by requesting the coefficients of the model.

Also, the police will use every trick in the book to get an arrest, so unless it's clearly illegal (and in some cases that's not enough to stop the police), you can't say the "police should..." do anything.


Where does 99% come from? Who agreed that, why? I think the Amazon folks are shooting from the hip...


I don't know the answers to that. Perhaps the 99% number is why the post was removed (temporarily?).

My understanding about Amazon is that they don't have a culture of shooting from the hip. They certainly encourage risk taking, but not without putting together a solid plan with review.


> If the ACLU is confused about how it works, other people, and police, will be too

Potentially, although big volume customers like public bodies are almost certainly in contact with eg. AWS solution architects, who would hopefully guide them around pitfalls like this.


> who would hopefully guide them around pitfalls like this.

The technical staff interfacing with AWS is probably many layers removed from the day-to-day employees that use this sort of software in law enforcement. Federal agencies like the CIA & FBI have trained analysts to do this work, but your local law enforcement agency isn't going to have this level of specialization or training. You can look at the history of any new tool of forensic science and see unfortunate abuse when it's first deployed (like DNA evidence).


I agree with your comment.

Now the question is: Is amazon responsible and should it get the bad press? Or should we blame the local cop and his training? Or the police software implementing the technology?


I'm sure there will be plenty of blame to go around. In a car accident, culpability can fall on the driver, the car manufacturer or even the supplier of a faulty part. Regulations will eventually get put into place setting reasonable guidelines for determining negligence and which party (or parties) are at fault.

Unfortunately a lot of innocent people will be victims of bad policy and training until then.


Well, this is $12 of work here (see the original articles), and one of the attractions of the service is that it is being bought and used at these levels of cost. At tens or hundreds of dollars, architect-led support is not possible.


I'm sure this rebuttal will get as much coverage as the original headline. "Confidence intervals"? "Misclassification rates"? Get out of here with your science-talk, nerd!

Sadly, these days I see too many ML practitioners and "data scientists" without the necessary prob/stats foundation. Misinterpreting data happens even to experts, so not surprisingly it's going to happen even more so to amateurs. Suggesting more foundational knowledge is considered elitist. Why shouldn't a month-long bootcamp be as good as an MS/PhD? In this case, the original ACLU results fit a certain narrative and hit all the hot topics, so they were bound to be picked up.


Unlike a dna test, a match via facial recognition would have trouble holding up in court without other significant evidence.

Facial recognition is likely to help rapidly match known photos of a suspect with security footage and imagery gathered near by to help pin down movements of a suspect prior to and after a crime, in order to focus the search for further evidence.

This is a tool to catch stupid criminals; it seems easily fooled by those who approach crime in a more systematic manner.

There should be laws that prevent using facial recognition to track the locations of people who aren’t the subject of investigation. I am sure that technology would be very desirable to advertisers, and I don’t want them to have it.


The ultimate test of any predictive model is whether it works in the intended stakeholder setting. Other types of diagnostics like ROC curves, performance on benchmark data sets, etc., are of course valuable, but ultimately do not necessarily reflect the true, run-time distribution of inputs or constraints that will be relevant for a stakeholder’s usage scenario.

I’ve said this many times. What it means is that even if you think it’s great to outsource your machine learning model to a third party like Amazon, and consume it like a service, you really need in-house expertise in machine learning anyway, to help you interpret the accuracy, diagnostics, and some “integration test” notion of model performance on a sample that is agreed upon as truly representative of the stakeholder runtime conditions.

So either way, you still have to pony up the dough to employ adequate in-house expertise in modeling and domain machine learning.

As a result, there actually are not that many use cases when it would make sense to not build your own tailored model in-house. If you already must hire most of the staff with necessary skills to evaluate a third-party, the incremental effort to train a good-enough fine-tuning of some imagenet-based model is really not that high. And you can customize the training and diagnostics.

This is especially critical when there are also stakeholder-specific latency or throughput constraints.

For face detection in particular, I know a great deal about this directly related to AWS Rekognition, because my team extensively evaluated it for the possibility of outsourcing face detection calls in an in-browser image editing application.

Not only could Rekognition not meet our runtime requirements, but separately it had poor accuracy and coverage (often capping out at detecting a small number of faces per image), and the Rekognition-reported confidence score aligned horribly with our own ground-truth bounding-box data, which allowed us to quantify the overlap of each detection using intersection-over-union scores.
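
For readers who haven't met the metric: intersection-over-union for two axis-aligned boxes is only a few lines to compute. A minimal sketch, with boxes given as (x1, y1, x2, y2):

  def iou(box_a, box_b):
      # Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2).
      ax1, ay1, ax2, ay2 = box_a
      bx1, by1, bx2, by2 = box_b
      ix1, iy1 = max(ax1, bx1), max(ay1, by1)      # overlap rectangle (may be empty)
      ix2, iy2 = min(ax2, bx2), min(ay2, by2)
      inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
      union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
      return inter / union if union else 0.0

  # e.g. a detection that roughly covers a ground-truth face:
  print(iou((10, 10, 110, 110), (20, 20, 120, 120)))   # ~0.68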

We tested a handful of fine-tuned prebuilt models in Keras: Resnet, Inception, and our own port of MTCNN, and had drastically better face detection (on a very large training corpus) than Rekognition in about a week.

And our cost per request, even deploying to AWS, was much lower than Rekognition even at our high volume of traffic and even inclusive of the cost to spin up GPU instances for training.

What’s more is that we tightly controlled how to write the web service layer wrapping it, and how to optimize the image preprocessing steps, etc., that we have no visibility into with a third party.

I think there is a conceptual gap with this stuff where people just think that because you could commoditize a web service around a prediction algorithm, it must mean it would be economically valuable.

But that part is the absolute least meaningful part. The whole enchilada is diagnosing how the model performs on the exact stakeholder use case, inclusive of performance constraints.

For this reason, I think the only market for “ML tools for people who don’t know ML” is going to be just like vapid corporate IT consulting.

Don’t get me wrong: big tech companies will profit from this. But not because it solves any actual prediction problems or saves anyone money compared with in-house ML development. It’ll just be the standard Dilbert-y politics that has always yielded profits to IT consulting services.



