xomateix · on Sept 21, 2021

This reminded me of a webpage that used IMDb data to calculate the degrees of separation between Nicolas Cage and a person of your choice (actress, director...).

I did a quick search and couldn't find it. Does anybody remember it? I'm wondering if my memory or my search skills are failing or if it has simply disappeared.

bryanrasmussen · on Sept 21, 2021

Are you sure it wasn't Six Degrees of Kevin Bacon? https://en.wikipedia.org/wiki/Six_Degrees_of_Kevin_Bacon

gingericha · on Sept 21, 2021

And at one point, Google even let you search directly for someone's Bacon number: https://www.wired.com/2012/09/easter-egg-google-connects-the...

richrichardsson · on Sept 21, 2021

I've only ever heard of something similar, the Bacon Number, which is the degrees of separation between a given person and Kevin Bacon [1].

My Bacon Number is 2, having been an extra in Alexander, which featured Connor Paolo, who was in Mystic River with Kevin Bacon.

[1] https://www.oracleofbacon.org/help.php

agucova · on Sept 21, 2021

In the course Introduction to Computer Science: AI (CS50AI) by Harvard, one of the first assignments is to implement a search exactly like this in Python.

It was extremely fun!
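For anyone curious, the idea can be sketched as a breadth-first search over a co-star graph. This is not the course's code: the `degrees` function and the tiny graph with placeholder names are made up for illustration.

```python
from collections import deque
from typing import Dict, Optional, Set


def degrees(costars: Dict[str, Set[str]], source: str, target: str) -> Optional[int]:
    """Return the fewest co-star links between source and target, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        person, dist = queue.popleft()
        for other in costars.get(person, ()):
            if other == target:
                # BFS visits people in order of distance, so the first hit is the shortest.
                return dist + 1
            if other not in seen:
                seen.add(other)
                queue.append((other, dist + 1))
    return None


# Hypothetical graph: each person maps to the people they shared a film with.
costars = {
    "extra": {"actor_a"},
    "actor_a": {"extra", "actor_b"},
    "actor_b": {"actor_a"},
    "loner": set(),
}
print(degrees(costars, "extra", "actor_b"))
```

Because breadth-first search explores one link at a time, the first path it finds to the target is also a shortest one.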

xomateix · on Aug 2, 2021

Intent HQ | Scala Engineer | London, NYC, Barcelona, Lisbon & Remote | Full or part time

At Intent HQ we are working in a close relationship with our clients to help them have a better understanding of their data so they are able to provide better services to their own customers. We are ~80 people from all over the world (we speak 15 different languages!) based in London, Barcelona and NYC. As we are small, we love sharing ideas and really like to work along the principles of valuing 'individuals and interactions', 'customer collaboration', 'responding to change' and 'working software'.

Tech stack: Scala, Typelevel stack (cats-effect, http4s, fs2...), Cassandra, Elasticsearch, PostgreSQL, Kafka, Docker, Nomad, Terraform, Consul, Vault, AWS, TypeScript, React, Redux

We have several positions open: https://intenthq.teamtailor.com/

Salary ranges depend on location.

If you want more information feel free to drop me an email: albert (at) intenthq.com

xomateix · on July 20, 2021

According to GitHub support, they didn't exclude any repo based on the license: https://news.ycombinator.com/item?id=27769440

xomateix · on Aug 26, 2019

In Spanish most people call it "tilde" as well, but you can also call it "virgulilla" [1]. I always call it that just because of how it sounds; I love that word.

[1] https://es.wikipedia.org/wiki/Virgulilla

xomateix · on April 25, 2019

I am of the opinion that you don't teach anything, it's the person that learns something.

So, I'd go for what others have already mentioned. Learn about the person: what they are like, how they learn best, what their interests are.

Find something they like and enjoy, so they are motivated to learn.

Be ready and available to answer questions and adapt to their rhythm and needs.

Prepare materials and different options so they can choose their path as they go.

Make it easy for them to change their mind, go back and forth, and make their own mistakes.

In my case, for example, I learn by doing, and pair programming with somebody helps me a lot, but other people might prefer having a theoretical background first and will want to read a book before diving into coding.

EDIT: adding paragraphs for clarity

xomateix · on May 24, 2018

Hey, one of the co-maintainers here. Thanks for your comments.

>> rows to randomly sample ... hash (using ... 32 bits) the column ... mod the result by the [constant] value

> This is not random. It deterministically selects the same very predictable fraction of rows.

Yep, you are right. We didn't intend the sampling function to be part of the anonymisation; it's just something we tend to use, and we thought it would be useful to have.

Its objective is to pick a portion of the input data. No more.
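To make the behaviour concrete, here is a minimal sketch of hash-and-mod sampling, written in Python rather than the tool's Go. Using CRC32 as the 32-bit hash, and the names `keep_row` and `mod`, are assumptions for illustration and not the project's actual code.

```python
import zlib


def keep_row(value: str, mod: int) -> bool:
    """Keep roughly 1 in `mod` rows, chosen deterministically from `value`."""
    # CRC32 is an assumed stand-in for whatever 32-bit hash the tool uses.
    return zlib.crc32(value.encode("utf-8")) % mod == 0


rows = [f"user-{i}" for i in range(10_000)]
sample = [r for r in rows if keep_row(r, 10)]

# The same input always produces the same sample, so this is a stable
# subset of the data, not a random one.
assert sample == [r for r in rows if keep_row(r, 10)]
```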

>> UK format postcode (eg. W1W 8BE) and just keeps the outcode (eg. W1W)

>> Given a date, just keep the year

> Partial postal codes and dates quantized to the year are still very revealing. Combined with other data (such as a hashed name), the partial postal code may allow a lot of people to be uniquely identified.

You are absolutely right. Depending on the use case and your data, having the outcode, the city or the year might be very revealing. In some other cases even having decades or centuries might be revealing.

We don't pretend that each function provided applies to all use cases. But in certain use cases partial postcodes or years can be good enough.
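As a rough illustration of those two reductions (a Python sketch, not the tool's Go implementation; the function names and the ISO 8601 date format are assumptions):

```python
from datetime import date


def outcode(postcode: str) -> str:
    """Return the outward part of a UK-format postcode, e.g. 'W1W 8BE' -> 'W1W'."""
    compact = postcode.replace(" ", "").upper()
    # The inward code is always the last three characters.
    return compact[:-3]


def year_only(iso_date: str) -> str:
    """Return just the year of an ISO 8601 date, e.g. '1985-07-14' -> '1985'."""
    return str(date.fromisoformat(iso_date).year)


print(outcode("W1W 8BE"), year_only("1985-07-14"))
```

Whether the remaining outcode or year is coarse enough still depends on how many people share each value in your data.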

>> Hash (SHA1) the input

> Hashing does not provide anonymity.

We are very aware of that. That's why we offer the option to add a salt (that the user of the tool can make as long as possible and throw away after the anonymisation process).
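A minimal sketch of the salted variant, in Python rather than Go. Prepending the salt to the value and the name `salted_sha1` are assumptions for illustration; the tool may combine them differently.

```python
import hashlib
import secrets


def salted_sha1(value: str, salt: bytes) -> str:
    """Return the hex SHA-1 digest of salt + value."""
    # Assumption: the salt is simply prepended to the UTF-8 encoded value.
    return hashlib.sha1(salt + value.encode("utf-8")).hexdigest()


# Generate a long random salt, use it for the whole run, then throw it away.
salt = secrets.token_bytes(32)
print(salted_sha1("alice@example.com", salt))
```

Once the salt is discarded, nobody can recompute the digest of a guessed input, which is what blocks the dictionary attack that breaks unsalted hashes.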

>> range

> This is the only feature that could provide anonymity, if it is used correctly to group large numbers of individuals into the same bucket. This is probably more difficult than it first appears.

We usually work with data sets of tens of millions of users. Choosing the right ranges and, especially, analysing the data and making sure you anonymise the outliers (by choosing your bottom and top ranges carefully) is crucial.
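A sketch of what bucketing with open-ended bottom and top ranges can look like, in Python rather than Go. The `to_range` function, its label format, and the example bounds are hypothetical and not the tool's configuration.

```python
from bisect import bisect_right
from typing import Sequence


def to_range(value: int, bounds: Sequence[int]) -> str:
    """Map `value` to a bucket label; `bounds` must be sorted ascending."""
    i = bisect_right(bounds, value)
    if i == 0:
        # Everything below the first bound shares one bucket (low outliers).
        return f"<{bounds[0]}"
    if i == len(bounds):
        # Everything at or above the last bound shares one bucket (high outliers).
        return f">={bounds[-1]}"
    return f"{bounds[i - 1]}-{bounds[i]}"


age_bounds = [18, 30, 50, 70]  # hypothetical; pick bounds after analysing your data
print([to_range(age, age_bounds) for age in (5, 18, 29, 95)])
```

The open-ended first and last buckets are what hide the outliers, so the bounds need to be chosen so that each bucket still contains many individuals.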

Again, this tool is a hammer. We expect someone who understands wood and nails to analyse their problem and use it.

xomateix · on May 24, 2018

Thanks for the idea. We don't support anonymisation of IP addresses because it's not in any of our use cases yet. But I've already added an issue to address it.

xomateix · on May 24, 2018

Hey, one of the co-maintainers of the project here. And the one that decided to use json.

I agree with you. There are other options for configuration that are much better than JSON (YAML, TOML).

The main reason for choosing JSON was simplicity. This was my first project in Go and I didn't want to spend much time on it either. I found an example that was using JSON and I saw that I didn't need any external library to decode it. I thought that was good enough, at least for now.

Will probably look into using a library that supports yaml/toml for configuration in the future.

xomateix · on May 24, 2018

Thanks for the tip. We plan to add an examples folder. We'll add a preview too.

xomateix · on May 24, 2018

In addition to what Nathan has said, I'd add that we needed a simple native command-line tool that could be dropped onto any server and easily work alongside other Unix tools like cat, gzip, cut...