Hacker News | sebhtml's comments

big.tsv is basically the same line repeated 2 000 000 times:

  $ awk \
 'BEGIN{for (i=0; i<2000000; i++){print "abcdef\tghijk\tlmnop\tqrstuv\twxyz1234\tABCDEF\tHIJK\tLMNOP\tQRSTUV\tWXYZ123"}}' \
   > big.tsv

  $ cat big.tsv |sort|uniq -c
  2000000 abcdef ghijk lmnop qrstuv wxyz1234 ABCDEF HIJK LMNOP QRSTUV WXYZ123

You can first sort all the lines and then count them with uniq -c. The first column is then the number of instances of that line.

The idea is that if 2 lines are identical, they trigger exactly the same number of "count++", so each distinct line only needs to be processed once, adding its count instead of 1.

I took your gawk code, but I am starting the for loop at 2 instead of 1, and I removed the -F'\t' option.

  $ time cat big.tsv |sort | uniq -c \
  | gawk '{for (i=2; i <= NF; i++) {if (index(tolower(substr($i, 1, 3)), "bc") != 0) {count += $1}}}END{print count}'
  4000000

  real 0m0.724s
  user 0m0.605s
  sys 0m0.322s

Edit: added backslashes to split lines.
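
The same deduplicate-then-weight idea can be sketched outside the shell. This is only an illustration in JavaScript: the function name countWeighted is hypothetical, and it assumes tab-separated fields rather than gawk's default whitespace splitting.

```javascript
// Count fields whose first three characters (lowercased) contain "bc",
// processing each distinct line once and weighting by its multiplicity.
// Assumption: fields are separated by tabs.
function countWeighted(lines) {
  // Rough equivalent of `sort | uniq -c`: distinct line -> number of occurrences.
  const multiplicity = new Map();
  for (const line of lines) {
    multiplicity.set(line, (multiplicity.get(line) || 0) + 1);
  }

  let count = 0;
  for (const [line, n] of multiplicity) {
    for (const field of line.split("\t")) {
      if (field.slice(0, 3).toLowerCase().includes("bc")) {
        count += n; // same effect as n separate count++
      }
    }
  }
  return count;
}
```

The inner loop runs once per distinct line, so a file made of one line repeated 2 000 000 times costs a single pass over its fields after the counting step.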


Your input length is 101.

If you "play" for a long time, eventually, all 64 6-grams will have occurred.
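
To check how many distinct 6-grams a given input has covered so far, something like the following could be used. It is a minimal sketch; distinctNgrams is a hypothetical helper, and the input is assumed to be a string of "0" and "1" characters.

```javascript
// Number of distinct n-grams (sliding windows of length n) in the input string.
function distinctNgrams(input, n) {
  const seen = new Set();
  for (let i = 0; i + n <= input.length; i++) {
    seen.add(input.slice(i, i + n));
  }
  return seen.size;
}
```

An input of length 101 has 96 windows of length 6, so it could in principle cover all 64 6-grams, but a typical human input will repeat many of them.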


By using "1" for left (lastKey = 0) and "0" for right (lastKey = 1), I get 12%.

A de Bruijn sequence DB(2, k) is spelled by an Eulerian cycle in the corresponding de Bruijn graph, whose vertices are {0, 1}^(k-1) and whose arcs are the k-grams: an arc (x, y) exists when x[2..k-1] == y[1..k-2].

Each Eulerian cycle spells one de Bruijn sequence, so (up to rotation) there are as many Eulerian cycles as there are de Bruijn sequences.
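
As a rough sketch of that construction, one Eulerian cycle can be found with Hierholzer's algorithm on the graph whose vertices are the (k-1)-bit strings. The function name deBruijnBinary is hypothetical, k >= 2 is assumed, and the cycle found depends on the order in which arcs are tried, so this yields just one of the many possible sequences.

```javascript
// Build one binary de Bruijn sequence of order k (k >= 2) by walking an
// Eulerian cycle in the graph whose vertices are (k-1)-bit strings.
function deBruijnBinary(k) {
  const start = "0".repeat(k - 1);
  const nextSymbol = new Map(); // vertex -> next unused outgoing symbol (2 = exhausted)
  const stack = [start];
  const cycle = [];

  while (stack.length > 0) {
    const v = stack[stack.length - 1];
    const s = nextSymbol.get(v) || 0;
    if (s < 2) {
      nextSymbol.set(v, s + 1);
      // Arc labelled s: drop the first bit of v and append s.
      stack.push((v + s).slice(1));
    } else {
      // No unused arcs left at v: it is final in the cycle.
      cycle.push(stack.pop());
    }
  }

  // Vertices were collected in reverse order. After the start vertex,
  // each visited vertex contributes its last bit (the label of the arc taken).
  cycle.reverse();
  return cycle.slice(1).map((v) => v[v.length - 1]).join("");
}
```

Read cyclically, the returned string of length 2^k contains every k-gram exactly once.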

Each such de Bruijn sequence may score differently at the game "https://www.expunctis.com/2019/03/07/Not-so-random.html".

  diff --git a/not-so-random.html b/not-so-random.html
  index 48c04da..168a287 100644
  --- a/not-so-random.html
  +++ b/not-so-random.html
  @@ -169,8 +169,12 @@
         randomHelpFunc = function(evt) {
            evt.preventDefault();
            //document.onkeydown = null;
  -         for (let i = 0; i<10; i++) {
  -            lastKey = Math.round(Math.random());
  +
  +         // db(2, 6)
  +         var inputString = "0000001000011000101000111001001011001101001111010101110110111111";
  +
  +         for (let i = 0; i< inputString.length; i++) {
  +            lastKey = inputString[i] == "0" ? 1 : 0;
               testPrediction();
               updateAll();
               predictNext()


How to find which sequence is the best?


I modified the "randomize!" button to inject the de Bruijn sequence DB(2,6), and I get a rate of 50%.

Is there an error in my patch?

  diff --git a/not-so-random.html b/not-so-random.html
  index 48c04da..c650b5b 100644
  --- a/not-so-random.html
  +++ b/not-so-random.html
  @@ -169,8 +169,12 @@
         randomHelpFunc = function(evt) {
            evt.preventDefault();
            //document.onkeydown = null;
  -         for (let i = 0; i<10; i++) {
  -            lastKey = Math.round(Math.random());
  +
  +         // db(2, 6)
  +         var inputString = "0000001000011000101000111001001011001101001111010101110110111111";
  +
  +         for (let i = 0; i< inputString.length; i++) {
  +            lastKey = inputString[i] == "0" ? 0 : 1;
               testPrediction();
               updateAll();
               predictNext()


I didn't review the patch, but that is the expected result. That sequence makes the predictor think that, whatever the previous 5 keys were, your next key is equally likely to be 1 or 0. So it guesses randomly, and half the time it is right.

However, with $1 vs $1.05 returns, you'll steadily make money.
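
The balance argument above can be checked directly on the DB(2, 6) string from the patch. This is a sketch only: followerCounts is a hypothetical helper, the sequence is read cyclically, and nothing here reproduces the game's actual predictor, which is not shown in this thread.

```javascript
// For every context of `order` consecutive keys, tally which key followed it.
// The sequence is read cyclically, so the last contexts wrap to the beginning.
function followerCounts(sequence, order) {
  const wrapped = sequence + sequence.slice(0, order);
  const counts = new Map(); // context -> [times followed by "0", times followed by "1"]
  for (let i = 0; i < sequence.length; i++) {
    const context = wrapped.slice(i, i + order);
    const next = wrapped[i + order];
    if (!counts.has(context)) counts.set(context, [0, 0]);
    counts.get(context)[next === "0" ? 0 : 1] += 1;
  }
  return counts;
}

const inputString = "0000001000011000101000111001001011001101001111010101110110111111";
const tallies = followerCounts(inputString, 5);
// True when every 5-key context is followed by "0" and by "1" equally often.
const balanced = [...tallies.values()].every(([zeros, ones]) => zeros === ones);
```

If every context is balanced, a predictor that only looks at the previous 5 keys has no statistical edge on this input, which is consistent with the 50% rate reported above.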

