Hacker News | herrherr's comments

As your products and services are targeted towards bigger clients, do you have any advice for smaller companies on who to talk to regarding pricing?


Any hints on where to start?


Depends what interests you, but some suggestions:

- Google "generative art". You'll find lots of images and you can usually work out roughly how they were made. Try to reverse engineer them and create your own facsimile.

- Check out links here http://blog.hvidtfeldts.net/index.php/generative-art-links/

- Manning have a book here: https://www.manning.com/books/generative-art

- GRATUITOUS SELF PROMOTION ALERT! I'm working on a book that is part functional programming (in Scala) and part generative art.

Current draft is here: https://dl.dropboxusercontent.com/u/8669329/creative-scala.p...

Code is here: https://github.com/underscoreio/doodle

Discussion forum is here: https://gitter.im/underscoreio/scala

Feedback very much appreciated!


Start with something that looks a bit like something you might want to make into art. One of the standard fractals, Perlin noise with a fixed seed, anything that's got a bit of randomness to it. Look for a section that's inspiring. Then look to see if you can tweak the logic to make that closer to what you want it to be.
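To make the "fixed seed, then tweak" loop concrete, here is a minimal sketch in Python. It uses a seeded random walk rather than Perlin noise or a fractal, purely to keep it short; the function names and parameters (`random_walk`, `to_svg`, `step_size`) are made up for illustration.

```python
import random

def random_walk(seed, steps=200, step_size=5.0):
    """Return a list of (x, y) points; the same seed always gives the same walk."""
    rng = random.Random(seed)  # private generator, so the result is reproducible
    x, y = 0.0, 0.0
    points = [(x, y)]
    for _ in range(steps):
        x += rng.uniform(-step_size, step_size)
        y += rng.uniform(-step_size, step_size)
        points.append((x, y))
    return points

def to_svg(points, size=400):
    """Scale the walk into a size x size box and draw it as one SVG polyline."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span = max(max(xs) - min_x, max(ys) - min_y) or 1.0  # avoid dividing by zero
    scale = size / span
    coords = " ".join(
        f"{(x - min_x) * scale:.1f},{(y - min_y) * scale:.1f}" for x, y in points
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">'
        f'<polyline points="{coords}" fill="none" stroke="black"/></svg>'
    )

if __name__ == "__main__":
    # Keep the seed fixed while you tweak steps/step_size, so that only
    # your change affects the picture.
    print(to_svg(random_walk(seed=7)))
```

Once a seed gives you a shape you like, hold it constant and change one parameter at a time.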


I've been trying for weeks now to get a system running that can handle larger than RAM datasets and returns queries in an acceptable time. It's running ok now but far from optimal (size of DB is ~100 GB and it contains a few hundred million entries).

Does anyone here have experience with any implementations (such as likelike, lshkit, etc.) and can recommend something that can handle larger sets? All the implementations I have found were either not maintained, old, not running or not suitable for production use.

Will definitely take a look at the paper but unfortunately it's always a very long way from here to an actual implementation (there is no code published as far as I could see).


Google's simhash paper shows how to do 8 billion 64bit fingerprints in memory:

Detecting Near-Duplicates for Web Crawling (http://www.wwwconference.org/www2007/papers/paper215.pdf)

SEOMoz has in-memory and db-backed implementations of simhash in Python (https://github.com/seomoz?query=simhash)
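For anyone who hasn't seen simhash before, a rough sketch of the fingerprint step in Python. This is not the code from the paper or from the SEOMoz repos; the choice of BLAKE2b as the per-token hash and the equal weight for every token are assumptions made to keep the example small.

```python
import hashlib

BITS = 64  # fingerprint width, matching the 64-bit fingerprints in the paper

def simhash(tokens):
    """Bitwise majority vote over the 64-bit hashes of all tokens."""
    counts = [0] * BITS
    for token in tokens:
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(BITS):
            # Each token votes +1 for a set bit and -1 for a clear bit.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

if __name__ == "__main__":
    a = simhash("the quick brown fox jumps over the lazy dog".split())
    b = simhash("the quick brown fox jumped over the lazy dog".split())
    print(hamming(a, b))  # near-duplicates tend to give small distances
```

Two documents are then treated as near-duplicates when the Hamming distance between their fingerprints is below some small threshold that you have to tune on your own data.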


Simhash is indeed wicked fast.

Unfortunately, it's also encumbered by a patent: http://www.google.com/patents/US7158961


I've been playing with an implementation on top of lightning mdb[1]. Your profile doesn't have an email but feel free to email me if you're interested.

[1] http://symas.com/mdb/


Actually I'm also using lmdb (together with Python/numpy) :) Added an email address to my profile, would be happy to exchange some experiences.


100GB isn't that big a deal. If you have at least 16GB of RAM it should be a breeze. There are much larger data sets in OpenLDAP in production around the world.

But I wouldn't choose Python for large scale data processing work. The Python CPU/memory overhead is like 100:1, compared to C. (This is why I worked on rtorrent and ditched the original bittorrent client ASAP, and why I hate bitbake....)


First of all, thanks for open sourcing lmdb :)

The biggest problem currently is actually degrading performance, although I'm almost 100% sure that this isn't caused by lmdb itself, but rather by the bindings I've tried.

In the end, doing it directly in C is probably the only thing that will actually work.


Currently without a doubt: https://www.youtube.com/user/mathematicalmonk

An extensive series about machine learning (100+ videos).


We are currently using perceptual hashes (e.g. phash.org) to do hundreds of thousands of image comparisons per day.

As mentioned in another comment, you really have to test different hashing algorithms to find one that suits your needs best. In general though, I think it is in most cases not necessary to develop an algorithm from scratch :)
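To show what "comparing hashes" means in practice, here is a sketch of the simplest perceptual hash, the average hash. Note this is not the DCT-based algorithm from phash.org, and it assumes the image has already been shrunk to a small grayscale grid (given here as a plain list of rows); both function names are just for illustration.

```python
def average_hash(pixels):
    """One bit per pixel: 1 if the pixel is brighter than the mean, else 0.

    pixels is a 2D list of grayscale values, e.g. an image already
    downscaled to 8x8, which would give a 64-bit hash.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits; a small distance means similar images."""
    return bin(a ^ b).count("1")

if __name__ == "__main__":
    original = [[0, 255], [255, 0]]
    adjusted = [[10, 250], [240, 20]]  # same pattern, slightly different levels
    inverted = [[255, 0], [0, 255]]
    print(hamming(average_hash(original), average_hash(adjusted)))
    print(hamming(average_hash(original), average_hash(inverted)))
```

Whichever hash you pick, the comparison step stays the same: a Hamming distance and a threshold chosen by testing on your own images.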

For us, the much more challenging part was/is to develop a system that can find similar images quickly. If you are interested in things like that have a look at VP/MVP data structures (e.g. http://pnylab.com/pny/papers/vptree/vptree/, http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.43.7...).
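For a feel of how a VP tree prunes the search, a minimal in-memory sketch in Python follows. It is not our production system; it assumes hashes are plain integers compared by Hamming distance, picks vantage points at random, and the class and method names are invented for the example.

```python
import random

def hamming(a, b):
    return bin(a ^ b).count("1")

class VPTree:
    """Vantage-point tree over any metric; nodes are (vp, mu, inner, outer)."""

    def __init__(self, points, dist=hamming, rng=None):
        self.dist = dist
        self.root = self._build(list(points), rng or random.Random(0))

    def _build(self, points, rng):
        if not points:
            return None
        vp = points.pop(rng.randrange(len(points)))  # random vantage point
        if not points:
            return (vp, 0, None, None)
        dists = [self.dist(vp, p) for p in points]
        mu = sorted(dists)[len(dists) // 2]  # median distance splits the set
        inner = [p for p, d in zip(points, dists) if d < mu]
        outer = [p for p, d in zip(points, dists) if d >= mu]
        return (vp, mu, self._build(inner, rng), self._build(outer, rng))

    def within(self, query, radius):
        """Return every stored point p with dist(query, p) <= radius."""
        found, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if node is None:
                continue
            vp, mu, inner, outer = node
            d = self.dist(query, vp)
            if d <= radius:
                found.append(vp)
            # Triangle inequality: a subtree is visited only if the query
            # ball can overlap the region that subtree covers.
            if d - radius < mu:
                stack.append(inner)
            if d + radius >= mu:
                stack.append(outer)
        return found

if __name__ == "__main__":
    rng = random.Random(42)
    hashes = [rng.getrandbits(16) for _ in range(200)]
    tree = VPTree(hashes)
    print(tree.within(hashes[0] ^ 0b101, 3))
```

The hard part mentioned above, doing this for datasets that don't fit in RAM, is exactly what this sketch leaves out.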


Indeed, the storage and retrieval of similar images is the hardest part. I do not know of a single networked open-source storage solution for this. I really wish that there was a project with the mindset of Redis, but for MVP trees. By the way, might it be possible to implement an MVP data structure in Redis, as the project is now? I cannot think of possible replication issues with this, apart from the fact that one would have to pre-define a metric space for every tree.

It could be a great extension to Redis DSL.


Yes, you're right. We're not using SQL queries at the moment as that would be very inefficient; that was just an example for a small dataset.

I'm currently researching MVP trees and reading up on VP-trees, BK-trees [1], GNAT [2] and HEngine [3]. Do you have any advice?

[1] http://blog.notdot.net/2007/4/Damn-Cool-Algorithms-Part-1-BK...

[2] http://www.vldb.org/conf/1995/P574.PDF

[3] https://www.cse.msu.edu/~alexliu/publications/HammingQuery/H...
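Of the structures listed, the BK-tree is the quickest to try out. A minimal Python sketch, assuming integer hashes and Hamming distance (any integer-valued metric would do); the class layout is just one way to write it and is not taken from the linked article.

```python
def hamming(a, b):
    return bin(a ^ b).count("1")

class BKTree:
    """BK-tree: each child edge is labelled with its distance to the parent."""

    def __init__(self, dist=hamming):
        self.dist = dist
        self.root = None  # a node is [value, {distance: child_node}]

    def add(self, value):
        if self.root is None:
            self.root = [value, {}]
            return
        node = self.root
        while True:
            d = self.dist(value, node[0])
            if d == 0:
                return  # value is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = [value, {}]
                return
            node = child

    def within(self, query, radius):
        """Return every stored value v with dist(query, v) <= radius."""
        found = []
        stack = [self.root] if self.root is not None else []
        while stack:
            value, children = stack.pop()
            d = self.dist(query, value)
            if d <= radius:
                found.append(value)
            # Triangle inequality: only edges labelled within
            # [d - radius, d + radius] can lead to matches.
            for edge, child in children.items():
                if d - radius <= edge <= d + radius:
                    stack.append(child)
        return found

if __name__ == "__main__":
    tree = BKTree()
    for h in [0b0000, 0b0001, 0b0011, 0b0111, 0b1111]:
        tree.add(h)
    print(sorted(tree.within(0b0111, 1)))
```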


I think you are on the right track there.

The thing is though, you won't have difficulties finding papers on those topics. However, you will probably not have any luck finding many concrete and practical implementations that you could look at.

So it's a long way from reading the papers to having something working.

If you find something, please let me know.


There's a list of CBIR engines on Wikipedia [1], and among those there are a few open source ones. I didn't really have time to check them all, but while skimming through them imgSeek [2] caught my eye.

[1] http://en.wikipedia.org/wiki/List_of_CBIR_engines

[2] http://www.imgseek.net/isk-daemon


The product itself looks interesting, but having all these new accounts praise it looks a bit odd.


herrherr, don't worry, they're legit. I know the founders, and they build great tech. I'm very excited to see their new product in the wild.


We tweeted the link, herrherr... we have friends and backers, too.

If the product looks interesting, you should try the dev tier... there is no credit card required.


Sorry, looks like astroturfing comments from my end.


Thanks for teaching me a new term, I've not heard of astroturfing before.

Not sure how this will be applicable to any of my projects right now, but I've got to admit the demos work impressively.


Shameless self-plug:

www.getmetricmail.com :D


The really interesting part is actually to recognise how hard it is for a new app to enter my daily-use list. It's almost impossible. Some make it in there for a few days or weeks but will vanish quite soon.

Either I need the app for my daily work or it is a fire-and-forget service that I once signed up for and that doesn't require any active input from my side.

Anyway, here is my list:

- pivotaltracker.com

- github.com

- dropbox.com

- olark.com

- gmail.com

- google.com/analytics

- hipchat.com


For comparison have a look at the getclicky stats: http://getclicky.com/marketshare/global/web-browsers/

They seem to be pretty close.


getmetricmail.com creates simple Google Analytics reports and sends them to you as a PDF. Currently 3000 free users. A handful pay, so it makes about $100 per month. A good example of Freemium gone wrong.


I've signed up to check it out. As Alex said, it's a good idea, nice design, etc.

Maybe the feature exists and I didn't spot it, but if you white labelled it so that web designers could send out branded messages to their clients as a value-add, I think you could see paid accounts pick up a bit.

Edit: OK, first email is in. Seems that I have to click a link or get an attachment - this is why I already avoid the Analytics mailed reports. Any chance you can just send the data in the email or does the API not permit it for some reason? I probably wouldn't use this going forward if I had to click through to something or open a PDF, I know that's sulky but just how it is.


You can receive pdf attachments directly, so you don't have to click on the link. Nevertheless that's not what you're looking for, I guess ;)

We thought about putting the data directly into an email, but the crappy HTML/CSS support in the gazillion email clients makes this a pretty tough job.


Yeah, I think Analytics does PDFs from memory and I cancelled all of those.

Can appreciate the frustration with HTML/CSS support - maybe if you kept your layout really simple and/or called it an Old School theme. Mostly I'd be looking for anything that quickly showed me if there was something wrong with a site (or right, e.g., major incoming link).

Wonder if you can throw in some marketing factoids like "Fourth straight month with an increase in traffic" or "Traffic growth continues; fourth straight month" - the sorts of things a marketing guy can repeat to the boss without any more time or research.
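The "fourth straight month" line is cheap to compute from the monthly totals. A quick sketch in Python, assuming you already have visits per month in chronological order; the function names and the wording of the headline are made up.

```python
def growth_streak(monthly_visits):
    """Count consecutive month-over-month increases ending at the latest month."""
    streak = 0
    for i in range(len(monthly_visits) - 1, 0, -1):
        if monthly_visits[i] > monthly_visits[i - 1]:
            streak += 1
        else:
            break
    return streak

def headline(monthly_visits):
    """Return a one-line factoid, or None when there is nothing to brag about."""
    streak = growth_streak(monthly_visits)
    if streak >= 2:
        return f"Traffic up {streak} months in a row"
    if streak == 1:
        return "Traffic up on last month"
    return None

if __name__ == "__main__":
    print(headline([100, 90, 95, 120, 130, 150]))
```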


I haven't used your product so I may be saying nonsense, but why don't you also embed an image created from the PDF? HTML support may be crappy, but as far as I know most of them support images.


It's a great idea. Don't give up yet. Do you allow people to add multiple email addresses that the reports would be sent to? I imagine that would be very useful for businesses. For example, everyone in the marketing dept could receive a report each week.


That is actually possible.

The case here shows what happens when you offer too many features/resources in the free plan.

The toughest part now is deciding if it makes sense to invest more time into it. But I guess that is a general problem for startups that haven't yet found product/market fit. You can't really know if you are miles or just an inch away from that fit.


I guess I'm missing something, but there is a setting in Google Analytics to send a dashboard to an email address every week / month in PDF format. What's the difference here?


Indeed. Google Analytics allows you to create those reports. But apparently it's far too difficult for people who are not that tech-savvy.

As I said we currently have over 3000 users, so there seems to be some interest in such a solution :)

