These days I'm really busy working at FlxOne and developing the new CloudPelican project. CloudPelican solves problems caused by data that is scattered and hard to access: it makes that data searchable in real time through a good-looking interface.
Today my first external blog post was published. It explains why online marketers should leverage all the data available to them. Size does matter! If you are interested in big data, marketing, or the combination of the two, I think you should read it.
How to: automated transition from Amazon S3 to Glacier
Amazon recently launched the Glacier service, a tape-robot-like storage system that is even cheaper than S3 for archiving data. It has pros and cons, but the rule of thumb is simple: if you don't expect to need the data again but want to keep it as an archive, put it into Glacier. You can still get your data back from Glacier, but retrieval is slow (it takes hours) and expensive.
So your data is in Amazon S3; how do you move it to Glacier automatically? Just follow these steps!
1. Log in to your Amazon Web Services Console
2. Go to the S3 service page (the one that lists your buckets)
3. Now select your bucket and click the "Properties" button in the top left
4. Unfold the "Lifecycle" tab
5. Click the button to add a new rule
6. Click the "Add transition" button and configure the rule as preferred. The important fields are the period and the bucket selection (e.g. the entire bucket, or only the objects matching a prefix). Warning: make sure you don't pick the expiration option, because that will delete your data!
7. Now you're good to go: your data will be moved to the Glacier layer after the configured period (31 days in this example).
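If you prefer to script this instead of clicking through the console, the same rule can be expressed as an S3 lifecycle configuration. Below is a minimal sketch in Python, assuming the boto3 library is installed and AWS credentials are configured; the function names, the rule ID, and the bucket name in the usage note are hypothetical placeholders, not part of the steps above. Note that it only adds a transition and deliberately contains no expiration action, so nothing gets deleted.

```python
"""Sketch: create the S3-to-Glacier lifecycle rule from code instead of the console.

Assumes boto3 is installed and AWS credentials are configured.
Function names and the rule ID are hypothetical.
"""


def build_glacier_lifecycle(days: int = 31, prefix: str = "") -> dict:
    """Return a lifecycle configuration that transitions objects to Glacier.

    An empty prefix applies the rule to the entire bucket; a non-empty
    prefix limits it to keys starting with that prefix. No Expiration
    action is included, so objects are archived, never deleted.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    return {
        "Rules": [
            {
                "ID": f"archive-to-glacier-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


def apply_lifecycle(bucket: str, config: dict) -> None:
    """Upload the configuration to the bucket.

    Caution: this replaces the bucket's entire existing lifecycle configuration.
    """
    import boto3  # imported lazily so build_glacier_lifecycle works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=config)
```

Usage would look like `apply_lifecycle("my-archive-bucket", build_glacier_lifecycle(days=31))`, where `my-archive-bucket` stands in for your own bucket. Because the call overwrites any rules already on the bucket, merge them into `config["Rules"]` first if you have existing ones.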
Recently I noticed that our increasingly busy Hive server started to fail more often (a couple of times a month), which caused issues in other parts of the system. To resolve some basic problems (a dead process, or the server not listening on the correct port), I wrote a little script that can run every minute from the crontab.
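To illustrate the idea (this is not the script from this post), here is a minimal Python sketch of such a watchdog. It assumes the server listens on a TCP port (10000 is the usual Hive Thrift port, but check your setup) and uses a hypothetical restart command. A refused connection covers both failure modes mentioned above: a dead process and a process that is not listening on the expected port.

```python
"""Sketch of a cron-driven watchdog for a Hive server.

Assumptions (not from the original post): the server listens on a known
TCP port, and RESTART_CMD is a hypothetical command that restarts it.
"""
import socket
import subprocess
from typing import Callable

# Hypothetical restart command; replace with whatever your installation uses.
RESTART_CMD = ["/etc/init.d/hive-server", "restart"]


def is_listening(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (nothing listening) and timeouts.
        return False


def check_and_restart(host: str, port: int, restart: Callable[[], None]) -> bool:
    """Call restart() if the port is not accepting connections.

    Returns True when a restart was triggered, False otherwise.
    """
    if is_listening(host, port):
        return False
    restart()
    return True


def restart_hive() -> None:
    """Run the hypothetical restart command; raises if it exits non-zero."""
    subprocess.run(RESTART_CMD, check=True)
```

A script calling `check_and_restart("localhost", 10000, restart_hive)` could then be scheduled with a crontab entry such as `* * * * * /usr/bin/python3 /path/to/hive_watchdog.py` (the path is a placeholder). Passing the restart action as a callable keeps the check easy to test without actually restarting anything.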