Flying high with CloudPelican

These days I'm really busy working at FlxOne and developing the new CloudPelican project. It solves problems related to decentralized, inaccessible data: CloudPelican makes that data searchable in real time through a good-looking interface.


Got your attention? Take a look at our concept movie:

View the CloudPelican introduction movie

If you're interested, you can sign up for the public beta at www.cloudpelican.com

Copy gzip files recursively from Amazon S3

Configure s3cmd by running:


s3cmd --configure

Then continue to copy all the files from your directory tree:

s3cmd ls -r s3://your_bucket/folders_optionally_with_wildcard | grep '\.gz$' | awk '{ print $4 }' | xargs -I '{}' s3cmd get '{}' /tmp/your_local_folder/
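To make the filtering step of that pipeline easier to follow, here is a minimal Python sketch of what the grep and awk parts do. It assumes the listing has the S3 URL in the fourth whitespace-separated column (which is what the awk call relies on). The function name `gz_urls` and the sample listing are made up for illustration.

```python
from typing import List


def gz_urls(listing: str) -> List[str]:
    """Return the S3 URLs of .gz files found in `s3cmd ls -r` style output."""
    urls = []
    for line in listing.splitlines():
        fields = line.split()
        # File lines have at least four columns; the URL is the fourth one.
        # Shorter lines (for example directory entries) are skipped.
        if len(fields) >= 4 and fields[3].endswith(".gz"):
            urls.append(fields[3])
    return urls


# Invented sample listing, only here to show the expected shape of the input.
sample = """\
2013-01-01 12:00      1234   s3://your_bucket/logs/a.gz
2013-01-01 12:05      5678   s3://your_bucket/logs/readme.txt
                       DIR   s3://your_bucket/logs/archive/
"""
print(gz_urls(sample))
```

Like the awk version, this breaks on keys that contain spaces, so treat it as a sketch rather than a robust parser.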


Future: drones dropping off medication

Year 2020. Old, weak, or sick people can get their medication via a drone that drops off a tiny package at their front door.
How cool would that be?!


drones.jpeg
Image of 3 quadricopters

Ping

Yes, I am alive, just a little busy with FlxOne and CloudPelican. Of course there are a lot of cool things going on, but I haven't had time to write about them yet.

Extract email address from website URL in a Google Docs spreadsheet

Really simple, use this "magic":


=LOWER(REGEXREPLACE(CONCAT("info@"; REGEXREPLACE(A1; "http(s)?://(www\.)?|\/$|\?(.*)$"; "")); "/(.*)"; ""))

Make sure you change the A1 reference to the cell containing the website URL.
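If the nested formula is hard to read, the same steps can be sketched in Python. This is only an illustration of the logic, reusing the two regular expressions from the formula; the function name `guess_info_address` is made up, and the result is still just a guess at an address, not a verified one.

```python
import re


def guess_info_address(url: str) -> str:
    """Mirror the spreadsheet formula: URL in, guessed info@ address out."""
    # Inner REGEXREPLACE: strip the scheme, an optional "www.",
    # a trailing slash and any query string.
    stripped = re.sub(r"http(s)?://(www\.)?|\/$|\?(.*)$", "", url)
    # CONCAT + outer REGEXREPLACE: prefix "info@" and drop everything
    # from the first "/" onwards, so only the host name remains.
    address = re.sub(r"/(.*)", "", "info@" + stripped)
    # LOWER: normalise the case.
    return address.lower()


print(guess_info_address("http://www.Example.com/about/?x=1"))
```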

My first external blog post

Today is the day my first external blog post was published. It explains why online marketers should leverage all the data available. Size does matter! I think you should read it if you are interested in big data, marketing, or the combination of the two.



How to: automated transition from Amazon S3 to Glacier

Amazon recently introduced the Glacier service. It is a tape-robot-like system that is even cheaper for archiving your data. It has pros and cons, but all together it's really simple: if you do not expect to need access to the data again but want to keep it for archiving, put it into Glacier. You can get it back from Glacier, but that is slow (a matter of hours) and expensive.

So, my data is in Amazon S3; how do I automatically move it to Glacier? Just follow along!


Step-by-step guide:

1. Log in to your Amazon Web Services Console

2. Go to the S3 service page (the one that lists your buckets)

3. Now select your bucket and click the "Properties" button in the top left
s3g-1-image.png

4. Unfold the "Lifecycle" tab

s3g-2-image.png


5. Now you can add a new rule by clicking the button

6. Click the "Add transition" button, and configure as preferred. Important fields are the period and the bucket selection (e.g. the entire bucket, or part of it by prefix). Warning: make sure you don't pick the expiration option, as this will delete your data!

s3g-3-image.png


7. Now you're good to go; your data will be moved to the Glacier layer after 31 days.
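For those who prefer code over clicking, the rule from the steps above can be sketched as a lifecycle configuration in Python. This is a minimal sketch under assumptions: the helper name `glacier_transition_config` and the rule ID are made up, and the dictionary follows the shape that boto3's `put_bucket_lifecycle_configuration` accepts. The block only builds the configuration and does not talk to AWS.

```python
def glacier_transition_config(prefix: str = "", days: int = 31) -> dict:
    """Build a lifecycle configuration that moves objects to Glacier.

    An empty prefix applies the rule to the entire bucket.
    """
    if days < 0:
        raise ValueError("days must not be negative")
    return {
        "Rules": [
            {
                "ID": "archive-to-glacier-after-%d-days" % days,  # hypothetical name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
                # Deliberately no "Expiration" entry: that would delete data.
            }
        ]
    }


config = glacier_transition_config(prefix="logs/", days=31)
print(config)
```

Assuming boto3 is installed and credentials are configured, the result could be applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="your_bucket", LifecycleConfiguration=config)`.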

Cheers!

Apache hive monitor script

Recently I noticed that our increasingly busy Hive server started to fail more and more often (a couple of times a month). This caused issues in other parts of the system. To resolve some basic problems (dead process, not listening on the correct port) I wrote a little script that can run every minute (from the crontab).


I posted the hiveserver-check.sh script on GitHub. Check it out!

Of course it's not the best option out there, but it does the job and automatically resolves the issues I encountered.
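To give an idea of the kind of check involved, here is a minimal Python sketch of a "is it listening on the port?" probe with a restart hook. It is not the hiveserver-check.sh script itself; the function names, the `restart` callable and port 10000 (assumed here to be the port your Hive server listens on) are all illustrative.

```python
import socket
from typing import Callable


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_and_restart(
    host: str,
    port: int,
    restart: Callable[[], None],
    probe: Callable[[str, int], bool] = is_port_open,
) -> bool:
    """Return True if the server looks healthy; otherwise call restart()."""
    if probe(host, port):
        return True
    restart()
    return False


if __name__ == "__main__":
    # Hypothetical usage: replace the lambda with your own restart command.
    healthy = check_and_restart("localhost", 10000, lambda: print("restarting"))
    print("healthy" if healthy else "restart triggered")
```

Run from the crontab every minute, something like this would cover the "not listening on the correct port" case; detecting a dead process would need an extra check.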

Enjoy!

ZeroMQ dll for Windows download

As the ZeroMQ website has been down for a couple of weeks now, I'm sharing a build for Windows on my personal blog. You can download it here.


Mouse tracking for heatmaps and clickmaps

Yesterday I was playing around with a really simple piece of JavaScript that is capable of generating heatmaps and clickmaps. It could be used to store visitor usage data in a database and play it back at a later time, or for aggregated analysis. This could answer questions like:

- Where do people click?
- Where do people hover / move their mouse?

The simple snippet can be found on GitHub at RobinUS2/MouseTrail.
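For the aggregated analysis side, here is a minimal Python sketch of how stored coordinates could be turned into heatmap data. It is not part of MouseTrail; it assumes the clicks or mouse positions have already been collected as (x, y) pixel pairs, and the function name `bin_points` and the 50 pixel cell size are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def bin_points(points: Iterable[Tuple[int, int]], cell_size: int = 50) -> Counter:
    """Count how many points fall into each cell_size x cell_size grid cell."""
    counts = Counter()
    for x, y in points:
        # Integer division maps a pixel position to its grid cell.
        counts[(x // cell_size, y // cell_size)] += 1
    return counts


# Invented sample data: two clicks close together and one further right.
clicks = [(10, 10), (40, 49), (60, 10)]
heat = bin_points(clicks)
print(heat.most_common())
```

The cells with the highest counts are the hot spots; a renderer would then colour each cell according to its count.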

Continuing as Data Architect at FlxOne

For the past six months I have been working as a Software Engineer at FlxOne, and I really got a lot of opportunities to extend my knowledge.

As of the 27th of November my job title is Data Architect. This reflects my daily work with software and data.

The software I developed was all about data, with an ever stronger focus on data architecture: setting up a data warehouse (currently 96 TB of storage, a bunch of processors, and even more RAM), lots and lots of ETL, and all the other things a data architect does.

So that's about it. Me being a Data Architect with a passion for developing the fastest, best performing software.

Would you like to become my right hand as Online Zookeeper or are you interested in the whole RTB thing? We still have some jobs open. Even one for a FlxWizard. Becoming a FlxWizard is like mixing marketing, technology and complex problems in one single job. Nice one though!
