Flying high with CloudPelican
These days I'm really busy working at FlxOne and developing the new CloudPelican project. It solves problems related to decentralized, inaccessible data: CloudPelican makes that data searchable in real time through a good-looking interface.
Copy gzip files recursively from Amazon S3
Configure s3cmd by running:
s3cmd --configure
Then continue to copy all the files from your directory tree:
s3cmd ls -r s3://your_bucket/folders_optionally_with_wildcard | grep "\.gz" | awk '{ print $4 }' | xargs -I '{}' s3cmd get {} /tmp/your_local_folder/
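If you would rather do the filtering step in a script, here is a rough Python sketch of what the grep and awk parts of the one-liner do. It assumes the `s3cmd ls -r` output has four columns (date, time, size, URL), which is what the `$4` above relies on. The function name `gz_urls` and the sample listing are made up for illustration. Note that it only keeps URLs that end in ".gz", which is a bit stricter than the grep.

```python
def gz_urls(listing: str) -> list[str]:
    """Return the S3 URLs of .gz objects found in `s3cmd ls -r` output."""
    urls = []
    for line in listing.splitlines():
        fields = line.split()
        # Object lines have four columns: date, time, size, URL.
        # Directory lines ("DIR  s3://...") have fewer and are skipped.
        if len(fields) >= 4 and fields[3].endswith(".gz"):
            urls.append(fields[3])
    return urls


# Made-up sample listing, only to show the expected input shape.
sample = """\
2013-01-05 10:12      1024   s3://your_bucket/logs/a.gz
2013-01-05 10:13      2048   s3://your_bucket/logs/b.txt
                       DIR   s3://your_bucket/logs/sub/
2013-01-06 08:00       512   s3://your_bucket/logs/sub/c.gz
"""

print(gz_urls(sample))
# ['s3://your_bucket/logs/a.gz', 's3://your_bucket/logs/sub/c.gz']
```

Each returned URL could then be handed to `s3cmd get`, just like the xargs part does.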
Future: drones dropping off medication
Year 2020. Old, weak, or sick people can get their medication via a drone that drops off a tiny package at their front door.
How cool would that be?!
Image of 3 quadricopters
Ping
Yes, I am alive, just a little busy with FlxOne and CloudPelican. Of course there are a lot of cool things going on, but I haven't had time to write about them yet.
Extract email address from website URL in Google Docs spreadsheet
Really simple, use this "magic":
=LOWER(REGEXREPLACE(CONCAT("info@"; REGEXREPLACE(A1; "http(s)?://(www\.)?|\/$|\?(.*)$"; "")); "/(.*)"; ""))
Make sure you change the A1 reference to the cell containing the website URL.
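To see what the formula does step by step, here is a small Python sketch that applies the same two regular expressions. It is only meant as an illustration: the function name `guess_info_address` is my own, and Python's `re` module is not the same regex engine as the one in the spreadsheet, although these simple patterns should behave the same way.

```python
import re


def guess_info_address(url: str) -> str:
    """Build an "info@" address from a website URL, like the formula above."""
    # Step 1: strip the scheme, an optional "www.", a trailing slash,
    # and any query string.
    host_and_path = re.sub(r"http(s)?://(www\.)?|/$|\?(.*)$", "", url)
    # Step 2: prepend "info@", as the CONCAT in the formula does.
    address = "info@" + host_and_path
    # Step 3: drop everything from the first remaining slash onwards,
    # so only the host name is left, then lowercase the result.
    return re.sub(r"/(.*)", "", address).lower()


print(guess_info_address("https://www.Example.com/about/?ref=1"))
# info@example.com
print(guess_info_address("http://example.org"))
# info@example.org
```

Keep in mind that this only guesses an address; nothing checks that the mailbox actually exists.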
My first external blog post
Today is the day my first external blog post was published. It explains why online marketers should leverage all the data available. Size does matter! I think you should read it if you are interested in big data, marketing, or the combination of the two.
How to: automated transition from Amazon S3 to Glacier
Amazon recently introduced the Glacier service. It is a tape-robot-like system that is even cheaper than S3 for archiving your data. It has pros and cons, but all in all it's really simple: if you do not expect to need access to the data again but want to keep it as an archive, put it into Glacier. You can get it back from Glacier, but retrieval is slow (it takes hours) and expensive.
So my data is in Amazon S3; how do I automatically move it to Glacier? Just follow me!
Step-by-step guide:
1. Log in to your Amazon Web Services Console
2. Go to the S3 service page (the one that lists your buckets)
3. Now select your bucket and click the "Properties" button in the top left
4. Unfold the "Lifecycle" tab
5. Add a new rule by clicking on the button
6. Click the "Add transition" button and configure it as preferred. The important fields are the period and the bucket selection (e.g. the entire bucket or part of it by prefix). Warning: make sure you don't pick the expiration option, as this will delete your data!
7. Now you're good to go: your data will be moved to the Glacier layer after 31 days.
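For those who prefer a script over clicking through the console, the rule from the steps above can be sketched as a lifecycle configuration. This is my own illustration and not part of the console walkthrough. The function name `glacier_transition_rule`, the rule ID, and the "logs/" prefix are hypothetical. The dictionary is in the shape that, as far as I know, boto3's `put_bucket_lifecycle_configuration` call accepts.

```python
def glacier_transition_rule(prefix: str = "", days: int = 31) -> dict:
    """Build a lifecycle configuration that moves objects to Glacier.

    An empty prefix applies the rule to the entire bucket.
    """
    return {
        "Rules": [
            {
                "ID": f"to-glacier-after-{days}-days",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
                # Deliberately no "Expiration" key: expiration deletes the data.
            }
        ]
    }


config = glacier_transition_rule(prefix="logs/")
print(config["Rules"][0]["Transitions"])
# [{'Days': 31, 'StorageClass': 'GLACIER'}]
```

With boto3 installed and credentials configured, something like `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="your_bucket", LifecycleConfiguration=config)` should apply it, but do test that on a bucket you can afford to experiment with.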
Cheers!
Apache Hive monitor script
Recently I noticed that our increasingly busy Hive server started to fail more and more often (a couple of times a month). This caused issues in other parts of the system. To resolve some basic problems (dead process, not listening on the correct port) I wrote a little script that can run every minute (from the crontab).
Of course it's not the best option out there, but it does the job and automatically resolves the issues I encountered.
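The script itself is not reproduced here, so what follows is only a minimal Python sketch of the idea and not the original. It assumes Hive listens on port 10000 (the default Thrift port), and the restart command is a placeholder you would have to adjust to your own setup. A dead process shows up as a closed port too, so the one check covers both problems mentioned above.

```python
import socket
import subprocess

HIVE_HOST = "127.0.0.1"
HIVE_PORT = 10000  # assumption: the default Hive Thrift port
RESTART_CMD = ["service", "hive-server", "restart"]  # hypothetical, adjust to your setup


def is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def check_and_restart(host: str, port: int, restart_cmd: list[str]) -> bool:
    """Run restart_cmd if the port is closed; return True if it was run."""
    if is_listening(host, port):
        return False
    subprocess.run(restart_cmd, check=False)
    return True
```

Adding a final line that calls `check_and_restart(HIVE_HOST, HIVE_PORT, RESTART_CMD)` and scheduling the file from the crontab with `* * * * *` would run the check every minute.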
Enjoy!
ZeroMQ DLL for Windows download
As the ZeroMQ website has been down for a couple of weeks now, I'm sharing a Windows build on my personal blog. You can download it here.
Mouse tracking for heatmaps and clickmaps
Yesterday I was playing around with a really simple piece of JavaScript that is capable of generating heatmaps and clickmaps. It could be used to store visitor usage data in a database and play it back at a later time, or for aggregated analysis. This could answer questions like:
- Where do people click?
- Where do people hover / move their mouse?
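The JavaScript itself is not shown here, but the aggregation side is easy to sketch. Below is a small Python illustration of how recorded positions could be binned into grid cells to get heatmap counts. The function name `bin_points`, the 50-pixel cell size, and the sample clicks are all made up for the example.

```python
from collections import Counter


def bin_points(points, cell_size=50):
    """Count recorded (x, y) pixel positions per grid cell."""
    counts = Counter()
    for x, y in points:
        # Integer division maps a pixel position to its cell index.
        counts[(x // cell_size, y // cell_size)] += 1
    return counts


# Made-up click positions in page pixels.
clicks = [(10, 20), (30, 45), (120, 80), (125, 99), (40, 5)]
heat = bin_points(clicks)
print(heat.most_common(1))
# [((0, 0), 3)]
```

The same function works for mouse-move samples; the busiest cells are the ones you would color hottest.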
Continuing as Data Architect at FlxOne
For the past 6 months I have been working as a Software Engineer at FlxOne, where I got a lot of opportunities to extend my knowledge.
As of the 27th of November I carry the job title Data Architect. This is due to my daily activities working with software and data.
The software I developed was all about data, with an ever stronger focus on data architecture: things like setting up a data warehouse (currently 96TB of storage, a bunch of processors, even more RAM), lots and lots of ETL, and all the other things a data architect does.
So that's about it: I'm a Data Architect with a passion for developing the fastest, best-performing software.
Would you like to become my right hand as Online Zookeeper or are you interested in the whole RTB thing? We still have some jobs open. Even one for a FlxWizard. Becoming a FlxWizard is like mixing marketing, technology and complex problems in one single job. Nice one though!