Group Log Data by Timestamp in Python with Pandas

I often find myself with logs in the following format, exported as CSV: the first column is a timestamp in milliseconds, and the second column is the timing for some request, also in milliseconds. Often I'd like to see the minimum, maximum and average request time, grouped by day and hour. Here is how to get this insight using Pandas …
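
Here is a minimal sketch of that grouping with pandas. It assumes the CSV has no header row and that the timestamps are Unix epoch milliseconds in UTC. The column names `timestamp` and `request_ms`, the helper `summarize`, and the sample rows are hypothetical and are not taken from the original post:

```python
import io

import pandas as pd

# Hypothetical sample data: no header row; column 1 = epoch timestamp (ms, UTC),
# column 2 = request time (ms). Pass a file path instead for real logs.
SAMPLE_CSV = """1546300800000,120
1546301400000,80
1546304400000,200
1546390800000,50
"""


def summarize(csv_source):
    """Return min/max/mean request time per (day, hour) bucket."""
    df = pd.read_csv(csv_source, header=None, names=["timestamp", "request_ms"])
    # Convert epoch milliseconds into pandas datetimes.
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")
    # Group on calendar date plus hour of day, so the same hour on
    # different days lands in separate buckets.
    day = df["timestamp"].dt.date.rename("day")
    hour = df["timestamp"].dt.hour.rename("hour")
    return df.groupby([day, hour])["request_ms"].agg(["min", "max", "mean"])


stats = summarize(io.StringIO(SAMPLE_CSV))
print(stats)
```

Grouping on both `dt.date` and `dt.hour` keeps hours from different days apart. If you only care about the hour of day across all days, group on `dt.hour` alone. Resampling on a datetime index is an alternative if you also want rows for hours with no requests.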

Analyzing 200,000 Deleted Tweets with Spark, 40 Billion Comparisons

My wife and I were watching Jack Ryan, in which John Krasinski plays a CIA analyst. I wanted to be like him, so I began looking for things to analyze. In the past I've done text analysis of comments left on Mark Zuckerberg's Facebook post, which I described in I Tried To Virtually Stalk …

Crunching Honeypot IP Data with Pandas and Python

I am taking a cyber security class. This week's assignment had us work on honeypots. A honeypot is a server that pretends to have some kind of vulnerability (open ports, old software, etc.) and instead collects data on the people who try to hack it. At the end of the experiment I ended up with some …

Network Analysis of Donald Trump’s Tweets for 2017

A year and a half ago I Tried To Virtually Stalk Mark Zuckerberg. It was a failed attempt, and instead I analyzed comments on one of Mark's posts. A few days ago I came across a repo/archive of Donald Trump's tweets, and I thought it would be interesting to run a similar analysis on this data. …

I Tried To Virtually Stalk Mark Zuckerberg

Part 1. A Naive Dream
The Dream
In late 2015, I finished reading Automate the Boring Stuff with Python and was very inspired to try to automate something in my life. At the same time, I had always been fascinated by Mark Zuckerberg, the Bill Gates of our time. A lot of people love …