Describing an unknown SQLite database
Recently we were given a SQLite database and tasked with writing queries to extract various insights. The problem was, we weren’t given any information about the database tables.
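One way to get oriented in a situation like this is to ask SQLite itself what it contains. The sketch below is only an illustration, not necessarily the approach taken in the post: it reads table names from the built-in `sqlite_master` catalog and column details from `PRAGMA table_info`. The function name `describe_database` and the shape of its return value are assumptions made for this example.

```python
import sqlite3


def describe_database(path):
    """Return {table_name: [(column_name, declared_type), ...]} for a SQLite file.

    Hypothetical helper for illustration; the name and return shape are assumptions.
    """
    conn = sqlite3.connect(path)
    try:
        # sqlite_master is SQLite's built-in catalog of tables, indexes, and views.
        # Names starting with "sqlite" plus one more character are skipped, which
        # filters out internal tables such as sqlite_sequence.
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # PRAGMA arguments cannot be bound as parameters, so quote the name.
            quoted = '"' + table.replace('"', '""') + '"'
            columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            # Each row is (cid, name, type, notnull, dflt_value, pk).
            schema[table] = [(col[1], col[2]) for col in columns]
        return schema
    finally:
        conn.close()
```

Calling `describe_database("mystery.db")` (a placeholder file name) would return a dictionary that can be printed or scanned before writing any real queries.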
Data Scientist @ Metis
For the third project in Metis’ Data Science program, I used the Correlates of War data set to build a Random Forest classifier that predicts whether a conflict between two nations will turn violent. To make it interesting, I built a front-end application using Flask and d3.js.
The second project in Metis’ Data Science program is an exercise in using linear regression to either interpret data or make predictions with it. I chose to use movie data to predict the worldwide box office gross using only information available before a movie is released.
Our project this week involves web scraping. The first thing I did was write a little code to cache web pages locally. That way, I’m a better web citizen for not hitting the host with repeat requests for the same page, and I can work faster when refining my design and algorithms.
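A local page cache of this kind can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code from the post: the function name `cached_fetch`, the `page_cache` directory, and the choice of a SHA-256 hash of the URL as the file name are all hypothetical. The `fetch` parameter exists only so the download step can be swapped out, for example in tests.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen


def _download(url):
    """Fetch the raw bytes of a page over the network."""
    with urlopen(url) as response:
        return response.read()


def cached_fetch(url, cache_dir="page_cache", fetch=_download):
    """Return the page body for url, reading from disk when a copy exists.

    Hypothetical helper: names and cache layout are assumptions for illustration.
    """
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)

    # Hash the URL so any URL maps to a safe, fixed-length file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    page_file = cache_path / f"{key}.html"

    if page_file.exists():
        # Cache hit: no request is sent to the host.
        return page_file.read_bytes()

    # Cache miss: download once, then store the bytes for next time.
    content = fetch(url)
    page_file.write_bytes(content)
    return content
```

With this in place, repeated calls for the same URL touch the network only the first time, so parsing logic can be rerun freely against the saved copies. This sketch never expires cached pages, which is fine for a short project but would need revisiting for pages that change.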
In a previous post I showed where to find the MTA turnstile data and how to load the files into a Pandas data frame. Now I’ll take a closer look at the data, starting with the entry/exit timestamps.
In the first week at Metis’ Data Science Bootcamp, we’re assigned to teams and tasked with using MTA data to help a fictional client place street teams at NYC subway stations. What interested me most about this project were the quirks and anomalies I observed in the MTA data. Though the assignment was fictional, the data is real, and these observations might help other data science students and practitioners make sense of it.