Data Science of the Apartment Search, part 1

Summary

The search for the perfect apartment is a natural fit for data science: between city data published through open-data initiatives and Craigslist rental listings, there’s a lot of structured and semi-structured geographic data to analyze. With rents rising rapidly in urban areas, finding a good apartment at a reasonable price in the right neighborhood is a challenge, and being the first to visit and apply for a property confers a sizable competitive advantage. This is especially relevant in San Francisco, where legends abound about high rents and the fierce competition for landing a place. As I’m considering moving to the Bay Area within the next couple of months, using data science to find my next apartment is both a useful and a fun challenge to take on.

Efficiently Finding Scooters to Charge

Summary

E-scooters overran Portland during the pilot period in summer 2018 and gave rise to a new gig-economy role: finding scooters scattered around town to charge each night. In this post, I build and explore a genetic algorithm that efficiently finds good routing solutions to this traveling salesman problem (TSP), and to a TSP-like problem with qualitatively different behavior that arises when the solution space is constrained.
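To give a flavor of the approach, here is a minimal sketch of a genetic algorithm for a small TSP instance. It is not the code from the post: the operators (ordered crossover, swap mutation, keeping the top half of each generation) and all function names and parameters are assumptions chosen for illustration.

```python
import math
import random


def tour_length(tour, points):
    """Total length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def ordered_crossover(parent_a, parent_b, rng):
    """Copy a slice from parent_a, then fill the gaps in parent_b's order."""
    n = len(parent_a)
    lo, hi = sorted(rng.sample(range(n), 2))
    kept = set(parent_a[lo:hi])
    filler = iter(city for city in parent_b if city not in kept)
    return [parent_a[i] if lo <= i < hi else next(filler) for i in range(n)]


def swap_mutation(tour, rng, rate=0.2):
    """With probability `rate`, swap two randomly chosen stops."""
    tour = tour[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour


def evolve(points, pop_size=60, generations=200, seed=0):
    """Return the best tour found; the top half survives each generation."""
    rng = random.Random(seed)
    n = len(points)
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: tour_length(t, points))
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(swap_mutation(ordered_crossover(a, b, rng), rng))
        population = survivors + children
    return min(population, key=lambda t: tour_length(t, points))
```

Calling `evolve(points)` with a list of `(x, y)` coordinates returns a permutation of indices. Because the best tours are always carried over, the best length never gets worse from one generation to the next.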

Predicting Wine Varieties, Part 1 -- Naive Bayes

Summary

This is the first post in a series where I use different algorithms to classify wines, i.e. to predict a wine's variety from its description. I start off with a simple yet powerful algorithm called Naive Bayes. It is particularly easy to build (and use) from scratch, so although the code is not intended for production use, it was fun to write. It performed admirably (33.16% accuracy on a validation set of 4917 rows representing 40 different varieties, given a description of more than 140 characters). It will be compared head-to-head with the other algorithms on the test set at the conclusion of this series!
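For readers who have not seen it, a from-scratch Naive Bayes text classifier can be sketched in a few lines. This is an illustrative version, not the post's code: whitespace tokenization, add-one (Laplace) smoothing, and the function names are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per label from (description, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def predict(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label, doc_count in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(doc_count / total_docs)  # log prior
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together, and the smoothing keeps a single unseen word from zeroing out a variety entirely.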

Hot summer days

Summary

In Portland we recently broke the record for the number of days above 90 °F in a year, with 30 in 2018 as of August 21st (https://twitter.com/NWSPortland/status/1032384600696213504). I wanted to visualize whether this sustained heat really was unusual, or whether I was falling for recency bias. I remembered a New York Times article that visualized Steph Curry’s 3-point shooting in the 2015-2016 NBA season in an interesting way, showing the cumulative sum of three-pointers made over the course of the season (https://www.nytimes.com/interactive/2016/04/16/upshot/stephen-curry-golden-state-warriors-3-pointers.html).

I produce a similar graph using cooling days as the metric, with open NOAA data collected at the Portland airport (PDX) from 1970 to 2018. I show that this summer has generally been hotter than the summer of every year since 1970 except 2015.
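The cumulative curve behind such a graph can be sketched as follows. This assumes the metric is cooling degree days computed the conventional way (the daily mean of the high and low, minus a 65 °F base, floored at zero); the post may define it differently, and the record layout and function names here are hypothetical.

```python
from itertools import accumulate

BASE_F = 65.0  # conventional base temperature for degree days (assumed here)


def cooling_degree_days(high_f, low_f, base_f=BASE_F):
    """Degrees by which the day's mean temperature exceeds the base, floored at 0."""
    return max(0.0, (high_f + low_f) / 2 - base_f)


def cumulative_by_year(records):
    """Map year -> running total of cooling degree days, in date order.

    `records` is an iterable of (year, day_of_year, high_f, low_f) tuples.
    """
    by_year = {}
    for year, day, high_f, low_f in sorted(records):
        by_year.setdefault(year, []).append(cooling_degree_days(high_f, low_f))
    return {year: list(accumulate(values)) for year, values in by_year.items()}
```

Plotting one line per year from the resulting lists gives the same kind of picture as the Curry chart: each year's curve climbs over the season, and an unusually hot year pulls away from the pack.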

Identifying wine qualities

Summary

With over 100,000 wine reviews to play with (and to make up for my unrefined palate), I try to find words useful for describing different types of wine. Given a wine I’ve never had before but whose variety I know, I try to find both words commonly used to describe that variety and subtler words to reach for if I feel like going out on a limb. I also create word clouds of different wines by variety; these faceted plots are out of reach of the wordcloud library but can be done passably by extending ggrepel.
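One simple way to surface words characteristic of a variety is to compare how often a word appears in that variety's reviews with how often it appears across all reviews. The sketch below does exactly that; the scoring rule, the `min_count` cutoff, and the function name are assumptions for illustration and not necessarily the method used in the post.

```python
from collections import Counter


def distinctive_words(reviews, variety, top_n=5, min_count=2):
    """Rank words by (share within variety) / (share across all reviews)."""
    overall = Counter()
    within = Counter()
    for text, label in reviews:
        words = text.lower().split()
        overall.update(words)
        if label == variety:
            within.update(words)
    total_overall = sum(overall.values())
    total_within = sum(within.values())
    scores = {
        word: (count / total_within) / (overall[word] / total_overall)
        for word, count in within.items()
        if count >= min_count  # skip words too rare to trust
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Raising `min_count` favors the common, safe descriptors, while lowering it lets rarer and more adventurous words through.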
