Jim King hails from Mill Valley, CA, and is a highly successful real estate agent who happens to enjoy data science. He holds a Computer Science degree as well as an MBA in Finance, and has 20 years of experience in the tech field. King loves learning, a good challenge, and competition, but he was unaware of the differential privacy field before the 2020 Differential Privacy Temporal Map Challenge, so he faced a steep learning curve. Even so, his fifth-place finish in the 2nd Sprint gave him additional motivation for the 3rd Sprint.
King's main idea for this sprint was to combine features during pre-processing, create privatized histograms of those combined features, and then generate the simulated data during post-processing. The individual taxis are created by counting the distinct taxi_ids, adding noise, and then iterating through the privatized count. The number of trips per taxi_id is estimated by counting the taxi_ids that have k trips (k = 1 to 200) and adding noise to each bin. Below is a graph depicting his process:
King used a total of five queries:
1. Count of distinct taxi_ids
2. Count of distinct taxi_ids with k trips
3. Histogram of the proximity-shift-pca-dca feature by taxi_id
4. Histogram of the company-payment_type feature by taxi_id
5. Histogram of the fare_codes feature by taxi_id
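As a rough illustration of the first two queries and the taxi-generation step, here is a minimal Python sketch. It assumes the standard Laplace mechanism under add/remove-one-taxi neighbouring, trips represented as dicts with a `taxi_id` field, and a per-query epsilon chosen by the caller. The function names are hypothetical, this is not King's actual code, and sampling each synthetic taxi's trip count from the noisy histogram is only one plausible way to do the post-processing.

```python
import random
from collections import Counter


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_taxi_count(trips, epsilon, rng):
    """Query 1: number of distinct taxi_ids plus Laplace noise.
    Adding or removing one taxi changes the count by 1, so the scale is 1/epsilon."""
    true_count = len({t["taxi_id"] for t in trips})
    return max(0, round(true_count + laplace(1.0 / epsilon, rng)))


def noisy_trips_per_taxi(trips, epsilon, rng, max_k=200):
    """Query 2: number of taxi_ids with exactly k trips, for k = 1..max_k.
    Counts above max_k are clipped into the last bin. Each taxi lands in exactly
    one bin, so the L1 sensitivity is 1 and each bin gets Laplace(1/epsilon) noise."""
    trips_per_taxi = Counter(t["taxi_id"] for t in trips)
    true_bins = Counter(min(n, max_k) for n in trips_per_taxi.values())
    return {k: max(0.0, true_bins[k] + laplace(1.0 / epsilon, rng))
            for k in range(1, max_k + 1)}


def synthesize_taxis(n_taxis, k_hist, rng):
    """Post-processing (hypothetical): iterate through the noisy taxi count and
    draw each synthetic taxi's trip count in proportion to the noisy k histogram."""
    ks = list(k_hist)
    weights = [k_hist[k] for k in ks]
    if sum(weights) == 0:
        weights = [1.0] * len(ks)  # degenerate case: fall back to uniform
    return {taxi_id: rng.choices(ks, weights=weights)[0] for taxi_id in range(n_taxis)}
```

Because `synthesize_taxis` only touches outputs that are already noised, it consumes no additional privacy budget; clamping negative noisy counts to zero is likewise post-processing.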
A dictionary containing a trip_seconds estimate for each pca-dca combination is also created; during post-processing it is used to align the fare codes with the pca-dca combinations.
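The feature-combination idea and the pca-dca trip_seconds dictionary can be illustrated together in a similar sketch. Everything here is an assumption: the column names, the tuple used as the combined key, the public `domain` of possible keys, clipping each taxi to `max_trips` trips to bound sensitivity, and the `seconds_by_pca_dca` dictionary, which is simply passed in because the write-up does not say how King estimated trip_seconds. The fare-code alignment step is not described in enough detail to reproduce, so it is left out.

```python
import random
from collections import Counter


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def combine(trip):
    """Pre-processing: fold four columns into one categorical key.
    The column names are hypothetical stand-ins for proximity, shift, pca and dca."""
    return (trip["proximity"], trip["shift"], trip["pca"], trip["dca"])


def noisy_feature_histogram(trips, domain, epsilon, rng, max_trips=200):
    """Privatized histogram of the combined key over a public, fixed domain.
    Keeping at most max_trips trips per taxi means adding or removing one taxi
    changes the bin counts by at most max_trips in total (L1 sensitivity)."""
    kept = Counter()
    counts = Counter()
    for trip in trips:
        if kept[trip["taxi_id"]] < max_trips:
            kept[trip["taxi_id"]] += 1
            counts[combine(trip)] += 1
    scale = max_trips / epsilon
    return {key: max(0.0, counts[key] + laplace(scale, rng)) for key in domain}


def simulate_trips(n_trips, noisy_hist, seconds_by_pca_dca, rng):
    """Post-processing: sample combined keys in proportion to the noisy counts,
    unpack them, and attach a trip_seconds estimate looked up by (pca, dca)."""
    keys = list(noisy_hist)
    weights = [noisy_hist[k] for k in keys]
    if sum(weights) == 0:
        weights = [1.0] * len(keys)  # degenerate case: fall back to uniform
    simulated = []
    for proximity, shift, pca, dca in rng.choices(keys, weights=weights, k=n_trips):
        simulated.append({
            "proximity": proximity, "shift": shift, "pca": pca, "dca": dca,
            "trip_seconds": seconds_by_pca_dca.get((pca, dca)),
        })
    return simulated
```

Adding noise to every key in a fixed, public domain (including empty bins) is deliberate: noising only the keys that actually occur would reveal which combinations are present in the data.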
To contact this team, please email Jim King at jim.king.mv [at] gmail.com.