Data Science Roundup #55: Stealing ML Models, AI in Health Care, and Talking to the Dead(?)
Probability is not subjective; now is the time for AI in medicine; reverse-engineering black box ML models; an amazing tour of Python viz options; chat bots for the deceased (creepy!); 9 strange correlations.
This week's best data science articles
This post tackles something that has bothered me for a while. What does it mean to be a “Bayesian” or a “Frequentist”? Are we choosing sides, like we choose political parties? If so, why does statistics leave room for personal preference? The author has this to say: “You don’t have to be a Bayesian to use Bayes’s Theorem. Most probability problems have a single solution considered correct under any interpretation of probability and statistics.” 👍 👍
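To make the quoted point concrete, here is a small worked example of Bayes's Theorem on a diagnostic-test problem. The numbers (1% prevalence, 90% sensitivity, 5% false-positive rate) and the function name `posterior_given_positive` are made up for illustration and do not come from the linked post. Whatever your philosophy of probability, the arithmetic gives the same answer.

```python
from fractions import Fraction

def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) via Bayes's Theorem.

    All three inputs are illustrative assumptions, not data from the article.
    """
    true_positive = prior * sensitivity                # P(positive and condition)
    false_positive = (1 - prior) * false_positive_rate  # P(positive and no condition)
    return true_positive / (true_positive + false_positive)

# Exact rational arithmetic avoids floating-point noise in the result.
posterior = posterior_given_positive(
    prior=Fraction(1, 100),
    sensitivity=Fraction(9, 10),
    false_positive_rate=Fraction(5, 100),
)
print(posterior)         # 2/13
print(float(posterior))  # roughly 0.154
```

Even with a fairly accurate test, a positive result here implies only about a 15% chance of having the condition, because the condition is rare to begin with.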
Why is the world’s most advanced AI used for cat videos, but not to help us live longer and healthier lives? From the author: “If you’re a skilled AI practitioner currently sitting on the sidelines, now is your time to act. The problems that have kept AI out of healthcare for the last 40 years are now solvable.”
An academic paper that made me laugh! From the summary: “…we show simple, efficient attacks that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees. We demonstrate these attacks against the online services of BigML and Amazon Machine Learning.” A great response by BigML indicates that it doesn’t charge for predictions and that this paper “shows how charging for predictions is a poor business strategy”.
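For a feel of why extraction can be so cheap, here is a minimal sketch of one way to recover a logistic regression model from a prediction endpoint. It assumes the service returns the raw predicted probability and accepts arbitrary numeric inputs. The `query_api` function and the "secret" weights are hypothetical stand-ins, not the real BigML or Amazon APIs, and this is not necessarily the exact procedure used in the paper.

```python
import math

# Hypothetical "secret" model standing in for a remote prediction service.
_SECRET_WEIGHTS = [1.5, -2.0, 0.25]
_SECRET_BIAS = -0.5

def query_api(x):
    """Pretend remote call: returns the predicted probability for features x."""
    z = _SECRET_BIAS + sum(w * xi for w, xi in zip(_SECRET_WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Inverse of the sigmoid: maps a probability back to the linear score."""
    return math.log(p / (1.0 - p))

def extract_logistic_regression(query, n_features):
    """Recover weights and bias using n_features + 1 queries.

    logit(p(x)) = bias + w . x, so the all-zeros input reveals the bias
    and each unit vector reveals one weight.
    """
    bias = logit(query([0.0] * n_features))
    weights = []
    for i in range(n_features):
        unit = [0.0] * n_features
        unit[i] = 1.0
        weights.append(logit(query(unit)) - bias)
    return weights, bias

weights, bias = extract_logistic_regression(query_api, 3)
print(weights, bias)
```

Under these assumptions, a model with three features gives up all of its parameters in four queries, which is why per-prediction pricing offers little protection.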
A startup has trained a neural network on thousands of text messages exchanged with a recently deceased friend and is using the resulting model to power a chat bot that users can talk to as if they were talking to the deceased. Take a second and read that again: a startup is letting you emulate talking to a dead person. This article focuses less on the “how” and instead explores the human dimension of what this type of technology means for all of us. Creepy and fascinating.
Useful. The author goes through five different plotting scenarios in each of five different Python plotting libraries: matplotlib, pandas, Seaborn, ggplot (an apparently-very-solid Python port), and the new kid on the block, Altair. Each scenario shows code and results for each library as well as the author’s narration on strengths and weaknesses of each library’s approach. This is a must-read if you do any Python visualization.
Did you know the data shows that:
smart people like curly fries,
female-named hurricanes are more deadly, or
typing with proper capitalization indicates creditworthiness?
This article shares nine such findings and then explains why the obvious conclusion is probably wrong. It turns out the curly fries correlation is explained by homophily on Facebook, not by a causal relationship between IQ and french fry preferences.
Data viz of the week
Shipping traffic around Hurricane Matthew as it passed near Jacksonville, Florida
Thanks to our sponsors!
Fishtown Analytics is a boutique analytics consultancy serving high-growth, venture-funded startups. Have analytics questions? Let’s chat.
Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.
The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.