Your 2018 Conference List. SageMaker. Docker. Dashboard Design @ Instagram. [DSR #114]

❤️ Want to support us? Forward this email to three friends!

🚀 Forwarded this from a friend? Sign up to the Data Science Roundup here.

Introducing Amazon SageMaker

IMO, SageMaker was the biggest thing to come out of Re:Invent this week for data scientists. The one-liner on it is:

Amazon SageMaker is a fully-managed service that enables data scientists and developers to quickly and easily build, train, and deploy machine learning models at any scale.

At the most basic level, it’s a hosted Jupyter notebook. But it’s natively integrated into the AWS ecosystem in ways that make it more powerful than just running a local notebook.

We’ve played around with this a bit in the past couple of days and definitely see SageMaker’s potential. Very worth your time to check it out.

52 Machine Learning and AI Conferences

It’s time to put together your conference schedule for 2018! The post includes dates, links, and details.


Docker for Data Science

If you’re not familiar with Docker, you should spend some time with it.

Think of Docker as a light virtual machine. Someone writes a Dockerfile that builds a Docker Image which contains most of the tools and libraries that you need for a project. You can use this as a base and add any other dependencies that are required for your project. Its underlying philosophy is that if it works on my machine it will work on yours.

I picked up Docker basics in a day a couple of years ago and have found it to be a very useful tool in my kit. This article is a good jumping-off point.


How to Improve my ML Algorithm? Lessons from Andrew Ng’s Experience

I love this post. It’s fairly short but provides super-useful, actionable advice for optimizing your data science models. My favorite is the F1 score:

Rather than using two numbers, precision and recall, to pick a classifier, you just have to find a new evaluation metric that combines precision and recall. In the machine learning literature, the standard way to combine precision and recall is something called an F1 score.
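If you haven't worked with it before, F1 is the harmonic mean of precision and recall. Here's a minimal sketch of using it to pick between two classifiers. The classifier names and their precision/recall numbers are made up for illustration, and `f1_score` here is a hand-rolled helper, not a library function.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical classifiers, each described by (precision, recall).
classifiers = {"A": (0.95, 0.90), "B": (0.98, 0.85)}

# Collapse the two numbers into one score per classifier, then pick the best.
scores = {name: f1_score(p, r) for name, (p, r) in classifiers.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"classifier {name}: F1 = {score:.3f}")
print(f"best by F1: {best}")
```

The point of the single number is that the comparison stops being a judgment call: B has the higher precision, but A wins once recall is folded in.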


Three Common Mistakes With Company-level Dashboards

File this under “obvious stuff that people still suck at”. When you publish a dashboard for consumption throughout your company, you are not just a data analyst, you are also a designer. All of those people will bring the same human quirks and foibles to your dashboard that they bring to all of their other information, and it is your job to design for that.

This post presents three simple rules that you may very well not be following today. It’s written by the head of marketing analytics @ Instagram, so he knows a bit about the topic.

Highly recommended.


DeepMind: Population based training of neural networks

The newest from DeepMind:

The success of a neural network at a particular application is often determined by a series of choices made at the start of the research, including what type of network to use and the data and method used to train it. Currently, these choices - known as hyperparameters - are chosen through experience, random search or computationally intensive search processes.

In our most recent paper, we introduce a new method for training neural networks which allows an experimenter to quickly choose the best set of hyperparameters and model for the task. This technique - known as Population Based Training (PBT) - trains and optimises a series of networks at the same time, allowing the optimal set-up to be quickly found.
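To get a feel for the idea, here's a toy sketch of the "train a population, copy the winners, perturb their hyperparameters" loop. This is not DeepMind's implementation. It assumes a one-parameter "model" minimizing theta squared, with the learning rate as the only hyperparameter; the function name `pbt`, the population size, the 0.8/1.2 perturbation factors, the quarter-based selection, and the learning-rate bounds are all illustrative choices.

```python
import random


def pbt(population_size=8, steps=60, ready_every=10, seed=0):
    """Toy population-based training on loss(theta) = theta ** 2."""
    rng = random.Random(seed)
    # Each worker carries its own parameter (theta) and hyperparameter (lr).
    workers = [
        {"theta": 1.0, "lr": rng.uniform(0.001, 0.05)}
        for _ in range(population_size)
    ]

    def loss(worker):
        return worker["theta"] ** 2

    for step in range(1, steps + 1):
        # Train: one gradient-descent step per worker (d/dtheta theta^2 = 2*theta).
        for w in workers:
            w["theta"] -= w["lr"] * 2 * w["theta"]

        if step % ready_every == 0:
            workers.sort(key=loss)
            quarter = max(1, population_size // 4)
            top, bottom = workers[:quarter], workers[-quarter:]
            for w in bottom:
                parent = rng.choice(top)
                # Exploit: copy the parameters of a better-performing worker.
                w["theta"] = parent["theta"]
                # Explore: perturb the copied hyperparameter, kept within bounds.
                perturbed = parent["lr"] * rng.choice([0.8, 1.2])
                w["lr"] = min(0.4, max(1e-4, perturbed))

    best = min(workers, key=loss)
    return {"lr": best["lr"], "loss": loss(best)}


result = pbt()
print(f"best learning rate: {result['lr']:.4f}, loss: {result['loss']:.3e}")
```

Unlike a grid or random search, no worker is ever restarted from scratch: weak workers pick up where strong ones are and keep training with a slightly different learning rate.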


7 Python Data Science Influencers to Follow

This is the Year of Python—Python overtook R in usage among data scientists and became the most-visited tag on Stack Overflow. The folks at Mode Analytics put together a list of the top Python influencers on Twitter; if you’re not following any of these folks, check them out.


Thanks to our sponsors!

Fishtown Analytics: Analytics Consulting for Startups

At Fishtown Analytics, we work with venture-funded startups to implement Redshift, Snowflake, Mode Analytics, and Looker. Want advanced analytics without needing to hire an entire data team? Let’s chat.


Stitch: Simple, Powerful ETL Built for Developers

Developers shouldn’t have to write ETL scripts. Consolidate your data in minutes. No API maintenance, scripting, cron jobs, or JSON wrangling required.


By Tristan Handy

The internet's most useful data science articles. Curated with ❤️ by Tristan Handy.

915 Spring Garden St., Suite 500, Philadelphia, PA 19123