machine intelligence | Joshua Bloom

Serverless Distributed Decision Forests with AWS Lambda

Within the Wise.io team in GE Digital, we have monthly "edu-hackdays" where the entire tech team spends the whole day trying to learn and implement promising new approaches to some portion of our machine-learning-based workflow. In the past, we worked on algorithm hacks and on methods for distributed featurization. Some of what we start on those days eventually goes into production, but most does not. The main goal (apart from the team building that comes with the fun and pain of all-day hacks) is to create collective knowledge and experience around important components of our stack. Recently we had an edu-hackday on strategies for distributed learning. This post captures (and hopefully provides some motivation for) the work I did at that hackday in April.

Towards Cost-Optimized Artificial Intelligence

If accuracy improves with more computation, why not throw in more time, people, hardware, and the concomitant energy costs? That seems reasonable, but this approach misses the fundamental point of doing machine learning (and more broadly, AI): it is a means to an end. And so we need to have a little talk about cost-optimization, encompassing a much wider set of cost-assignable components than is usually discussed in academia, industry, and the press. Viewing AI as a global optimization over cost (i.e., dollars) puts the work throughout all parts of the value chain in perspective (including the driving origins of new specialized chips, like IBM's TrueNorth and Google's Tensor Processing Unit). Done right, it will lead, by definition, to better outcomes.

Cache Ugly Reporting Queries With Materialized Views and Docker

Confidence and trust in your SaaS product depend, in part, on continually conveying the value of the solution you provide. The reporting vectors (web-based dashboards, daily emails, etc.) obviously depend upon the specifics of your product and your engagement plan with your customers. But underlying all sorts of reporting is the need to derive hard metrics from databases: What's the usage of your application by seat? How has that driven value/efficiency for your customers? What are the trends and anomalies worth calling out? The bad news is that many of the most insightful metrics require complex joins across tables, and as you scale out to more and more customers, queries across multitenant databases will take longer and longer. The good news is that, unlike interactive exploration and real-time monitoring and alerting use cases, many of the queries against your production databases can be lazy and run periodically. At Wise.io, we needed a way to cache and periodically update long-running/expensive queries so that we could have more responsive dashboards for our customers and our implementation engineers. After some research, including exploration with third-party vendors, we settled on leveraging materialized views. This is a brief primer on a lightweight caching/update solution that uses materialized views coupled with Docker.

A Test for Artificial Creativity

I just posted a piece on Medium about using crosswords as a Turing-like test for artificial creativity. It just happens to coincide with the week of the 100th anniversary of the crossword puzzle and Alan Turing's pardon. Read on…

Is there an Uncanny Valley of Machine Intelligence?

Roboticists know a lot about the uncanny valley, that uncomfortable place in utility and appearance where robots look and act almost, but not exactly, lifelike. On one side of the valley, diligent self-propelled vacuum cleaners make our domestic lives easier, and on the other side (the stuff of sci-fi for now) is the promise of human replicants doing manual work that no real human could do or would want to do for the pay involved.