This is part two of a series. Missed part 1? Read it here!
What are we building?
What do you get when you mix physics and computer vision? A demo concept “game” controlled by a plush animal! So fluffy! <3
In the first post we covered the basic elements of a game. We drew items on a canvas and programmed in some physics so our big friend can jump!
In this post, we’ll be replacing the boring clicking and tapping with computer vision! We’ll be using a camera to control our little friend in his environment.
Getting the camera
Last time I mentioned I’m using p5.js to create the demo. p5.js is a library with a ton of convenient functions to manipulate the DOM and draw on the canvas. Lucky for us, it also has useful methods to interact with the camera! Let’s have a look at what that looks like.
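// main.js
let video; // global, so other functions can read the camera frames later

function setup() {
  // Ask for the webcam and add a video-element to the page.
  video = createCapture(VIDEO);
}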
What does this createCapture do? The p5.js documentation has the following to say:
Creates a new HTML5 video-element that contains the audio/video feed from a webcam. The element is separate from the canvas and is displayed by default.
With this little piece of code, we are capturing the camera and displaying it on the screen in a video-element. Next up, we need to be able to detect if our friend is flapping its arms.
Teaching a machine…
In order for our software to detect flapping arms on a plush animal, we are going to need some machine learning. Machine Learning is usually quite tricky and often looks dauntingly complex. Luckily it doesn’t have to be!
Google has an amazing website called the Teachable Machine. On this website, you can train a model to detect… anything really. Currently 3 types of projects are supported:
- Image Project: Teach the machine to detect anything in an image/video you want;
- Audio Project: Teach the machine to recognize something in audio;
- Pose Project: Teach the machine to recognize certain poses in a video.
All you really need to do is to give some examples of what you want to detect, either through uploading images or by capturing it live with your camera!
Here is a video of me training a model to detect when the arms of our buddy are open or closed. We’ll use these two states to determine when a flap (transition from open to closed) has happened. The Pose Project only works for humans (rude!), so we’ll be using an Image Project for this example.
I cut this video out of a longer video stream, so it was not scripted. I decided to call it good enough.
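The flap idea can be tried out without any camera at all. Below is a minimal sketch, assuming the model hands us one label per frame and that the two classes are named 'Open' and 'Closed'. The `countFlaps` helper is a made-up name for illustration, not part of any library.

```javascript
// Hypothetical helper: counts Open -> Closed transitions in a list of labels.
// Assumes the two class names chosen during training are 'Open' and 'Closed'.
function countFlaps(labels) {
  let flaps = 0;
  let previous = 'Closed'; // assume our friend starts with closed arms
  for (const label of labels) {
    if (previous === 'Open' && label === 'Closed') {
      flaps += 1; // arms went from open to closed: that is one flap
    }
    previous = label;
  }
  return flaps;
}

// Two full open/close cycles.
console.log(countFlaps(['Closed', 'Open', 'Open', 'Closed', 'Open', 'Closed'])); // logs 2
```

Note that holding the arms open for several frames still counts as a single flap, because only the moment of closing is counted.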
Finally, you can export the model by either uploading it to the Google servers or downloading it locally. The website even provides example code on how to use the model! It’s just an amazingly easy user experience. Be sure to try it for yourself! :-)
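Whichever export you pick, the integration code further down loads the model from a `model.json` file by gluing that file name onto a base address. A small sketch of doing that safely, assuming the export is a folder (local or hosted) containing `model.json`. The `modelJsonUrl` helper is hypothetical and the addresses are placeholders.

```javascript
// Hypothetical helper: builds the model.json address from a base folder or URL,
// adding the trailing slash if it is missing.
function modelJsonUrl(base) {
  const withSlash = base.endsWith('/') ? base : base + '/';
  return withSlash + 'model.json';
}

console.log(modelJsonUrl('./my_model'));                        // ./my_model/model.json
console.log(modelJsonUrl('https://example.com/models/plush/')); // https://example.com/models/plush/model.json
```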
Integrating the model
Time to integrate our brand new machine learning model into our game. As we are building a browser-based project, we’ve exported the model as a TensorFlow.js model.
TensorFlow is a well-known library for Machine Learning projects. While TensorFlow is mostly known for being used in Python, TensorFlow.js is a port for JavaScript. TensorFlow is great if you know what you are doing and have a good grasp of Machine Learning concepts.
My main problem with TensorFlow is that it is relatively low level, showing some of those “dauntingly complex” properties I talked about earlier. To make it easy on myself, I’ll be using ml5.js as a helper library.
From the ml5.js website:
ml5.js is an open source, friendly high level interface to TensorFlow.js, a library for handling GPU-accelerated mathematical operations and memory management for machine learning algorithms.
Wait… ml5.js… that sounds similar to p5.js doesn’t it? That’s because ml5.js was heavily inspired by p5.js and is specifically made to play nice with it. That is great news for us, as we’ll be using both in this project!
Let’s look at what this looks like in code.
// main.js
let video; // setup of video is excluded here.
let flippedVideo;
let classifier;
let state = 'Closed'; // starting state of our friend.
// Either a downloaded model or one uploaded to Google.
// The address has to end with a '/', as 'model.json' is appended to it below.
let imageModelURL = "Your model URL";

// Load the model first into an imageClassifier.
// The imageClassifier uses the model to classify
// images from our camera into: Open and Closed (arms)
function preload() {
  classifier = ml5.imageClassifier(imageModelURL + 'model.json', modelLoaded);
}

// As loading the model can take a while, we'll use a callback function.
// Once the model is loaded, we'll start classifying every 100ms.
function modelLoaded() {
  console.log('Model Loaded!');
  setInterval(classifyVideo, 100);
}

// Get a prediction for the current video frame
function classifyVideo() {
  flippedVideo = ml5.flipImage(video);
  classifier.classify(flippedVideo, gotResult);
}

// When we get a result
function gotResult(error, results) {
  // If there is an error
  if (error) {
    console.error(error);
    return;
  }
  // The results are in an array ordered by confidence.
  const newState = results[0].label;
  console.log(newState);
  // We consider it a jump when the state goes from
  // arms Open to arms Closed :-)
  if (state === 'Open' && newState === 'Closed') {
    // trigger a jump
    jump();
  }
  state = newState;
}
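One thing to watch with a fixed 100ms timer: if a classification takes longer than the interval, calls can start piling up. Here is one possible guard, sketched with a stand-in classifier so it runs anywhere. `makeGuardedTick` is a hypothetical helper, and `classify` is assumed to accept an `(error, results)` callback just like `gotResult` above.

```javascript
// Hypothetical helper: wraps a callback-style classify function so that a new
// classification only starts once the previous one has finished.
function makeGuardedTick(classify, onResult) {
  let busy = false;
  return function tick() {
    if (busy) {
      return false; // previous frame is still being classified: skip this tick
    }
    busy = true;
    classify(function (error, results) {
      busy = false;
      onResult(error, results);
    });
    return true;
  };
}

// Stand-in for the real classifier: it holds on to the callback until we call it.
let pending = null;
function fakeClassify(callback) {
  pending = callback;
}

const seen = [];
const tick = makeGuardedTick(fakeClassify, function (error, results) {
  seen.push(results[0].label);
});

console.log(tick()); // true: classification started
console.log(tick()); // false: still busy, tick skipped
pending(null, [{ label: 'Open', confidence: 0.9 }]); // the classifier finishes
console.log(tick()); // true: free again
console.log(seen);   // [ 'Open' ]
```

In the game, that would mean handing `tick` to `setInterval` instead of `classifyVideo`, with `classify` being a small function that flips the frame and calls `classifier.classify`.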
There it is! Our little buddy is ready to fly! The code examples on this page leave out some code from part 1 of this series, because I wanted them to focus on the task at hand. If you want the full code, you can find it on my GitHub.
Conclusion!
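Taking `results[0].label` at face value can cause accidental jumps when the model is unsure. One possible refinement, assuming each result carries a `confidence` between 0 and 1 next to its `label`. The `confidentLabel` helper and the 0.8 threshold are illustrative choices of mine, not something from the original project.

```javascript
// Hypothetical helper: returns the top label only when the model is confident
// enough, otherwise keeps the previous state so no jump is triggered.
function confidentLabel(results, previousState, minConfidence = 0.8) {
  const top = results[0];
  if (!top || top.confidence < minConfidence) {
    return previousState;
  }
  return top.label;
}

// Confident prediction: the new label is used.
console.log(confidentLabel([{ label: 'Open', confidence: 0.95 }], 'Closed')); // Open
// Unsure prediction: the previous state is kept.
console.log(confidentLabel([{ label: 'Open', confidence: 0.55 }], 'Closed')); // Closed
```

Inside `gotResult`, this would take the place of the plain label lookup, something like `const newState = confidentLabel(results, state);`.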
I have been amazed by how easy p5.js makes it to work with the canvas. There is so much good stuff in that one library that I’m surprised I didn’t hear about it sooner. It seems like the perfect library for making small sketches for use in classrooms. The barrier to entry is low enough that I can see p5.js being used as a fun and simple introduction to JavaScript.
Teachable Machine, on the other hand, is a great way to get introduced to some machine learning. Mind you, you are not really learning anything about the inner workings of machine learning models or how to train them. I consider this a good thing, as too many developers get scared away by the dauntingly complex way models are normally trained.
Making a computer vision project this easy might just motivate people to look into machine learning, including people who would otherwise shy away from it. While the models you train with Teachable Machine won’t work that well in different environments, they are a great place to start! The added abstraction layer of ml5.js makes it even easier to integrate them into your JavaScript project.
p5.js + ml5.js + Teachable Machine = ♥
Now excuse me while me and my plushy friend go back to some more coding! If you make your own implementation with a custom model, please share it with me @TCoolsIT on Twitter! Very curious to see what you come up with!