Bots Bots Bots...

Although bots have existed almost since the internet was invented, it seems that only recently have they become something everyone is excited about. But bots are just apps running somewhere on the internet. Just like a web service, a bot gets input, does some processing, and returns a result.

 

When I first started using the internet, before browsers existed and when we used telnet to connect to a remote Unix shell (yeah, I'm old), I remember the bots that played cop, keeping order in the IRC channels. If someone flooded a channel with messages or misbehaved, the bot kicked them and banned them from the channel.

 

One of the main drivers of the hype around bots, and of their increased use, is the recent progress in machine learning and neural networks. These technologies power new cognitive services that make bots much smarter, and even more "sensitive". Think of a bot that analyzes a user's post or email and uses a sentiment analysis service to "understand" whether the user is happy, sad, angry or satisfied, just by "reading between the lines". The bot can then tailor its response to the user's mood.

 

Another thing that might be connected to this hype is the variety of platforms from which we can connect to a bot: Skype, Facebook Chat, Slack, email, SMS, and more. Basically, you can connect your bot to any platform. The problem is that if you implement your bot independently and want users to be able to access it from different platforms, you'll have to implement each of these connections yourself.

 

To help overcome this problem, several bot frameworks were released recently, such as the Microsoft Bot Framework. The main benefit of using such a framework, in addition to the dedicated SDK and its features, is the ability to connect your bot to different channels by going through a few simple onboarding steps. The framework routes messages from the various channels to your bot, and delivers the bot's response in the best way each channel supports, so the developer doesn't have to handle the differences between platforms. For example, in a web chat the bot can ask you to click a button, while in a text channel it will present a list of options to choose from.

 

Bots are not the solution to all of humanity's problems. We should think very carefully about whether the scenario we want to implement truly makes sense, and ask ourselves whether the bot is going to provide a better experience than the alternatives.

 

If you decide to add a bot experience to your platform, keep the following in mind:

 

Add analytics to your bot- One of the most important things to do when creating a bot is to measure how well it performs. Recently I worked with a partner on creating a Q&A bot. In this case, it was very important to understand whether the bot provided the right answer, and whether the answer actually helped the user. Since the user types free-text questions, we needed to be able to map a specific answer to various versions of the same question. We used simple search techniques to look up the question, and attached a confidence level to each answer. If the confidence level was lower than a specific threshold, we added a disclaimer that we were not sure, and asked whether the answer was helpful. The user could then provide feedback. This way, we could see where the bot was performing well and where we needed to invest more in the logic, and improve the bot accordingly.
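To make the threshold idea concrete, here is a minimal sketch of how such a reply could be built. It is not the code we wrote for that partner; the threshold value, the shape of the match object and the function names are all assumptions for illustration.

```javascript
// Hypothetical threshold; tune it against your own feedback data.
const CONFIDENCE_THRESHOLD = 0.7;

// Decide how to present a search result to the user.
// `match` is assumed to look like { answer: string, confidence: number in [0, 1] }.
function buildReply(match, threshold = CONFIDENCE_THRESHOLD) {
  if (match.confidence >= threshold) {
    return { text: match.answer, askForFeedback: false };
  }
  // Low confidence: add a disclaimer and ask whether the answer helped.
  return {
    text: `I'm not sure, but this might help: ${match.answer}`,
    askForFeedback: true,
  };
}

// Record the user's feedback so that weak answers can be found later.
function recordFeedback(log, question, match, wasHelpful) {
  log.push({ question, answer: match.answer, confidence: match.confidence, wasHelpful });
  return log;
}
```

Grouping the recorded entries by answer and looking at the share of unhelpful ones is one simple way to see where the bot needs more work.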

 

Create domain-specific bots- Natural language understanding services help your bot understand what the user meant by extracting the user's intent from the text. But in order to perform well, these services must be trained on a variety of examples, since each intent can be expressed in many different ways. For this reason, if you use a natural language service to extract intents, it is very important to limit the number of different intents your bot supports. If you try to support too many intents, the language understanding service will return results with relatively low confidence, since the probability of mistakes is higher.

 

Guide the user at each step- It is good practice to tell the user what the bot expects at each step, by providing examples and controls (such as buttons instead of free text), so that the probability of understanding the user and completing the task increases. For example, instead of asking "What would you like to buy?" you can ask "Which category are you interested in today?" and provide a list to choose from. Instead of asking "Are you sure?" and expecting a yes/no response, you can provide buttons the user can just click. This way, step by step, you can guide the user through what comes next and complete a successful scenario.

 

Have fun with your bot!

Controlling your backend platform using an extensible web-based command line interface

If you are a developer who likes using command line tools, you have come to the right place!

One of the projects I worked on was a platform for hosting Node.js apps in the cloud (aka anode). The platform is deprecated, now that all of the major cloud providers host Node.js apps natively. But as part of this project, I developed a web-based command line console that served as an entry point to track, manage and control our backend platform.

Instead of developing a dedicated administration UI we developed this console, which was designed to be extensible so that new commands can be added by implementing a simple REST API that exposes the required actions. 

In the past year, since I joined the Microsoft Partner Catalyst team, we have used the console for every project that we worked on. On each project, the first thing we did was set up the console, and only then did we start working on the actual project. This allowed us to control our backend from day 1. We were able to view and query application logs immediately, and we added more and more commands to the console as needed. Because of that, we decided to open-source this framework, so that everyone can leverage it.

The console is designed for technical users who have experience with, and feel comfortable using, a command line environment. It can be used as the only tool to administer your platform, or as a quick solution during the early stages of development, in parallel with developing a dedicated UI.

I'm leaving out all of the technical details. You can find all the information you need in the web-cli component GitHub repository. 

In addition, I created a sample bootstrap application that can help you get started immediately. The sample app includes out-of-the-box features such as authentication, authorization, user management and a log viewer. Some of these features are implemented as plugins, so that you can use them as a reference for how to extend the console with your own custom commands. More details can be found in the docs folder.

The console in action!

"The attack of the clones" demo: multiple command line "windows"

I would love to get your feedback.
Hope you enjoy using it at least as much as I do :-)

 

Video Tagging Tool for Video-Processing and Image Recognition

The drone industry is a fast-growing space, with more and more players joining the field at a rapid pace. Many drone applications, autonomous ones in particular, require vision capabilities. The drone processes the camera feed in order to support tasks such as tracking, navigating, filming, and more. Companies that use video processing and image recognition as part of their solutions need the ability to train their algorithms to identify objects in a video stream. Whether it's a drone that is trained to identify suspicious activity, such as human presence where it shouldn't be, or a robot that navigates indoors following its human owner, they all need the ability to identify and track objects around them. In order to improve the recognition and tracking algorithms, there's a need to create a collection of manually tagged videos, which serves as the ground truth.

 

Today, tagging is Sisyphean manual work done by humans (e.g., via Mechanical Turk). The workers sit in front of the video, watch it frame by frame, and maintain Excel files describing each frame in the video stream. The data kept for each frame is the frame's number, which objects are found in the frame, and the area of the frame in which each object is located. The video and the tagging metadata are then used by the algorithm to learn how to identify these objects. The manually created tagging data (the ground truth) is also used to run quality and regression tests for the algorithm, by comparing the algorithm's tagging results to the workers' tagging data.
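As a rough illustration of the per-frame data and of comparing an algorithm's output to the ground truth, here is a small sketch. The field names are made up and are not the tool's actual format, and intersection over union (IoU) is simply one common way to score how well two rectangles agree; I'm not claiming it is the measure the startups used.

```javascript
// One tagged frame: the frame number plus the tagged regions in it.
// Field names are illustrative, not the tool's actual format.
const groundTruthFrame = {
  frame: 42,
  regions: [{ tag: 'person', x: 100, y: 50, width: 40, height: 80 }],
};

// Area of the intersection of two axis-aligned rectangles.
function intersectionArea(a, b) {
  const w = Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x);
  const h = Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y);
  return w > 0 && h > 0 ? w * h : 0;
}

// Intersection over union: 1 means identical rectangles, 0 means no overlap.
function iou(a, b) {
  const inter = intersectionArea(a, b);
  const union = a.width * a.height + b.width * b.height - inter;
  return union > 0 ? inter / union : 0;
}
```

A regression test could then flag every frame where the IoU between the algorithm's region and the manually tagged region drops below some agreed value.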

 

Searching the web for existing solutions didn't bring great results. While larger companies in the video processing space develop their own internal tools for the tagging work, small startups that don't have the bandwidth to invest in such tools (which are also not part of their IP) do the work manually, tagging the videos frame by frame using Excel. We couldn't find any tool that came close to something we could just take and use.

 

Solutions similar to Mechanical Turk have the following limitations, which led us to develop this tool:

  • Poor or missing video tagging support; there aren't good open source (OSS) tools today.
  • They favor high quantity over high quality.
  • Sometimes there's a business need to keep the videos confidential, and to allow tagging by trusted people only.

 

We (@Microsoft) engaged with two Israeli startups to work together on a tool that will provide these basic needs. A case study covering this engagement, including links to the code and tool can be found here.

 

Sample Screenshot from the tool: Tagging the video frames

  • Navigating frame by frame
  • Select areas on the frame, and tags for each area
  • Video controls
  • Review a tagging job
  • Send a job for review by an Admin, or approve the tagging job

 

 

Migrating Parse Push Notifications to Azure Notification Hubs

Recently I worked with one of Microsoft's accelerator startups to migrate their Parse application to Azure.

We followed these instructions to move their hosted Parse application to a Node.js Express app, running as an Azure Web App. It was very easy.

As part of this process, we also explored ways to migrate the push notifications service from Parse to Azure Notification Hubs. Along the way we developed parse-to-anh, a Node.js module that can be used as a command line tool or as a module in a larger migration script.

The tool currently supports migrating iOS and Android users.

Note for Android users- One thing you should be aware of is that Parse uses its own GCM SenderId to send push notifications to Android users. The intention was to hide the hassle of onboarding with Google's push notification service. To be able to migrate these users, you'll need to follow these instructions (look for "Exporting GCM Registration IDs") to create your own SenderId and to update your existing app to register the devices against Google, so that at migration time the token will match your SenderId. This should be done as soon as possible, so that hopefully most of your users will get the app update and register with your SenderId.

Good Luck!

 

 

 

 

3D Obstacles Modeling for drones Sense&Avoid using Node.js and Spatialite

How would you model 3D objects like buildings, bridges and electricity wires? How would you query this data to support your Sense&Avoid solution for your drone?

In this post, we'll discuss an approach to model and query obstacles. Using their real-time video processing capabilities, drones can update the model when identifying new objects or mapping an area. In addition, drones can query for potential obstacles in their pre-compiled route, and reroute accordingly.

To make this discussion relevant to real world scenarios, I chose to work with Toronto's publicly available model, called 3D Massing. You can read about its release earlier this year here. Also, a great interactive visualization of this model can be found here.

In terms of technology, I'm going to work with Node.js and Spatialite, which is SQLite with extensions for manipulating geometry data. I chose these technologies because they make it easy to write a REST API service and run it anywhere. We can host this service in the cloud, and we can run it locally on the device (the drone).

A possible scenario for drones will be to run the same service locally with a subset of the data on the device. That way, drones can download part of the data for the area they operate in before taking off. Let's assume that during flight a drone identifies that a specific part of the route is blocked by an unrecorded obstacle. In this case, a short response time is essential and the re-route should be calculated as fast as possible. Response time will be faster when querying the local service than querying a remote REST API.

As mentioned earlier, for the modeling part I'm going to use Toronto's publicly available shape file. In order to use this model with Spatialite, I used a tool called QGIS which is a free and open source geographic information system. I opened the downloaded shape file, and exported it to a Spatialite DB file format.

Toronto 3D Massing Model

Exporting the shape file


Saving file as a Spatialite DB file


The provided shape file uses a Coordinate Reference System (CRS) to map geographical entities. You can read more about how the globe is split into different zones here.

I'm not going to go into too much detail, but here is a short explanation of how shapes are stored in a Spatialite Geometry column. A Geometry value can be a Point, a Line (which is a collection of points), a Polygon (which is a collection of lines), etc. A point is composed of 2 values, X and Y, which give the location of the point relative to the CRS.

The objects in the shape file are basically Polygons or MultiPolygons describing the layout of the shape (building, bridge, etc.). In addition to the 2D layout, each shape also includes an altitude. For the purpose of this post I used the Elevation, which is the building height from Mean Sea Level.

Spatialite provides functions for manipulations and calculations on Geometry fields, like finding out which geometries intersect others, which cross others, etc. When we want to find all objects in a specific area we can use these functions, but on their own they result in a full table scan: Spatialite goes over all records and, for each one, performs the calculation to check whether it satisfies the query criteria.

We want to be able to query for objects in specific areas, so we need an index on location. A regular SQLite index cannot be used on a Geometry column. This is because regular indexes in SQLite are based on a B-tree (a balanced search tree, not a binary tree), which works well for numbers and strings, but not for geometries. For geometries we need a different kind of index, the so-called R-tree (rectangle tree). Spatialite uses an R-tree implementation to create a spatial index on Geometry columns.

When the shape file was exported to Spatialite, a Spatialite R-tree index table was generated as well. For each object in the data table, there is a matching record in the index table describing its bounding box: (xmin, ymin) and (xmax, ymax). Using this information, we can keep only the objects in the area we're interested in with a simple inner join. We can then apply the Spatialite functions to this subset of shapes to get the ones we are looking for.
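Here is a minimal sketch of what such an inner join could look like, written as a small Node.js helper that builds the SQL text. The index table name "idx_<table>_<column>" and its pkid/xmin/xmax/ymin/ymax columns follow Spatialite's convention for spatial index tables; the helper name and the named parameters are my own assumptions, and the real query in the sample code may be shaped differently.

```javascript
// Build a query that uses Spatialite's R-tree index table to pre-filter rows
// by bounding box. The caller supplies the table and geometry column names.
function buildBoundingBoxQuery(table, geometryColumn) {
  const indexTable = `idx_${table}_${geometryColumn}`;
  return [
    `SELECT d.*`,
    `FROM ${table} AS d`,
    `INNER JOIN ${indexTable} AS i ON d.ROWID = i.pkid`,
    // Two boxes overlap when each one starts before the other ends, on both axes.
    `WHERE i.xmin <= :xmax AND i.xmax >= :xmin`,
    `  AND i.ymin <= :ymax AND i.ymax >= :ymin`,
  ].join('\n');
}
```

The :xmin, :ymin, :xmax and :ymax parameters would be bound to the corners of the area we're interested in; the more expensive geometry functions then only run on the rows that survive this filter.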

The following is a tool called spatialite-gui which you can use to manage Spatialite DBs:

Spatialite GUI Tool


Before we dive into the code, let's go over the approach. This is just an example, and you can tune it as you see fit. Let's assume that we would like to know whether the following route, at a height of 100 meters, is free of obstacles.

This route is composed of the following 3 points:

Route Line: [-8848015, 5425525], [-8845603, 5426088], [-8845543, 5425149]


The route's layout in Spatialite Geometry Explorer


We can see that it is composed of 3 vertices. Since we want to keep a safe distance between our drone and the objects around it, we'd like to create a kind of tunnel around this route. We'll use the Buffer function, which adds extra space around the route (50 meters in this case):

Using the Buffer function to pad the route with a "safe distance"


Adding the extra buffer resulted in a tunnel with 158 vertices.

Since the calculations get more complex the more vertices we have, let's try to reduce the complexity of the Geometry while keeping the necessary information. We can do this using the SimplifyPreserveTopology function, which decreases our shape's "resolution" to 15 vertices while preserving its topology:

Simplified Geometry


In our query, we're looking for objects that overlap this shape, but we don't want to scan all the shapes in our table. We want to consider only those shapes that are in the area we're flying through, plus some buffer that we get in the input parameters.

To find the relevant area, we'll use the following query:

Find the relevant area


Envelope returns a bounding box for the original route. We add some buffer (100 meters in this case) and then take another envelope to reduce the number of vertices. Now, when we write our query, we can first look for all objects in this area, and then apply the actual overlap function to them. In addition, we'll also add the height/altitude condition to the mix.
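The envelope, buffer, envelope chain described above can be sketched as a tiny helper that turns the route into a Spatialite expression. GeomFromText, Envelope and Buffer are Spatialite SQL functions; the helper names and the SRID argument are assumptions, so pass whatever SRID your data was exported with.

```javascript
// Convert route points ([x, y] pairs) to a WKT LINESTRING.
function routeToWkt(points) {
  return `LINESTRING(${points.map(([x, y]) => `${x} ${y}`).join(', ')})`;
}

// SQL expression for the search area: the bounding box of the route, padded by
// `buffer` units, then boxed again so that the result is a plain rectangle.
function searchAreaSql(points, buffer, srid) {
  const route = `GeomFromText('${routeToWkt(points)}', ${srid})`;
  return `Envelope(Buffer(Envelope(${route}), ${buffer}))`;
}
```

For the 3-point route used in this post, routeToWkt produces a LINESTRING with three coordinate pairs, and the resulting expression can be dropped into a SELECT to inspect the area in spatialite-gui.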

 

It's time to get our hands dirty. The related code can be found here.

Most of the Node.js code is just a wrapper around the Spatialite query, exposing it as a POST /obstacles REST API.

 

The API gets the following JSON object:

{
  "droneHeight": 203,
  "safetyDistanceHeight": 10,
  "safetyDistanceSides": 10,
  "buffer": 500,
  "lineRoute": [
    [-8848015, 5425525],
    [-8845603, 5426088],
    [-8845543, 5425149]
  ],
  "useSpatialIndex": true,
  "debug": true
}

and returns the following output:

{
  "obstacles": [
    {
      "id": 94904,
      "Z": 193.3188,
      "centerX": -8848353.258156676,
      "centerY": 5425583.826544509
    },
    {
      "id": 125365,
      "Z": 203.0068,
      "centerX": -8847628.696021229,
      "centerY": 5425971.979974609
    }
  ],
  "debugInfo": {
    "request": { the request parameters },
    "query": { the SQL query executed },
    "duration": { time in msec to execute the query }
  }
}

Use the debug parameter to get the debugInfo block in the response. The debugInfo block includes the original request parameters, the query that was generated and the execution time.
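To give a feel for the height condition, here is a sketch of a check that is consistent with the sample response above: both returned obstacles have a Z within 10 meters of the drone's height of 203 or above it. The exact rule is my assumption for illustration; the service applies its own condition inside the SQL query.

```javascript
// An obstacle matters when its top (Z) reaches the drone's altitude
// minus the vertical safety margin. This rule is an assumption.
function isHeightConflict(obstacleZ, droneHeight, safetyDistanceHeight) {
  return obstacleZ >= droneHeight - safetyDistanceHeight;
}

// Keep only the obstacles that conflict with the requested flight height.
function filterByHeight(obstacles, request) {
  return obstacles.filter((o) =>
    isHeightConflict(o.Z, request.droneHeight, request.safetyDistanceHeight));
}
```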

This is the actual query:

 

Example request:

We can see that the route is free of obstacles. Now let's change our safe distance from 10 to 315, and also enable debugging information:

This time we can see that there's an obstacle on the route. If we look carefully, we can see that there's a small object (in the red circle) that is relatively close to the route. The more we increase the safe distance, the more objects we will get.

Obstacle identified within safe-distance

Now, just for fun, let's try not using the spatial index table in our query, and see how long the query takes to run:

Same query without using the spatial index


We get the same result, only this time it takes nearly 10 (!!) seconds because Spatialite performs a full table scan without using the Spatial index table.

 

The next step will be to use this service as part of a routing algorithm that calculates obstacle-free routes.

How to store and retrieve spatially indexed objects in a non-spatial-capable database

Easily storing and retrieving objects based on coordinates requires a storage solution with spatial capabilities. But what do we do when our storage solution doesn't support spatial indexes?

In this post I describe one possible solution that lets developers store and retrieve geo-data in a table (like an Azure table) or in memory (a hash/dictionary or any key/value data structure).

This approach divides our area, world, or canvas into a grid of cells (buckets), such that each bucket gets a unique key that can be used as the bucket ID indexed in our storage.

For example, let us imagine that we divide a space into buckets, each of which is 30 meters long and 30 meters wide. All objects in this space are mapped to a bucket ID. The exact coordinates of each object should also be stored as part of the object's properties. These precise coordinates can later be used, first, to determine the object's exact location and, second, to filter objects when querying all objects within a specific rectangular or circular area.
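The mapping itself can be as small as the following sketch. This is not the API of the spatial-mapping module; the function name and the "col:row" key format are assumptions chosen for illustration.

```javascript
// Map a coordinate to the ID of the grid bucket that contains it.
// `size` is the bucket edge length, in the same units as x and y.
function bucketId(x, y, size = 30) {
  const col = Math.floor(x / size);
  const row = Math.floor(y / size);
  return `${col}:${row}`;
}
```

With 30-meter buckets, an object at (45, 75) lands in bucket "1:2". Math.floor keeps negative coordinates consistent, so (-1, 0) maps to "-1:0" rather than sharing a bucket with (1, 0).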
 

Spatial mapping of objects to buckets


One of the benefits of using this approach is that solutions such as Azure Table storage cost less than spatial-capable storage solutions like SQL and MySQL databases. Of course, this approach comes with a cost of its own: additional filtering and computation are needed on the client side.

If you're using Node.js, the spatial-mapping module used in this post can help you map objects to buckets very easily. Otherwise, implementing this mapping logic in other languages should still be fairly straightforward.

For example, let's consider Azure Table storage. In Azure Table storage, every row (entity) requires a partition key and a row key. If you assign the bucket ID as the partition key and the object ID as the row key, you can query by partition key and receive all objects within a bucket. An additional advantage of this setup is that all objects within the same bucket are stored together: Azure keeps all entities with the same partition key in the same partition.

Getting all objects in a specific area (a rectangle or a circle) is done by finding the IDs of all buckets that are fully or partly inside this area and fetching their objects. Then, on the client side, we can apply a filter function with a simple mathematical condition to filter out objects that are not in the specific area we're interested in.
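A minimal sketch of the two steps, assuming buckets are keyed as "col:row" and objects carry their exact x and y. The function names are hypothetical and this is not the spatial-mapping module's API.

```javascript
// IDs of every bucket that a rectangle touches, fully or partly.
function bucketIdsInRect(minX, minY, maxX, maxY, size = 30) {
  const ids = [];
  for (let col = Math.floor(minX / size); col <= Math.floor(maxX / size); col++) {
    for (let row = Math.floor(minY / size); row <= Math.floor(maxY / size); row++) {
      ids.push(`${col}:${row}`);
    }
  }
  return ids;
}

// Client-side filter: keep only objects whose exact position is inside the circle.
function filterInCircle(objects, centerX, centerY, radius) {
  return objects.filter((o) => Math.hypot(o.x - centerX, o.y - centerY) <= radius);
}
```

For a circular query, you would first fetch the buckets covering the circle's bounding rectangle (center minus radius to center plus radius on both axes) and then run filterInCircle on the fetched objects.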

Multiplayer Online Game using Node.js and Redis

Disclaimer: this post does not intend to provide a highly scalable solution for developing a massively multiplayer online game. The hypothetical scenario described here should be able to support a few thousand users. Existing online gaming frameworks take care of the heavy infrastructure considerations by implementing proven design patterns. These battle-tested frameworks allow developers to concentrate on implementing their game's specific logic.

Designing a Massively Multiplayer Online (MMO) game is not trivial. Latency should be very low, usually less than 200 ms, and preferably less than 100 ms. Players and objects are moving constantly, and these changes should be available to all clients. After all, you wouldn't want to be "killed" because your location wasn't updated as fast as an axe thrown by your opponent.

In a game with thousands or hundreds of thousands of users, you'll probably need multiple front-end servers to manage the client connections and continuously route game events to the clients.

In this post I'll create a hypothetical game with players in constant movement in the game world. Since I'm targeting novice developers, we're not going to use any of the existing gaming frameworks. I want to keep things simple, so that novice game developers can start creating multiplayer online games without needing to master a gaming framework.
 
Requirements:

  1. Latency - no greater than 200 ms for an event to reach relevant clients from the server. If the latency is too high, events aren't properly synchronized and the game becomes awkward from the users' perspective.
  2. World - there are no different worlds, areas, or rooms in our game. All users move in the same big space. Users should see all users or objects in their vicinity, that is, whatever enters their view. The user's view is the part of the world that is visible to the user.
  3. Code - we're avoiding any MMO stack or gaming framework, as described above.

MMO Game Simulation


This image simulates our "world", in which the users are randomly positioned. The red and blue borders around the green circle user represent the user's "view" (the visible portion of the world). The space between the red and blue borders represents the area just beyond the user's view, where the user will still receive updates such as players entering or exiting the view.
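The two borders can be expressed as two nested rectangles around the user. The following sketch classifies another player relative to a user's view; the rectangle model, the parameter names and the three labels are assumptions for illustration, not taken from the game's code.

```javascript
// Is `point` inside a rectangle centered on `center`?
function isWithin(center, point, halfWidth, halfHeight) {
  return Math.abs(point.x - center.x) <= halfWidth &&
         Math.abs(point.y - center.y) <= halfHeight;
}

// Classify another player relative to a user's view.
// `view` is the visible rectangle's half-size; `margin` is the extra band
// around it in which updates are still delivered. Both are hypothetical values.
function classify(user, other, view, margin) {
  if (isWithin(user, other, view.halfWidth, view.halfHeight)) return 'visible';
  if (isWithin(user, other, view.halfWidth + margin, view.halfHeight + margin)) return 'notify-only';
  return 'ignore';
}
```

Sending updates for the 'notify-only' band lets the client learn about a player slightly before that player enters the visible area, so entering and exiting the view looks smooth.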
 
MMO games are nothing new, and there are a few frameworks available that implement the actor design pattern in Java, .NET and other languages. These frameworks typically keep the actor instances (AKA activations) and their state in memory. The state is spread across all the servers. The framework also abstracts the underlying communication between the servers, so that if two players are activated on two different physical machines, they will be able to communicate with each other. These frameworks also handle other common scenarios like server failures, reactivation, load balancing, and much more.

Since we are targeting novice game developers, I'm going to use Node.js to implement the back-end service for this example. JavaScript is a great programming language for new developers, and Node.js lets us run JavaScript on the server side. By the way, a gaming framework for Node.js, Pomelo.js, already exists. Nevertheless, as mentioned above, we're avoiding gaming frameworks in this example.

Using an actor framework requires machines with a large amount of RAM, so that the service can maximize the amount of user state retained. My approach is to use one big cache server (like Redis) to hold all of our users' state. Additional information regarding which Redis configuration to use for your scenario is available here.

Proposed solution layout:


  1. Clients connect to our front-end servers using WebSockets, enabling the user to send and receive events to and from the servers.
  2. A Node.js service keeps the clients' WebSocket connections open. This service will run on each of our front-end machines. We'll use the Node.js cluster feature to fork processes according to the number of cores available in our VM. Since our service does not hold state, we don't need to worry about affinity.
  3. We'll use the socket.io node module to manage WebSocket connections.
  4. Redis setup: we'll use 2 main features:
    1. Caching - Redis will be our sole state server, retaining all users' state.
    2. Pub/Sub - using the publish and subscribe mechanism to synchronize all of the servers with events coming from clients. This feature will be used to broadcast location change events from a client connected to one server to all of the clients connected to the rest of the servers.
  5. We will run our solution locally, but when we come to publish our game we can use Azure VMs (either Windows or Ubuntu machines). We can start with one and scale out with more instances according to our needs.
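To show the fan-out idea from item 4 without needing a Redis server, here is a small in-memory sketch. Node's EventEmitter stands in for Redis publish/subscribe, and plain arrays stand in for client sockets; in a real deployment each front-end process would publish to and subscribe on a Redis channel instead. All names here are hypothetical.

```javascript
const { EventEmitter } = require('events');

// Stand-in for a front-end server: tracks its own clients and relays
// every event published on the shared bus to them.
function createFrontEnd(bus, channel = 'location') {
  const clients = new Map(); // clientId -> array of received events
  bus.on(channel, (event) => {
    for (const [id, inbox] of clients) {
      if (id !== event.clientId) inbox.push(event); // don't echo to the sender
    }
  });
  return {
    connect(id) { clients.set(id, []); },
    // A client reports a move; publish it so that every server sees it.
    move(id, x, y) { bus.emit(channel, { clientId: id, x, y }); },
    inbox(id) { return clients.get(id); },
  };
}
```

If two front ends share one bus and "alice" is connected to the first while "bob" is connected to the second, a move reported by alice ends up in bob's inbox, which is the behavior we want from the Redis channel.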

The number of users we can support is heavily dependent on the nature of our game. Some important considerations include: the size of the world, the size of the view, the density and distribution of users in the world, the number and frequency of events each client sends, etc. One front-end (FE) server might support enough users for our purpose in one configuration, while another game scenario might require additional FE servers.

In our approach, one Redis machine quickly becomes a bottleneck as additional users are added to the game. Also, since Redis is single-threaded, at some point it becomes inevitable to switch to a solution where the state is either clustered (Redis Cluster) or split between servers using an existing framework as described above.

The full code can be found here. You're welcome to review the code. Instructions regarding how to install and set up the application to run locally on your machine