The drone industry is a fast-growing space, with more and more players joining the field at a rapid pace. Many drone applications, autonomous drones in particular, require vision capabilities: the drone processes its camera feed to support tasks such as tracking, navigation, and filming. Companies that use video processing and image recognition as part of their solutions need to train their algorithms to identify objects in a video stream. Whether it’s a drone trained to detect suspicious activity, such as human presence where there shouldn’t be any, or a robot that navigates indoors following its human owner, they all need to identify and track the objects around them. To improve these recognition and tracking algorithms, there is a need for a collection of manually tagged videos that serves as the ground truth.
Today the tagging is Sisyphean manual work done by humans (e.g., workers on Amazon Mechanical Turk). The Turkers sit in front of the video, watch it frame by frame, and maintain Excel files describing each frame in the video stream. For each frame they record the frame number, which objects appear in the frame, and the area of the frame in which each object is located. The video and its tagging metadata are then used to train the algorithm to identify these objects. The manually created tagging data (the ground truth) is also used to run quality and regression tests on the algorithm, by comparing the algorithm’s tagging results against the Turkers’ tags.
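To make the per-frame record and the regression comparison concrete, here is a minimal Python sketch. The record layout (`FrameTags`, `Region`) is a hypothetical illustration, not the tool’s actual format. The comparison uses intersection-over-union (IoU) with a 0.5 threshold, a common convention for matching bounding boxes that we are assuming here rather than one the original process prescribes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """Hypothetical axis-aligned rectangle in pixel coordinates."""
    x: float  # left edge
    y: float  # top edge
    width: float
    height: float

    def iou(self, other: "Region") -> float:
        """Intersection-over-union of two rectangles, in [0, 1]."""
        ix = max(0.0, min(self.x + self.width, other.x + other.width) - max(self.x, other.x))
        iy = max(0.0, min(self.y + self.height, other.y + other.height) - max(self.y, other.y))
        inter = ix * iy
        union = self.width * self.height + other.width * other.height - inter
        return inter / union if union > 0 else 0.0


@dataclass
class FrameTags:
    """Hypothetical per-frame record: the frame number plus (tag, region) pairs."""
    frame: int
    objects: list[tuple[str, Region]] = field(default_factory=list)


def frame_recall(truth: FrameTags, predicted: FrameTags, threshold: float = 0.5) -> float:
    """Fraction of ground-truth objects matched by a prediction with the same
    tag and IoU >= threshold. Matching is greedy: each prediction is used once."""
    if not truth.objects:
        return 1.0  # nothing to find in this frame
    unused = list(predicted.objects)
    matched = 0
    for tag, region in truth.objects:
        best, best_iou = None, threshold
        for i, (ptag, pregion) in enumerate(unused):
            score = region.iou(pregion)
            if ptag == tag and score >= best_iou:
                best, best_iou = i, score
        if best is not None:
            unused.pop(best)  # a prediction may match only one ground-truth object
            matched += 1
    return matched / len(truth.objects)


# Usage: a human-tagged frame vs. the algorithm's output for the same frame.
truth = FrameTags(frame=17, objects=[("person", Region(10, 10, 50, 100))])
pred = FrameTags(frame=17, objects=[("person", Region(12, 14, 50, 100))])
print(frame_recall(truth, pred))  # 1.0
```

Greedy matching keeps the sketch short; a fuller evaluation might use an optimal assignment between boxes and report precision alongside recall.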
Searching the web for existing solutions didn’t turn up much. While larger companies in the video processing space develop their own internal tagging tools, small startups that lack the bandwidth to build such tools (which are not part of their IP anyway) do the work manually, tagging videos frame by frame in Excel. We couldn’t find any tool that came close to something we could simply pick up and use.
Mechanical Turk and similar solutions have the following limitations, which led us to decide to develop this tool:
- Poor or missing support for video tagging; there are no good open-source (OSS) tools today.
- They favor high quantity over high quality.
- Sometimes there’s a business need to keep the videos confidential and allow tagging by trusted people only.
We (@Microsoft) engaged with two Israeli startups to build, together, a tool that meets these basic needs. A case study covering this engagement, including links to the code and the tool, can be found here.
Sample Screenshot from the tool: Tagging the video frames
- Navigating frame by frame
- Select areas on the frame and assign tags to each area
- Video controls
- Review a tagging job
- Send a job for review by an Admin, or approve the tagging job
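The review step can be thought of as a small state machine. The sketch below assumes three job states and two actions, with only an admin allowed to approve; the state names, action strings, and role model are hypothetical and not taken from the tool’s code.

```python
from enum import Enum


class JobState(Enum):
    IN_PROGRESS = "in progress"
    PENDING_REVIEW = "pending review"
    APPROVED = "approved"


class Role(Enum):
    TAGGER = "tagger"
    ADMIN = "admin"


# Hypothetical transition table: (current state, action) -> (next state, roles allowed to act).
TRANSITIONS = {
    (JobState.IN_PROGRESS, "send_for_review"): (JobState.PENDING_REVIEW, {Role.TAGGER}),
    (JobState.PENDING_REVIEW, "approve"): (JobState.APPROVED, {Role.ADMIN}),
}


def apply_action(state: JobState, action: str, role: Role) -> JobState:
    """Return the job's next state.

    Raises ValueError if the action is not valid in the current state, and
    PermissionError if the role is not allowed to perform it.
    """
    try:
        next_state, allowed = TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} a job that is {state.value}") from None
    if role not in allowed:
        raise PermissionError(f"a {role.value} may not {action!r}")
    return next_state


# Usage: a tagger submits the job, then an admin approves it.
state = apply_action(JobState.IN_PROGRESS, "send_for_review", Role.TAGGER)
state = apply_action(state, "approve", Role.ADMIN)
print(state.value)  # approved
```

Keeping the rules in one table makes it easy to add further transitions later without touching the checking logic.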