In a previous post we discussed conversational search and how it is growing with the development of voice search application programming interfaces (APIs). But other search mechanisms are also changing the quality of the information we receive.
Microsoft have released a set of Cognitive Services APIs that provide developers with a wealth of resources to push media search to a new level. When searching for media (images or video) on the likes of Bing or Google, users are presented with several simple filters. These can include adult classifications, dominant image colour and, in some cases, the content of the image (e.g. people or animals).
In this blog series we’ll introduce the various APIs and share our thoughts on their applications, so stay tuned.
Computer Vision API
The Computer Vision API aims to extract information from images to categorise and detail the content. Microsoft believe that by doing this, they can better serve media searches and protect users from unwanted content, while also enabling developers to push the boundaries of media services through visual data.
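Under the hood this is a simple REST call: you POST an image URL to the analyze endpoint and ask for the visual features you want back. Here is a minimal sketch in Python; the region (`westus`), the key and the image URL are placeholders, and the endpoint path follows the v1.0 documentation as we understand it, so treat the details as assumptions rather than gospel.

```python
import json
import urllib.request

# Hypothetical endpoint -- the region ("westus") is a placeholder.
ENDPOINT = "https://westus.api.cognitive.microsoft.com/vision/v1.0/analyze"

def build_request(image_url, key, features=("Faces", "Tags", "Description", "Adult")):
    """Build (but do not send) an HTTP request for the analyze endpoint."""
    url = ENDPOINT + "?visualFeatures=" + ",".join(features)
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,  # your subscription key goes here
        },
    )

def summarise_faces(analysis):
    """Extract the (gender, age) estimates from a parsed analyze response."""
    return [(face["gender"], face["age"]) for face in analysis.get("faces", [])]
```

Sending the request with `urllib.request.urlopen` returns a JSON document, which `summarise_faces` then reduces to the age and gender estimates discussed below.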
First, let’s cover the recognition of people. During our own internal testing, the API did a fair job of identifying people in photos while providing an estimate of their age and gender.
The image on the left was run through the API, which produced the image on the right. It clearly identified the two people and overlaid the age and gender. Beyond the overlay, the API returned further information, each item scored with a level of confidence.
What does the API think?
- 99% confident that the photo contains people.
- 75% confident that those people are standing.
- Not classed as Adult or Racy content (1% chance of being adult content).
- Identified two males with ages within four years of actual age.
As well as providing these data insights, it also attempted a description of the image, which in this case was “a group of people stood around”.
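The figures above all come back as confidence scores in the JSON response. A minimal sketch of reading them, assuming the v1.0 response schema (`description.captions`, `tags`, `adult`); the sample payload in the usage note is illustrative, not real API output:

```python
def summarise(analysis, tag_threshold=0.5):
    """Reduce an analyze response to its caption, confident tags and adult/racy scores."""
    captions = analysis.get("description", {}).get("captions", [])
    caption = captions[0]["text"] if captions else None
    # Keep only the tags the service is reasonably confident about.
    tags = [
        (tag["name"], tag["confidence"])
        for tag in analysis.get("tags", [])
        if tag["confidence"] >= tag_threshold
    ]
    adult = analysis.get("adult", {})
    return {
        "caption": caption,
        "tags": tags,
        "adult_score": adult.get("adultScore"),
        "racy_score": adult.get("racyScore"),
    }
```

Fed a response resembling the one above, this would surface the caption, the “people”-style tags with their scores, and the near-zero adult/racy figures in one small dictionary.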
Let’s try throwing a few more people into the mix to see how the API performs.
In the image below the API has accurately identified the gender of five people and has estimated age within a few years. As with the image above, other elements have been identified.
What does the API think?
- 99% confident that the photo contains people and was taken outdoors.
- 79% confident the photo contains brickwork.
- 54% confident the subjects are posing. (Certainly not!)
For this image, the API described it as “a group of people stood in front of a brick wall” which is very clearly accurate.
Continuing with human subjects, another feature of the API is celebrity recognition. We tested a few examples; the API struggled where multiple people appeared in one image, but where there was a single person, accuracy was good, although the API was less confident.
We fed a picture of Winston Churchill through the API and it identified not only Winston Churchill but also that he was wearing a hat. Despite that, it was only 47% confident in the identification.
Moving away from human subjects, we also ran two other images through the API with a view to identifying objects. The level of accuracy was quite astonishing.
In the image below, there was a 98% confidence score that the photo was taken indoors. More interestingly, the API was 95% confident that the photo contains writing implements and/or stationery items. Given that it shows a pot of pencils, that is quite accurate.
Let’s really mix things up by combining people and objects with partially visible faces.
We were quite impressed by the quality of the response to this image. Here is what the API thinks:
- 99% confident that the photo contains a kite and is outdoors
- 89% confident of a beach setting
- 23% confident the photo is a family photo (this one is very interesting)
It then goes on to describe the photo as “a group of people flying a kite on the beach”.
Additional to the above, the API can…
- …identify text in several languages within an image and both display and translate it. This is already used in the Microsoft Translator application and does a very good job.
- …identify a focus object within an image and create a series of thumbnail images based around that object. This works well where there is a single focus such as a person or item (e.g. a pot of pencils).
- …analyse live video and provide feedback on the content (e.g. identifying brick walls and beaches).
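The smart-thumbnail feature lives on its own endpoint: you pass the desired dimensions and a flag asking the service to crop around the focus object. A rough sketch of building that request URL; the region and parameter names (`width`, `height`, `smartCropping`) follow the v1.0 documentation as we understand it, so treat them as assumptions.

```python
from urllib.parse import urlencode

# Hypothetical endpoint -- the region ("westus") is a placeholder.
THUMB_ENDPOINT = "https://westus.api.cognitive.microsoft.com/vision/v1.0/generateThumbnail"

def thumbnail_url(width, height, smart=True):
    """Build the generateThumbnail URL; smartCropping centres the crop on the focus object."""
    params = urlencode({
        "width": width,
        "height": height,
        "smartCropping": str(smart).lower(),  # API expects "true"/"false"
    })
    return THUMB_ENDPOINT + "?" + params
```

The image itself is then POSTed to that URL (with the same subscription-key header as the analyze call), and the response body is the cropped thumbnail.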
How could this technology be used?
APIs give others the opportunity to use the technology within their own websites and software, so here are some applications we believe we are going to see:
- An image sharing website with a broad demographic could use the API to automatically identify an image’s content and score it. Any deemed to contain unwanted content or adult and racy material could be flagged for review.
- Browsers could utilise the API to identify the age and gender of users and control the available content to that user.
- As ever-younger generations use mobile phones, the technology could be used to identify viewed or captured images and then notify a parent or carer.
- A marketer could use the API to identify the age and gender of a target audience and display a bespoke marketing ad to them.
Here at Enjoy Digital we are always looking for new and interesting ways to gather data, and this API is certainly of interest. We will be watching it closely and will bring you more information as the technology matures. In the meantime, from the team here at Enjoy Digital, a massive well done to Microsoft for the release of these APIs.