Computer Vision API

Extract rich information from images to categorize and process visual data, and use machine-assisted moderation of images to help curate your services.

Analyze an image

This feature returns information about visual content found in an image. Use tagging, descriptions, and domain-specific models to identify content and label it with confidence. Apply the adult/racy settings to enable automated restriction of adult content. Identify image types and color schemes in pictures.
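As a rough illustration of how such an analysis might be requested, the sketch below builds (but does not send) an HTTP request in Python. The path `/vision/v1.0/analyze`, the `visualFeatures` query parameter, and the `Ocp-Apim-Subscription-Key` header are assumptions based on the v1.0 REST API and should be checked against the documentation; the endpoint host, key, and image URL are placeholders, and `build_analyze_request` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_analyze_request(endpoint, subscription_key, image_url, features):
    """Build (but do not send) a POST request asking for the given visual features."""
    # Assumed query format: a comma-separated list of feature names.
    query = urlencode({"visualFeatures": ",".join(features)}, safe=",")
    # Assumed body format: a JSON object carrying the URL of the image to analyze.
    body = json.dumps({"url": image_url}).encode("utf-8")
    return Request(
        f"{endpoint}/vision/v1.0/analyze?{query}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )


# Placeholder endpoint, key, and image URL; replace with real values before sending.
request = build_analyze_request(
    "https://example.invalid",
    "YOUR_KEY",
    "https://example.invalid/pool.jpg",
    ["Tags", "Description", "Adult"],
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it, which requires a real endpoint and a valid subscription key.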

See it in action

Feature name	Value
Description	{ "Tags": [ "water", "swimming", "sport", "pool", "person", "man", "frisbee", "ocean", "blue", "bird", "riding", "top", "standing", "wave", "young", "body", "large", "game", "glass", "pond", "playing", "board", "catch", "clear", "boat", "white" ], "Captions": [ { "Text": "a man swimming in a pool of water", "Confidence": 0.8909298 } ] }
Tags	[ { "Name": "water", "Confidence": 0.9997857 }, { "Name": "swimming", "Confidence": 0.955619633 }, { "Name": "sport", "Confidence": 0.953807831 }, { "Name": "pool", "Confidence": 0.9515978 }, { "Name": "person", "Confidence": 0.889862537 }, { "Name": "water sport", "Confidence": 0.664259 } ]
Image format	"Jpeg"
Image dimensions	462 x 600
Clip art type	0
Line drawing type	0
Black and white	false
Adult content	false
Adult score	0.07518345
Racy	false
Racy score	0.1814024
Categories	[ { "Name": "people_swimming", "Score": 0.98046875 } ]
Faces	[ { "Age": 36, "Gender": "Male", "FaceRectangle": { "Top": 133, "Left": 298, "Width": 121, "Height": 121 } } ]
Dominant color background	"White"
Dominant color foreground	"Grey"
Accent Color	#19A4B2
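
Results like these are typically filtered before use. The following is a minimal Python sketch, assuming the tag list has been parsed from JSON with the `Name` and `Confidence` fields shown in the demo output; the helper names and the threshold values are hypothetical and not part of the API.

```python
import json

# Tag list copied from the demo output above.
TAGS_JSON = """
[ { "Name": "water", "Confidence": 0.9997857 },
  { "Name": "swimming", "Confidence": 0.955619633 },
  { "Name": "sport", "Confidence": 0.953807831 },
  { "Name": "pool", "Confidence": 0.9515978 },
  { "Name": "person", "Confidence": 0.889862537 },
  { "Name": "water sport", "Confidence": 0.664259 } ]
"""


def confident_tags(tags, min_confidence=0.9):
    """Return tag names at or above the threshold, highest confidence first."""
    ranked = sorted(tags, key=lambda tag: tag["Confidence"], reverse=True)
    return [tag["Name"] for tag in ranked if tag["Confidence"] >= min_confidence]


def needs_review(adult_score, racy_score, threshold=0.5):
    """Hypothetical moderation rule: flag the image if either score reaches the threshold."""
    return adult_score >= threshold or racy_score >= threshold


tags = json.loads(TAGS_JSON)
print(confident_tags(tags))                  # tags kept at the 0.9 threshold
print(needs_review(0.07518345, 0.1814024))   # adult and racy scores from the demo
```

The 0.9 and 0.5 cut-offs are arbitrary; an application would tune them to its own tolerance for false positives.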

Want to build this?

Documentation

Read text in images

Optical character recognition (OCR) detects text in an image and extracts the recognized words into a machine-readable character stream. Analyze images to detect embedded text, generate character streams, and enable searching. Take photos of text instead of copying to save time and effort.

See it in action

Preview:

IF WE DID
ALL
THE THINGS
WE ARE
CAPABLÉ•
OF DOING,
WE WOULD
LITERALLY
ASTOUND
QURSELV*S.

JSON:

{
  "TextAngle": 0.0,
  "Orientation": "NotDetected",
  "Language": "en",
  "Regions": [
    {
      "BoundingBox": "316,47,284,340",
      "Lines": [
        {
          "BoundingBox": "319,47,182,24",
          "Words": [
            {
              "BoundingBox": "319,47,42,24",
              "Text": "IF"
            },
            {
              "BoundingBox": "375,47,44,24",
              "Text": "WE"
            },
            {
              "BoundingBox": "435,47,66,23",
              "Text": "DID"
            }
          ]
        },
        {
          "BoundingBox": "316,74,204,69",
          "Words": [
            {
              "BoundingBox": "316,74,204,69",
              "Text": "ALL"
            }
          ]
        },
        {
          "BoundingBox": "318,147,207,24",
          "Words": [
            {
              "BoundingBox": "318,147,63,24",
              "Text": "THE"
            },
            {
              "BoundingBox": "397,147,128,24",
              "Text": "THINGS"
            }
          ]
        },
        {
          "BoundingBox": "316,176,125,23",
          "Words": [
            {
              "BoundingBox": "316,176,44,23",
              "Text": "WE"
            },
            {
              "BoundingBox": "375,176,66,23",
              "Text": "ARE"
            }
          ]
        },
        {
          "BoundingBox": "319,194,281,44",
          "Words": [
            {
              "BoundingBox": "319,194,281,44",
              "Text": "CAPABLÉ•"
            }
          ]
        },
        {
          "BoundingBox": "318,243,181,29",
          "Words": [
            {
              "BoundingBox": "318,243,43,23",
              "Text": "OF"
            },
            {
              "BoundingBox": "376,243,123,29",
              "Text": "DOING,"
            }
          ]
        },
        {
          "BoundingBox": "316,271,170,24",
          "Words": [
            {
              "BoundingBox": "316,272,44,23",
              "Text": "WE"
            },
            {
              "BoundingBox": "375,271,111,24",
              "Text": "WOULD"
            }
          ]
        },
        {
          "BoundingBox": "317,300,200,24",
          "Words": [
            {
              "BoundingBox": "317,300,200,24",
              "Text": "LITERALLY"
            }
          ]
        },
        {
          "BoundingBox": "316,328,157,24",
          "Words": [
            {
              "BoundingBox": "316,328,157,24",
              "Text": "ASTOUND"
            }
          ]
        },
        {
          "BoundingBox": "318,357,214,30",
          "Words": [
            {
              "BoundingBox": "318,357,214,30",
              "Text": "QURSELV*S."
            }
          ]
        }
      ]
    }
  ]
}
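
A response shaped like the one above can be flattened back into lines of text. This is a minimal Python sketch, assuming the `Regions` / `Lines` / `Words` nesting shown in the demo and assuming each `BoundingBox` string holds left, top, width, and height in that order; `parse_box` and `region_text` are hypothetical helper names.

```python
def parse_box(box):
    """Split a 'left,top,width,height' string into named integer fields (assumed order)."""
    left, top, width, height = (int(value) for value in box.split(","))
    return {"left": left, "top": top, "width": width, "height": height}


def region_text(ocr_result):
    """Join the words of every line, in the order the response lists them."""
    lines = []
    for region in ocr_result.get("Regions", []):
        for line in region.get("Lines", []):
            lines.append(" ".join(word["Text"] for word in line.get("Words", [])))
    return lines


# First two lines of the demo response, trimmed for brevity.
sample = {
    "Language": "en",
    "Regions": [{
        "BoundingBox": "316,47,284,340",
        "Lines": [
            {"BoundingBox": "319,47,182,24",
             "Words": [{"BoundingBox": "319,47,42,24", "Text": "IF"},
                       {"BoundingBox": "375,47,44,24", "Text": "WE"},
                       {"BoundingBox": "435,47,66,23", "Text": "DID"}]},
            {"BoundingBox": "316,74,204,69",
             "Words": [{"BoundingBox": "316,74,204,69", "Text": "ALL"}]},
        ],
    }],
}

print(region_text(sample))
print(parse_box(sample["Regions"][0]["BoundingBox"]))
```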

By uploading data for this demo, you agree that Microsoft may store it and use it to improve Microsoft services, including this API. To help protect your privacy, we take steps to de-identify your data and keep it secure. We won’t publish your data or let other people use it.

Preview: Read handwritten text from images

This technology (handwritten OCR) allows you to detect and extract handwritten text from notes, letters, essays, whiteboards, forms, etc. It works with different surfaces and backgrounds, such as white paper, yellow sticky notes, and whiteboards.

Handwritten text recognition saves time and effort and can make you more productive by allowing you to take images of text, rather than having to transcribe it. It makes it possible to digitize notes, which then allows you to implement quick and easy search. It also reduces paper clutter.

Note: this technology is currently in preview and is only available for English text.

To try this optical character recognition demo, upload a locally stored image or provide an image URL. We don’t store the images you supply for this demo unless you give us permission.

See it in action

Preview:

Our greatest glory is not
in never failing ,
but in rising every time we fall

JSON:

{
  "Status": "Succeeded",
  "Succeeded": true,
  "Failed": false,
  "Finished": true,
  "RecognitionResult": {
    "Lines": [
      {
        "BoundingBox": [
          202,
          618,
          2047,
          643,
          2046,
          840,
          200,
          813
        ],
        "Text": "Our greatest glory is not",
        "Words": [
          {
            "BoundingBox": [
              204,
              627,
              481,
              628,
              481,
              830,
              204,
              829
            ],
            "Text": "Our"
          },
          {
            "BoundingBox": [
              519,
              628,
              1057,
              630,
              1057,
              832,
              518,
              830
            ],
            "Text": "greatest"
          },
          {
            "BoundingBox": [
              1114,
              630,
              1549,
              631,
              1548,
              833,
              1114,
              832
            ],
            "Text": "glory"
          },
          {
            "BoundingBox": [
              1586,
              631,
              1785,
              632,
              1784,
              834,
              1586,
              833
            ],
            "Text": "is"
          },
          {
            "BoundingBox": [
              1822,
              632,
              2115,
              633,
              2115,
              835,
              1822,
              834
            ],
            "Text": "not"
          }
        ]
      },
      {
        "BoundingBox": [
          420,
          1273,
          2954,
          1250,
          2958,
          1488,
          422,
          1511
        ],
        "Text": "but in rising every time we fall",
        "Words": [
          {
            "BoundingBox": [
              423,
              1269,
              634,
              1268,
              635,
              1507,
              424,
              1508
            ],
            "Text": "but"
          },
          {
            "BoundingBox": [
              667,
              1268,
              808,
              1268,
              809,
              1506,
              668,
              1507
            ],
            "Text": "in"
          },
          {
            "BoundingBox": [
              874,
              1267,
              1289,
              1265,
              1290,
              1504,
              875,
              1506
            ],
            "Text": "rising"
          },
          {
            "BoundingBox": [
              1331,
              1265,
              1771,
              1263,
              1772,
              1502,
              1332,
              1504
            ],
            "Text": "every"
          },
          {
            "BoundingBox": [
              1812,
              1263,
              2178,
              1261,
              2179,
              1500,
              1813,
              1502
            ],
            "Text": "time"
          },
          {
            "BoundingBox": [
              2219,
              1261,
              2510,
              1260,
              2511,
              1498,
              2220,
              1500
            ],
            "Text": "we"
          },
          {
            "BoundingBox": [
              2551,
              1260,
              3016,
              1258,
              3017,
              1496,
              2552,
              1498
            ],
            "Text": "fall"
          }
        ]
      },
      {
        "BoundingBox": [
          1612,
          903,
          2744,
          935,
          2738,
          1139,
          1607,
          1107
        ],
        "Text": "in never failing ,",
        "Words": [
          {
            "BoundingBox": [
              1611,
              934,
              1707,
              933,
              1708,
              1147,
              1613,
              1147
            ],
            "Text": "in"
          },
          {
            "BoundingBox": [
              1753,
              933,
              2132,
              930,
              2133,
              1144,
              1754,
              1146
            ],
            "Text": "never"
          },
          {
            "BoundingBox": [
              2162,
              930,
              2673,
              927,
              2674,
              1140,
              2164,
              1144
            ],
            "Text": "failing"
          },
          {
            "BoundingBox": [
              2703,
              926,
              2788,
              926,
              2790,
              1139,
              2705,
              1140
            ],
            "Text": ","
          }
        ]
      }
    ]
  }
}
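
In the response above, the lines are not listed in top-to-bottom order ("in never failing ," comes last). One way to restore reading order is to sort lines by the top edge of their bounding boxes. This is a minimal Python sketch, assuming each eight-number `BoundingBox` holds four (x, y) corner points; the helper names are hypothetical, and the sample keeps only the line-level fields from the demo.

```python
def top_edge(bounding_box):
    """Smallest y value among the four (x, y) corner points of an 8-number box."""
    return min(bounding_box[1::2])


def lines_in_reading_order(result):
    """Return recognized line texts sorted from top to bottom of the image."""
    if result.get("Status") != "Succeeded":
        return []
    lines = result["RecognitionResult"]["Lines"]
    ordered = sorted(lines, key=lambda line: top_edge(line["BoundingBox"]))
    return [line["Text"] for line in ordered]


# Line-level data copied from the demo response (word entries omitted).
sample = {
    "Status": "Succeeded",
    "RecognitionResult": {
        "Lines": [
            {"BoundingBox": [202, 618, 2047, 643, 2046, 840, 200, 813],
             "Text": "Our greatest glory is not"},
            {"BoundingBox": [420, 1273, 2954, 1250, 2958, 1488, 422, 1511],
             "Text": "but in rising every time we fall"},
            {"BoundingBox": [1612, 903, 2744, 935, 2738, 1139, 1607, 1107],
             "Text": "in never failing ,"},
        ]
    },
}

for text in lines_in_reading_order(sample):
    print(text)
```

Sorting by the top edge alone is enough for a single column of text; multi-column pages would need the x coordinates as well.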

Recognize celebrities and landmarks

The celebrity and landmark models are examples of domain-specific models. Our celebrity recognition model recognizes 200K celebrities from business, politics, sports, and entertainment. Our landmark recognition model recognizes 9,000 natural and man-made landmarks from around the world. Domain-specific models are a continuously evolving feature of the Computer Vision API.

See it in action

{
  "categories": [
    {
      "name": "people_",
      "score": 0.86328125,
      "detail": {
        "celebrities": [
          {
            "name": "Satya Nadella",
            "faceRectangle": {
              "left": 239,
              "top": 293,
              "width": 138,
              "height": 138
            },
            "confidence": 0.9999974
          }
        ]
      }
    }
  ],
  "tags": [
    {
      "name": "person",
      "confidence": 0.99956613779067993
    },
    {
      "name": "suit",
      "confidence": 0.98934584856033325
    },
    {
      "name": "man",
      "confidence": 0.98844343423843384
    },
    {
      "name": "outdoor",
      "confidence": 0.860062301158905
    }
  ],
  "description": {
    "tags": [
      "person",
      "suit",
      "man",
      "necktie",
      "outdoor",
      "building",
      "clothing",
      "standing",
      "wearing",
      "business",
      "looking",
      "holding",
      "black",
      "front",
      "hand",
      "dressed",
      "phone",
      "field"
    ],
    "captions": [
      {
        "text": "Satya Nadella wearing a suit and tie",
        "confidence": 0.99033389849736619
      }
    ]
  },
  "requestId": "75d5c827-4f64-4f29-b155-d4ed1146a180",
  "metadata": {
    "width": 600,
    "height": 900,
    "format": "Jpeg"
  },
  "faces": [
    {
      "age": 49,
      "gender": "Male",
      "faceRectangle": {
        "left": 239,
        "top": 293,
        "width": 138,
        "height": 138
      }
    }
  ],
  "color": {
    "dominantColorForeground": "Black",
    "dominantColorBackground": "Black",
    "dominantColors": [
      "Black",
      "Grey"
    ],
    "accentColor": "7B5E50",
    "isBWImg": false
  },
  "imageType": {
    "clipArtType": 0,
    "lineDrawingType": 0
  }
}
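
Celebrity matches sit inside the `detail` object of each category, so a caller has to walk the categories to collect them. This is a minimal Python sketch, assuming the `categories` / `detail` / `celebrities` nesting shown in the response above; `celebrities_found` is a hypothetical helper and the 0.5 threshold is arbitrary.

```python
def celebrities_found(analysis, min_confidence=0.5):
    """Collect (name, confidence) pairs from every category that carries celebrity details."""
    found = []
    for category in analysis.get("categories", []):
        detail = category.get("detail") or {}  # categories without details are skipped
        for celebrity in detail.get("celebrities", []):
            if celebrity["confidence"] >= min_confidence:
                found.append((celebrity["name"], celebrity["confidence"]))
    return found


# Category data copied from the demo response (other fields omitted).
sample = {
    "categories": [{
        "name": "people_",
        "score": 0.86328125,
        "detail": {
            "celebrities": [{
                "name": "Satya Nadella",
                "faceRectangle": {"left": 239, "top": 293, "width": 138, "height": 138},
                "confidence": 0.9999974,
            }]
        },
    }]
}

print(celebrities_found(sample))
```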

Analyze video in near real-time

Use any of the Computer Vision APIs with your video files by extracting frames of the video from your device and then sending those frames to the API calls of your choice. Get results from your videos faster.

Use our sample on GitHub to get started and build your own app.
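
Because each frame is sent as a separate API call, an application usually analyzes only a subset of frames. This is a minimal Python sketch of that selection step, independent of any video library; `frames_to_analyze` and its parameters are hypothetical and are not taken from the GitHub sample.

```python
def frames_to_analyze(total_frames, video_fps, analyses_per_second):
    """Return indices of evenly spaced frames that approximate the target analysis rate."""
    if video_fps <= 0 or analyses_per_second <= 0:
        raise ValueError("frame rate and analysis rate must be positive")
    # Never skip fewer than one frame, even if the target rate exceeds the video rate.
    step = max(1, round(video_fps / analyses_per_second))
    return list(range(0, total_frames, step))


# A 3-second clip at 30 frames per second, analyzed about 3 times per second.
indices = frames_to_analyze(total_frames=90, video_fps=30, analyses_per_second=3)
print(indices)
```

Each selected frame would then be encoded as an image and submitted to the chosen API call; lowering `analyses_per_second` trades result freshness for fewer calls.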

Generate a thumbnail

Generate a high-quality, storage-efficient thumbnail based on any input image. Use thumbnail generation to modify images to best suit your needs for size, shape, and style. Apply smart cropping to generate thumbnails that differ from the aspect ratio of your original image, yet preserve the region of interest.
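
To make the idea of cropping to a different aspect ratio while keeping a region of interest concrete, the Python sketch below computes a crop rectangle locally. It is an illustration only and not the service's smart-cropping algorithm: it assumes the region of interest is already known (here, a face rectangle), and `crop_for_thumbnail` is a hypothetical helper.

```python
def crop_for_thumbnail(image_w, image_h, thumb_w, thumb_h, roi):
    """Return (left, top, width, height) of the largest crop with the thumbnail's
    aspect ratio, centered on the region of interest and clamped to the image."""
    target_ratio = thumb_w / thumb_h
    if image_w / image_h > target_ratio:
        # Image is wider than the target: keep full height, trim the sides.
        crop_h = image_h
        crop_w = round(image_h * target_ratio)
    else:
        # Image is taller than the target: keep full width, trim top and bottom.
        crop_w = image_w
        crop_h = round(image_w / target_ratio)
    center_x = roi["left"] + roi["width"] / 2
    center_y = roi["top"] + roi["height"] / 2
    # Center the crop on the region of interest, then keep it inside the image.
    left = min(max(round(center_x - crop_w / 2), 0), image_w - crop_w)
    top = min(max(round(center_y - crop_h / 2), 0), image_h - crop_h)
    return left, top, crop_w, crop_h


# A 600 x 900 portrait with a face at (239, 293), size 138 x 138, cropped for a square thumbnail.
face = {"left": 239, "top": 293, "width": 138, "height": 138}
print(crop_for_thumbnail(600, 900, 100, 100, face))
```

The resulting rectangle would still need to be scaled down to the thumbnail size by an imaging library.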

"We can use the Computer Vision API to prove to our clients the reliability of the data, so they can be confident making important business decisions based on that information"

Leendert de Voogd: CEO | Vigiglobe

"It didn’t take us long to realize Microsoft Cognitive Services had handed us a powerful set of computer-vision and artificial-intelligence tools that we could use to create great apps and new features for our customers in just a few hours"

John Fan: Cofounder and CEO | Cardinal Blue Software

"Because the Cognitive Services APIs harness the power of machine learning, we were able to bring advanced intelligence into our product without the need to have a team of data scientists on hand"

Aaron Edell: Chief Product Owner | GrayMeta

"We found Cognitive Services to be the missing piece in the equation, the one that we needed to bring this solution to market and really revolutionize the way people look at video"

Katie McCann: Vice President of Product and Engineering | Prism Skylabs

"Microsoft Cognitive Services gives us a huge range of opportunities. It’s a perfect match for us now, and in the future when we want to add more features to our app"

Jaan Apajalahti: CEO | Blucup

"Using the Cognitive Services APIs, it took us three months to develop a test pair of glasses that can translate text and images into speech, identify emotions, and describe scenery. If we had been working full time, we could have done it in two weeks"

Benoit Chirouter: R&D Director | Pivothead
