Google Vision API Quickstart

January 13th 2021

To use the Google Vision API, which is part of the Google Cloud Platform, you need to complete a few setup steps before you can call it from the command line with the Cloud SDK.

Create a Google Cloud Platform Account

If you don’t already have an account, you can sign up at https://console.cloud.google.com/home/ where you will be prompted to log in or create an account.

Create a New Project

You will need to create a new project if you do not have one set up already. The project holds the credentials used for authorization and the list of enabled APIs, including the Vision API. You can name the project whatever you like.

Enable the Vision API

Once you have created a project and are on the project dashboard, open the hamburger menu at the top and select APIs & Services to reach the APIs dashboard, or go directly to: https://console.cloud.google.com/apis/dashboard

At the top of the dashboard, click ENABLE APIS & SERVICES and you will be directed to the API Library. Use the search box to find the Cloud Vision API, click on it, and enable it.

Create an Identity Service Account

To authenticate against the Cloud API you will need to create an Identity service account. This generates a JSON key file that the Cloud SDK (which we’ll download in a bit) uses to generate an API token. Using the hamburger menu at the top, navigate to Identity → Service Accounts. On the Service Accounts dashboard click CREATE SERVICE ACCOUNT.

  1. In the Service account name field, enter a name.
  2. From the Role list, select Project > Owner.
  3. Click Create. A JSON file that contains your key downloads to your computer. We will save this in our ~/.ssh folder as we likely have other credentials stored and created there.
  4. We will need to set an environment variable GOOGLE_APPLICATION_CREDENTIALS to the path where we saved our JSON file. Open up .bashrc or .zshrc in your home directory and paste:
# Export Google Service Application Credentials #
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/.ssh/Project-3c4agk421ub7.json"

Make sure you replace Project-3c4agk421ub7.json with the name of the JSON file you downloaded.
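
Before moving on, it can help to confirm that the variable is visible in a new shell and points at a readable file. The following is only a minimal sketch: the function name check_credentials is made up for this post and is not part of the Cloud SDK.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Cloud SDK): verifies that
# GOOGLE_APPLICATION_CREDENTIALS is set and points to a readable file.
check_credentials() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is not set" >&2
    return 1
  fi
  if [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "Cannot read key file: $GOOGLE_APPLICATION_CREDENTIALS" >&2
    return 1
  fi
  echo "Using key file: $GOOGLE_APPLICATION_CREDENTIALS"
}
```

Open a new terminal (or source your .bashrc or .zshrc), define the function, and run check_credentials. It should print the path to your key file.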

Install the Google Cloud SDK

Navigate to https://cloud.google.com/sdk/docs/install and choose your OS and environment to download the SDK. You can follow the instructions for your OS; if, like me, you are using macOS, you will download a tar.gz file to your computer. For organization I created a folder called ~/gcloud and downloaded the archive there.

  1. Extract the archive to any location on your file system; preferably, your home directory. On macOS, this can be achieved by opening the downloaded `.tar.gz` archive file in the preferred location. If you would like to replace an existing installation, remove the existing google-cloud-sdk directory and extract the archive to the same location.
  2. Optional. Use the install script to add the Cloud SDK tools to your path:
./google-cloud-sdk/install.sh

You can now initialize the SDK by running gcloud init:

./google-cloud-sdk/bin/gcloud init

You will be prompted to log in to the Google account that you used to sign up for the Cloud Platform. Once logged in, you should be able to print an access token by executing:

gcloud auth application-default print-access-token
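
If you would rather not paste a token into your terminal history just to see whether it works, a small wrapper can report success or failure instead. This is a sketch under the assumption that gcloud is already on your path; check_token is a name I picked, not a gcloud command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if gcloud returns a non-empty token.
# It prints the token length rather than the token itself.
check_token() {
  local token
  token="$(gcloud auth application-default print-access-token 2>/dev/null)" || return 1
  [ -n "$token" ] || return 1
  echo "Access token retrieved (${#token} characters)"
}
```

If check_token fails, the most likely culprit is the GOOGLE_APPLICATION_CREDENTIALS variable from the previous section not being set in the current shell.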

Setting up the Request Data for Vision API

We’re going to use curl to make a POST call to the Vision API, passing some data in the request body. We’ll use the Web Detection feature to get a list of websites that contain a matching or closely matching image. We can create a request.json file:

{
  "requests": [
    {
      "image": {
        "content": "base64-encoded-image"
      },
      "features": [
        {
          "maxResults": 10,
          "type": "WEB_DETECTION"
        }
      ]
    }
  ]
}

In the “content” field we will replace the value with the base64-encoded image. On macOS you can base64-encode an image by executing:

base64 -i input.jpg -o output.txt

Copy the contents of output.txt into the “content” field of request.json.
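
Copying the encoded text by hand gets tedious for larger images. As a rough sketch, the same request.json could be generated by a small shell function. The name build_request and its two arguments are my own invention; the JSON body mirrors the request shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a Web Detection request for one image.
# Usage: build_request input.jpg request.json
build_request() {
  local image="$1" out="$2" encoded
  if [ ! -r "$image" ]; then
    echo "Cannot read image: $image" >&2
    return 1
  fi
  # Reading from stdin works with both the macOS and GNU base64 tools;
  # tr strips the line wrapping that GNU base64 adds by default.
  encoded="$(base64 < "$image" | tr -d '\n')"
  cat > "$out" <<EOF
{
  "requests": [
    {
      "image": { "content": "$encoded" },
      "features": [
        { "maxResults": 10, "type": "WEB_DETECTION" }
      ]
    }
  ]
}
EOF
}
```

Running build_request input.jpg request.json in the folder with your image should leave you with a request.json that is ready to post.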

Post to the Vision API

We’re now ready to execute the POST request to the Vision API. On the command line, in the folder that contains your request.json, execute:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
https://vision.googleapis.com/v1/images:annotate

With any luck you should get a response such as:

{
  "responses": [
    {
      "webDetection": {
        "webEntities": [
          {
            "entityId": "/m/02p7_j8",
            "score": 1.44225,
            "description": "Carnival in Rio de Janeiro"
          },
          {
            "entityId": "/m/06gmr",
            "score": 1.2913725,
            "description": "Rio de Janeiro"
          },
          {
            "entityId": "/m/04cx88",
            "score": 0.78465,
            "description": "Brazilian Carnival"
          },
          {
            "entityId": "/m/09l9f",
            "score": 0.7166,
            "description": "Carnival"
          },
          ...
        ],
        "fullMatchingImages": [
          {
            "url": "https://1000lugaresparair.files.wordpress.com/2017/11/quinten-de-graaf-278848.jpg"
          },
          ...
        ],
        "partialMatchingImages": [
          {
            "url": "https://www.linnanneito.fi/wp-content/uploads/sambakarnevaali-riossa.jpg"
          },
          ...
        ],
        "pagesWithMatchingImages": [
          {
            "url": "https://www.intrepidtravel.com/us/brazil/rio-carnival-122873",
            "pageTitle": "\u003cb\u003eRio Carnival\u003c/b\u003e | Intrepid Travel US",
            "partialMatchingImages": [
              {
                "url": "https://www.intrepidtravel.com/sites/intrepid/files/styles/large/public/elements/product/hero/GGSR-Brazil-rio-carnival-ladies.jpg"
              },
              ...
            ]
          },
          ...
        ],
        "visuallySimilarImages": [
          {
            "url": "https://pbs.twimg.com/media/DVoQOx6WkAIpHKF.jpg"
          },
          ...
        ],
        "bestGuessLabels": [
          {
            "label": "rio carnival",
            "languageCode": "en"
          }
        ]
      }
    }
  ]
}
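
If you save the response to a file, for example by adding -o response.json to the curl command (the file name is just my choice), you can pull the image and page URLs out of it with standard tools. The function below is a text-based sketch, not a proper JSON parser: list_urls is a hypothetical name, and it assumes each "url" key and its value sit on one line, as in the pretty-printed response above. A dedicated JSON tool such as jq would be more robust.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints every "url" value found in a saved response.
# Usage: list_urls response.json
list_urls() {
  # grep -o keeps only the matching "url": "..." fragments, one per line;
  # sed then strips the key and the surrounding quotes.
  grep -o '"url": *"[^"]*"' "$1" | sed -e 's/^"url": *"//' -e 's/"$//'
}
```

The output mixes URLs from all of the sections (full matches, partial matches, pages and visually similar images), so treat it as a quick overview rather than a structured result.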

Hooray! We’ve set up and configured the gcloud SDK to use the Google Vision API to get information about our images. You can read about all of the possibilities of the Vision API here: https://cloud.google.com/vision/docs/features-list

Enjoy!