
OCR and translation with Azure AI Services

Introduction

Modern smartphones can capture text from an image and translate it into another language in just a few taps. What if you could add similar, but automated, character recognition and translation to your healthcare or banking system to speed up digitalization? I'm going to show you how easy it is to implement OCR (Optical Character Recognition) and translation with Azure AI Services.

Azure AI Services

Azure AI Services is a set of tools and services for creating intelligent applications. I'm going to use only two of its capabilities: Computer Vision and the Translator service.

[Figure: Azure AI Services]

Azure Computer Vision is a set of services that process images and return information about them as text. It has separate services for face, image and spatial analysis, and for OCR. We're using the OCR service and its Read API, which can also be deployed in a container so that you can run the API in your own environment.

Translator service, as the name suggests, is a cloud-based machine translation service.

Both services have free tiers in Azure, with limitations: the Translator service can translate up to 2M characters per month, and Computer Vision includes 5K calls per month at a maximum of 20 calls per minute.
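To get a feel for what the per-minute limit means in practice, here is a minimal sketch that turns a calls-per-minute quota into a minimum pause between requests. It assumes every request counts toward the limit, and the helper name MinDelayBetweenCalls is made up for this illustration.

```csharp
using System;

// Free-tier limit quoted in this article: 20 Computer Vision calls per minute.
const int callsPerMinute = 20;

// Minimum spacing between calls that keeps a steady stream of requests under the limit.
// Hypothetical helper, not part of any SDK.
static TimeSpan MinDelayBetweenCalls(int limitPerMinute) =>
    TimeSpan.FromSeconds(60.0 / limitPerMinute);

Console.WriteLine(MinDelayBetweenCalls(callsPerMinute)); // 00:00:03
```

With the free tier, that works out to roughly one call every three seconds, which is worth keeping in mind when polling for results.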

Before using the services, we need the auto-generated keys from Azure, as well as the service name, which the Computer Vision SDK needs for its endpoint.
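The examples in this post use placeholders for the keys. One way to keep real keys out of source code is to read them from environment variables. The sketch below is only a suggestion: the helper names and the variable names VISION_KEY, VISION_SERVICE_NAME and TRANSLATOR_KEY are assumptions, not something Azure defines.

```csharp
using System;

// Reads a required setting from an environment variable and fails with a clear message if it is missing.
// Hypothetical helper for this example.
static string RequireSetting(string name) =>
    Environment.GetEnvironmentVariable(name)
    ?? throw new InvalidOperationException($"Environment variable '{name}' is not set.");

// Builds the Computer Vision endpoint from the service name, following the URL pattern used in this post.
static string BuildVisionEndpoint(string serviceName) =>
    $"https://{serviceName}.cognitiveservices.azure.com";

// Usage (the variable names are hypothetical):
// string visionKey = RequireSetting("VISION_KEY");
// string visionEndpoint = BuildVisionEndpoint(RequireSetting("VISION_SERVICE_NAME"));
// string translatorKey = RequireSetting("TRANSLATOR_KEY");
```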

[Figure: Service keys]

Implementation

I'm creating a simple .NET console application that uses the Computer Vision SDK (Microsoft.Azure.CognitiveServices.Vision.ComputerVision) to communicate with the Computer Vision service. Translations are handled through REST API calls, although the Translator integration could be improved by using an SDK.

The image used in this example is something you would've stumbled upon back in the day when paper manuals were a thing.

[Figure: Picture of a manual with safety instructions in Italian]

The image can be passed to the Computer Vision client either as a stream or as a URL. The example reads an image from a local hard drive and passes it as a stream.


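  // Requires: using System.Text;
  //           using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
  //           using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
  ComputerVisionClient client = new ComputerVisionClient(
      new ApiKeyServiceClientCredentials("<your_computer_vision_service_key>"))
      {
          Endpoint = "https://<your_computer_vision_service_name>.cognitiveservices.azure.com"
      };

  // Submit the local image as a stream; the Read operation runs asynchronously on the service.
  using var imageStream = File.OpenRead(@"C:\Temp\Pictures\manual_pic.jpg");
  var operation = await client.ReadInStreamAsync(imageStream);

  // The operation ID is the GUID at the end of the operation location URL.
  Guid operationId = Guid.Parse(operation.OperationLocation.Split("/").Last());

  // Poll until the operation has finished, pausing between calls instead of
  // hammering the service (the free tier allows only 20 calls per minute).
  ReadOperationResult result;
  do
  {
      await Task.Delay(TimeSpan.FromSeconds(1));
      result = await client.GetReadResultAsync(operationId);
  }
  while (result.Status == OperationStatusCodes.Running || result.Status == OperationStatusCodes.NotStarted);

  // Collect the recognized lines from every page, one line of text per row.
  StringBuilder sb = new StringBuilder();
  foreach (var readResult in result.AnalyzeResult.ReadResults)
  {
      foreach (var line in readResult.Lines)
      {
          sb.AppendLine(line.Text);
      }
  }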

Notice that the SDK requires an operation ID parameter, a GUID that can be extracted from the end of the operation location URL.
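The extraction step can also be isolated into a small helper, which makes it easier to test. This is a minimal sketch: ParseOperationId is a hypothetical name, and the URL below is made up for illustration, with the real value coming from operation.OperationLocation.

```csharp
using System;

// Extracts the operation ID (a GUID) from the last path segment of an operation location URL.
// Hypothetical helper for this example, not part of the SDK.
static Guid ParseOperationId(string operationLocation)
{
    var uri = new Uri(operationLocation);
    string lastSegment = uri.Segments[^1].TrimEnd('/');
    return Guid.Parse(lastSegment);
}

// Made-up URL for illustration only.
Guid sampleId = ParseOperationId(
    "https://example.cognitiveservices.azure.com/vision/v3.2/read/analyzeResults/49a36324-fc4b-4387-aa06-090cfbf0064f");
Console.WriteLine(sampleId); // 49a36324-fc4b-4387-aa06-090cfbf0064f
```

Guid.Parse throws a FormatException if the last segment is not a GUID, so an unexpected URL fails loudly instead of producing a wrong ID.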

Now we can connect to the Translator service through its REST API. I'm using the latest version of the API, v3.0.


  // Requires: using System.Text; using Newtonsoft.Json;
  string key = "<your_translator_service_key>";
  string endpoint = "https://api.cognitive.microsofttranslator.com";
  string route = "/translate?api-version=3.0&from=it&to=en";

  // The request body is a JSON array of objects, each with a Text property.
  object[] body = new object[] { new { Text = sb.ToString() } };
  var requestBody = JsonConvert.SerializeObject(body);

  using (var httpClient = new HttpClient())
  using (var request = new HttpRequestMessage())
  {
      // Build the request.
      request.Method = HttpMethod.Post;
      request.RequestUri = new Uri(endpoint + route);
      request.Headers.Add("Ocp-Apim-Subscription-Key", key);
      // Region of the Translator resource; change it to match your own resource.
      request.Headers.Add("Ocp-Apim-Subscription-Region", "westeurope");

      request.Content = new StringContent(requestBody, Encoding.UTF8, "application/json");

      // Send the request and fail fast on a non-success status code.
      HttpResponseMessage response = await httpClient.SendAsync(request);
      response.EnsureSuccessStatusCode();

      string translationResult = await response.Content.ReadAsStringAsync();
      Console.WriteLine(translationResult);
  }

Notice that the source (from) and target (to) languages are passed in the query string along with the API version.
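If the languages need to vary, the route can be built from parameters instead of being hard-coded. Here is a minimal sketch, assuming a hypothetical helper named BuildTranslateRoute. It relies on the v3.0 API accepting the to parameter more than once, so a single request can translate into several languages.

```csharp
using System;
using System.Linq;

// Builds the /translate route for API version 3.0 with one source language and one or more target languages.
// Hypothetical helper for this example.
static string BuildTranslateRoute(string from, params string[] to)
{
    if (to.Length == 0)
    {
        throw new ArgumentException("At least one target language is required.", nameof(to));
    }

    string targets = string.Join("&", to.Select(lang => "to=" + Uri.EscapeDataString(lang)));
    return $"/translate?api-version=3.0&from={Uri.EscapeDataString(from)}&{targets}";
}

Console.WriteLine(BuildTranslateRoute("it", "en"));       // /translate?api-version=3.0&from=it&to=en
Console.WriteLine(BuildTranslateRoute("it", "en", "fi")); // /translate?api-version=3.0&from=it&to=en&to=fi
```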

[Figure: Translated output]

The translated output is in JSON format, so it's easy to start integrating it into your solution.
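As an example of that integration, the sketch below pulls the translated text out of the response body. It uses System.Text.Json rather than the Newtonsoft.Json package used earlier in this post, purely to keep the example free of extra dependencies. ExtractTranslations is a hypothetical helper, and the sample JSON is hand-written in the shape the v3.0 API returns, not actual service output.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Collects every translated text from a Translator v3.0 response body.
// The response is a JSON array with one item per input text; each item has a "translations" array.
static List<string> ExtractTranslations(string json)
{
    var texts = new List<string>();
    using JsonDocument doc = JsonDocument.Parse(json);
    foreach (JsonElement item in doc.RootElement.EnumerateArray())
    {
        foreach (JsonElement translation in item.GetProperty("translations").EnumerateArray())
        {
            texts.Add(translation.GetProperty("text").GetString() ?? "");
        }
    }
    return texts;
}

// Hand-written sample for illustration only.
string sampleJson = "[{\"translations\":[{\"text\":\"Safety instructions\",\"to\":\"en\"}]}]";
Console.WriteLine(string.Join(Environment.NewLine, ExtractTranslations(sampleJson))); // Safety instructions
```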

Summary

Although this was just a sneak peek into the Azure AI Services offering, even a solution this simple shows capabilities that can boost your digital transformation project.

Luckily, both services have free tiers, so you can experiment with them up to a point. I experienced random latency with the Computer Vision SDK, which might have been caused by the free tier.