OCR

plugin-azure

Cloud-based OCR using Azure AI Vision for high-accuracy text recognition.

Overview

The @nut-tree/plugin-azure plugin integrates Azure AI Vision for on-screen text search and extraction. Compared to local OCR, it excels at recognizing text in complex layouts and in low-quality images.

Text Search

Find text on screen using Azure AI Vision

screen.find(singleWord("Login"))

Text Extraction

Read text from screen regions with high accuracy

screen.read({ searchRegion })

Installation

bash
npm i @nut-tree/plugin-azure

Subscription Required

This package is included in OCR, Solo, and Team subscription plans.

Prerequisites

You need an Azure account with an AI Vision resource deployed:

  1. Create an Azure account (free tier available)
  2. Deploy an Azure AI Vision OCR resource
  3. Navigate to "Keys and Endpoint" in your resource's left navigation menu
  4. Copy the API key and endpoint URL
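The configuration examples below read the key and endpoint from the VISION_KEY and VISION_ENDPOINT environment variables. One way to provide them (the endpoint format shown is the typical one for an Azure AI Vision resource; substitute your own values):

```shell
# Substitute the values from the "Keys and Endpoint" page of your resource.
export VISION_KEY="<your-api-key>"
export VISION_ENDPOINT="https://<your-resource-name>.cognitiveservices.azure.com/"
```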

Configuration

typescript
import { useAzureVisionOCR, configure } from "@nut-tree/plugin-azure/ocr";

useAzureVisionOCR();

configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
});

Configuration Options

apiEndpoint

apiEndpoint: string
required

Your Azure AI Vision resource URL

apiKey

apiKey: string
required

Your Azure AI Vision API key

checkResultInterval

checkResultInterval?: number
optional

Polling interval in milliseconds when waiting for OCR results

checkResultRetryCount

checkResultRetryCount?: number
optional

Maximum number of polling attempts for OCR results
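Together, these two options bound how long a call will wait for Azure to return an OCR result. A minimal sketch of that relationship (the helper name and the fallback values of 500 ms and 10 retries are illustrative assumptions, not the plugin's actual defaults):

```typescript
// Illustrative helper: the names and fallback values below are assumptions
// for demonstration, not part of the plugin API.
interface PollingOptions {
    checkResultInterval?: number;   // ms between polls
    checkResultRetryCount?: number; // maximum number of polls
}

// Worst-case time (ms) spent polling before the OCR call gives up.
function maxPollingTime(options: PollingOptions): number {
    const interval = options.checkResultInterval ?? 500;
    const retries = options.checkResultRetryCount ?? 10;
    return interval * retries;
}

console.log(maxPollingTime({ checkResultInterval: 250, checkResultRetryCount: 20 })); // 5000
```

Raising checkResultInterval reduces request volume against your Azure resource at the cost of slower results; raising checkResultRetryCount tolerates slower OCR jobs before timing out.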

language

language?: string
optional

Enforce a single language for OCR

modelVersion

modelVersion?: string
optional

Specific Azure model version to use

readingOrder

readingOrder?: AzureOcrServiceReadingOrder
default: BASIC

Reading order of the recognized text: BASIC (sorted left to right, top to bottom) or NATURAL (human-friendly reading order; Latin languages only)
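The recognition options above can be combined in a single configure call. A sketch (assumptions: that AzureOcrServiceReadingOrder is exported from the same module path, and the option values shown are examples only):

```typescript
import {
    useAzureVisionOCR,
    configure,
    AzureOcrServiceReadingOrder, // assumed to be exported from this path
} from "@nut-tree/plugin-azure/ocr";

useAzureVisionOCR();

configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
    language: "en",                                    // restrict OCR to English
    checkResultInterval: 250,                          // poll every 250 ms
    checkResultRetryCount: 20,                         // give up after 20 polls
    readingOrder: AzureOcrServiceReadingOrder.NATURAL, // human reading order
});
```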

Text Search

typescript
import { centerOf, mouse, screen, singleWord, straightTo } from "@nut-tree/nut-js";
import { useAzureVisionOCR, configure } from "@nut-tree/plugin-azure/ocr";

useAzureVisionOCR();
configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
});

const location = await screen.find(singleWord("WebStorm"));
await mouse.move(straightTo(centerOf(location)));

You can override configuration on a per-call basis:

typescript
const location = await screen.find(singleWord("Submit"), {
    providerData: {
        caseSensitive: false,
        partialMatch: true,
        language: "en",
    },
});

Text Extraction

typescript
import { getActiveWindow, screen } from "@nut-tree/nut-js";
import { useAzureVisionOCR, configure, TextSplit } from "@nut-tree/plugin-azure/ocr";

useAzureVisionOCR();
configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
});

const activeWindow = await getActiveWindow();
const text = await screen.read({
    searchRegion: activeWindow.region,
    split: TextSplit.WORD,
});

TextSplit Options

NONE

TextSplit.NONE
default

Return as a single text block

LINE

TextSplit.LINE

Split by lines

WORD

TextSplit.WORD

Split by words
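The three modes differ only in how the recognized text is partitioned. A rough illustration on a plain string (this mimics the splitting behavior for demonstration; it is not the plugin's implementation):

```typescript
// Illustration only: mimics how TextSplit partitions recognized text.
type Split = "NONE" | "LINE" | "WORD";

function splitText(text: string, split: Split): string[] {
    switch (split) {
        case "NONE":
            return [text];
        case "LINE":
            return text.split(/\r?\n/);
        case "WORD":
            return text.split(/\s+/).filter((w) => w.length > 0);
    }
}

const sample = "Sign in\nForgot password?";
splitText(sample, "NONE"); // ["Sign in\nForgot password?"]
splitText(sample, "LINE"); // ["Sign in", "Forgot password?"]
splitText(sample, "WORD"); // ["Sign", "in", "Forgot", "password?"]
```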


plugin-azure vs plugin-ocr

Aspect          plugin-azure                           plugin-ocr
Processing      Cloud (Azure AI Vision)                Local (Tesseract)
Data privacy    Data sent to Azure                     No external transmission
Accuracy        Higher on complex/low-quality images   Standard
Requirements    Azure account + API key                None (standalone)
