plugin-azure
Cloud-based OCR using Azure AI Vision for high-accuracy text recognition.
Overview
The @nut-tree/plugin-azure plugin integrates Azure AI Vision for on-screen text search and text extraction. Compared to local OCR, it is better at recognizing text in complex layouts and low-quality images.
Text Search
Find text on screen using Azure AI Vision
screen.find(singleWord("Login"))
Text Extraction
Read text from screen regions with high accuracy
screen.read({ searchRegion })
Installation
npm i @nut-tree/plugin-azure
Subscription Required
Prerequisites
You need an Azure account with an AI Vision resource deployed:
- Create an Azure account (free tier available)
- Deploy an Azure AI Vision OCR resource
- Navigate to "Keys and Endpoint" in your resource's left navigation menu
- Copy the API key and endpoint URL
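Once the key and endpoint are copied, they are typically supplied through environment variables. The following is a minimal sketch of reading and validating them before configuring the plugin, so that a missing value fails early with a clear message. The helper `loadAzureVisionCredentials` and the `AzureVisionCredentials` interface are hypothetical and not part of the plugin; the variable names `VISION_KEY` and `VISION_ENDPOINT` match the ones used in the configuration example on this page.

```typescript
// Hypothetical helper for illustration; not part of @nut-tree/plugin-azure.
interface AzureVisionCredentials {
  apiKey: string;
  apiEndpoint: string;
}

// Reads the two values from an environment-like object and throws
// if either one is missing or blank.
function loadAzureVisionCredentials(
  env: Record<string, string | undefined>,
): AzureVisionCredentials {
  const apiKey = env["VISION_KEY"]?.trim();
  const apiEndpoint = env["VISION_ENDPOINT"]?.trim();

  const missing: string[] = [];
  if (!apiKey) missing.push("VISION_KEY");
  if (!apiEndpoint) missing.push("VISION_ENDPOINT");

  if (!apiKey || !apiEndpoint) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return { apiKey, apiEndpoint };
}
```

In a Node.js script, the result could then be passed on as `configure(loadAzureVisionCredentials(process.env))`.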
Configuration
import { useAzureVisionOCR, configure } from "@nut-tree/plugin-azure/ocr";
useAzureVisionOCR();
configure({
  apiKey: process.env.VISION_KEY,
  apiEndpoint: process.env.VISION_ENDPOINT,
});
Configuration Options
apiEndpoint
apiEndpoint: string
Your Azure AI Vision resource URL
apiKey
apiKey: string
Your Azure AI Vision API key
checkResultInterval
checkResultInterval?: number
Polling interval in milliseconds when waiting for OCR results
checkResultRetryCount
checkResultRetryCount?: number
Maximum number of polling attempts for OCR results
language
language?: string
Enforce a single language for OCR
modelVersion
modelVersion?: string
Specific Azure model version to use
readingOrder
readingOrder?: AzureOcrServiceReadingOrder
Reading order: BASIC (left-to-right) or NATURAL (for Latin languages)
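The two polling options together bound how long a call may wait for a result. The sketch below illustrates that relationship with a generic polling loop; it assumes one wait of `checkResultInterval` milliseconds after each unsuccessful attempt, which may differ from how the plugin schedules its requests internally. `PollingOptions`, `worstCaseWaitMs`, and `pollUntil` are hypothetical names introduced for illustration only.

```typescript
// Hypothetical illustration; the plugin's internal polling may differ.
interface PollingOptions {
  checkResultInterval: number; // milliseconds between attempts
  checkResultRetryCount: number; // maximum number of attempts
}

// Upper bound on time spent waiting between attempts,
// excluding the duration of the checks themselves.
function worstCaseWaitMs(options: PollingOptions): number {
  return options.checkResultInterval * options.checkResultRetryCount;
}

// Calls `check` until it yields a value or the attempts are used up.
async function pollUntil<T>(
  check: () => Promise<T | undefined>,
  options: PollingOptions,
): Promise<T> {
  for (let attempt = 0; attempt < options.checkResultRetryCount; attempt++) {
    const result = await check();
    if (result !== undefined) {
      return result;
    }
    // Wait before the next attempt.
    await new Promise<void>((resolve) =>
      setTimeout(resolve, options.checkResultInterval),
    );
  }
  throw new Error(`No result after ${options.checkResultRetryCount} attempts`);
}
```

Under this assumption, an interval of 500 ms with 10 attempts would give up after roughly 5 seconds of waiting, so slow responses call for raising one of the two values.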
Text Search
import { centerOf, mouse, screen, singleWord, straightTo } from "@nut-tree/nut-js";
import { useAzureVisionOCR, configure } from "@nut-tree/plugin-azure/ocr";
useAzureVisionOCR();
configure({
  apiKey: process.env.VISION_KEY,
  apiEndpoint: process.env.VISION_ENDPOINT,
});
const location = await screen.find(singleWord("WebStorm"));
await mouse.move(straightTo(centerOf(location)));
You can override configuration on a per-call basis:
const location = await screen.find(singleWord("Submit"), {
  providerData: {
    caseSensitive: false,
    partialMatch: true,
    language: "en",
  },
});
Text Extraction
import { getActiveWindow, screen } from "@nut-tree/nut-js";
import { useAzureVisionOCR, configure, TextSplit } from "@nut-tree/plugin-azure/ocr";
useAzureVisionOCR();
configure({
  apiKey: process.env.VISION_KEY,
  apiEndpoint: process.env.VISION_ENDPOINT,
});
const activeWindow = await getActiveWindow();
const text = await screen.read({
  searchRegion: activeWindow.region,
  split: TextSplit.WORD,
});
TextSplit Options
NONE
TextSplit.NONE
Return as a single text block
LINE
TextSplit.LINE
Split by lines
WORD
TextSplit.WORD
Split by words
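To make the three granularities concrete, the sketch below applies the same idea to a plain string. It only illustrates what "block", "line", and "word" splitting mean; it does not use the plugin, and the shape of the value actually returned by `screen.read` for each option is not specified here. `SplitMode` and `splitText` are hypothetical names.

```typescript
// Hypothetical illustration of the three granularities on a plain string.
type SplitMode = "NONE" | "LINE" | "WORD";

function splitText(text: string, mode: SplitMode): string[] {
  switch (mode) {
    case "NONE":
      // Keep everything as one block.
      return [text];
    case "LINE":
      // One entry per non-empty line.
      return text.split(/\r?\n/).filter((line) => line.length > 0);
    case "WORD":
      // One entry per whitespace-separated word.
      return text.split(/\s+/).filter((word) => word.length > 0);
  }
}

const sample = "Sign in\nCreate account";
console.log(splitText(sample, "NONE").length);
console.log(splitText(sample, "LINE"));
console.log(splitText(sample, "WORD"));
```

Word-level output suits locating individual labels, while line or block output is usually easier to work with when reading whole messages.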
plugin-azure vs plugin-ocr
| Aspect | plugin-azure | plugin-ocr |
|---|---|---|
| Processing | Cloud (Azure AI Vision) | Local (Tesseract) |
| Data privacy | Data sent to Azure | No external transmission |
| Accuracy | Higher on complex/low-quality images | Standard |
| Requirements | Azure account + API key | None (standalone) |