On-Screen Search

Text Search (OCR)

Find and read text on screen using optical character recognition.

Overview

Text search uses Optical Character Recognition (OCR) to find and read text directly from the screen. This is useful when you need to interact with text that might change, or when you don't have reference images available.

Plugin Required

Text search requires an OCR plugin. Choose between @nut-tree/plugin-ocr (local, Tesseract-based) or @nut-tree/plugin-azure (cloud, Azure AI Vision).

Find Text

Locate text strings on screen

screen.find(singleWord("Submit"))

Read Text

Extract all text from screen or region

screen.read()

Wait for Text

Wait until text appears on screen

screen.waitFor(singleWord("Ready"))

Choosing an OCR Plugin

nut.js offers two OCR plugins with different trade-offs:

Local (Tesseract)

Runs offline, data stays on machine, good for standard text

@nut-tree/plugin-ocr

Azure (Cloud)

Better accuracy on complex/low-quality images, requires Azure account

@nut-tree/plugin-azure

When to Use Which?

Local OCR: Privacy-sensitive environments, offline use, standard UI text
Azure OCR: Complex images, handwriting, low-quality screenshots, higher accuracy needs

Installation & Setup

Local OCR (Tesseract)

bash

npm install @nut-tree/plugin-ocr

Configure the plugin with a path to store language models and choose a model type:

typescript

import { screen, singleWord } from "@nut-tree/nut-js";
import { configure, LanguageModelType, Language, preloadLanguages } from "@nut-tree/plugin-ocr";

// Configure OCR
configure({
    dataPath: "./ocr-data",           // Where to store language models
    languageModelType: LanguageModelType.BEST  // BEST, DEFAULT, or FAST
});

// Preload languages to avoid delay on first search
await preloadLanguages([Language.English]);

Model Types

BEST — Higher accuracy, slower processing
DEFAULT — Balanced accuracy and speed
FAST — Faster processing, less accurate

Azure OCR (Cloud)

bash

npm install @nut-tree/plugin-azure

Configure with your Azure AI Vision credentials:

typescript

import { screen, singleWord } from "@nut-tree/nut-js";
import { configure, Language } from "@nut-tree/plugin-azure/ocr";

// Configure with Azure credentials
configure({
    apiKey: process.env.VISION_KEY,      // Required
    apiEndpoint: process.env.VISION_ENDPOINT,  // Required
    // Optional settings:
    language: Language.English,          // Language enum or string code
    modelVersion: "latest",              // OCR model version
    readingOrder: "natural",             // "basic" or "natural" (Latin languages)
    checkResultInterval: 1000,           // Polling interval in ms
    checkResultRetryCount: 10            // Max polling attempts
});

Azure Prerequisites

You need an Azure account with an Azure AI Vision OCR resource. A free tier (F0) is available for testing. Get your API key and endpoint from the Azure portal under "Keys and Endpoint".
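Because both apiKey and apiEndpoint are required, it can help to fail early when either value is missing. The following is a minimal sketch under that assumption: readAzureCredentials and AzureCredentials are hypothetical names that are not part of the plugin, and the environment variable names are taken from the configuration example above.

```typescript
// Hypothetical helper (not part of @nut-tree/plugin-azure): validates the two
// required settings before they are handed to configure().
interface AzureCredentials {
    apiKey: string;
    apiEndpoint: string;
}

function readAzureCredentials(env: Record<string, string | undefined>): AzureCredentials {
    const apiKey = env["VISION_KEY"]?.trim();
    const apiEndpoint = env["VISION_ENDPOINT"]?.trim();

    // Collect every missing name so the error reports all of them at once
    const missing: string[] = [];
    if (!apiKey) missing.push("VISION_KEY");
    if (!apiEndpoint) missing.push("VISION_ENDPOINT");

    if (!apiKey || !apiEndpoint) {
        throw new Error(`Missing Azure AI Vision settings: ${missing.join(", ")}`);
    }
    return { apiKey, apiEndpoint };
}
```

In a Node.js script the helper could be called as readAzureCredentials(process.env), with the returned object spread into the configure() call.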

Quick Reference

find (with text)

screen.find(singleWord(text), options)

Promise<Region>

Find a word on screen and return its location

find (with text line)

screen.find(textLine(text), options)

Promise<Region>

Find a multi-word text line on screen

findAll (with text)

screen.findAll(singleWord(text), options)

Promise<Region[]>

Find all occurrences of text on screen

waitFor (with text)

screen.waitFor(singleWord(text), timeout, interval, options)

Promise<Region>

Wait until text appears on screen

read

screen.read(config?: ReadTextConfig)

Promise<string | OCRResult[]>

Extract all text from screen or a specific region

Finding Text

Use singleWord() for single words or textLine() for multi-word text. Pass language options via providerData.

typescript

import { screen, mouse, singleWord, centerOf, straightTo, Button } from "@nut-tree/nut-js";
import { Language } from "@nut-tree/plugin-ocr";

// Find a single word on screen
const submitButton = await screen.find(singleWord("Submit"), {
    providerData: {
        lang: [Language.English]
    }
});
console.log(`Found "Submit" at: ${submitButton.left}, ${submitButton.top}`);

// Click on the found text
await mouse.move(straightTo(centerOf(submitButton)));
await mouse.click(Button.LEFT);

Finding Multiple Occurrences

typescript

import { screen, singleWord } from "@nut-tree/nut-js";
import { Language } from "@nut-tree/plugin-ocr";

// Find all occurrences of a word
const allLinks = await screen.findAll(singleWord("Click"), {
    providerData: {
        lang: [Language.English]
    }
});
console.log(`Found ${allLinks.length} "Click" texts`);
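This section does not state in which order findAll returns its matches. If the order matters, for example to click the first occurrence on the page, the results can be sorted explicitly. The sketch below assumes only that each result exposes left and top, as used in the examples above; RegionLike and inReadingOrder are hypothetical names.

```typescript
// Minimal structural type: only the properties the sort needs.
interface RegionLike {
    left: number;
    top: number;
}

// Hypothetical helper: returns a sorted copy, top to bottom, then left to right.
// The input array is left untouched.
function inReadingOrder<T extends RegionLike>(regions: readonly T[]): T[] {
    return [...regions].sort((a, b) => a.top - b.top || a.left - b.left);
}

const sorted = inReadingOrder([
    { left: 200, top: 50 },
    { left: 10, top: 50 },
    { left: 5, top: 10 },
]);
console.log(sorted.map((r) => `${r.left},${r.top}`).join(" | ")); // prints "5,10 | 10,50 | 200,50"
```

This compares exact top coordinates, so words on the same visual line that differ by a pixel or two are ordered by their top value first.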

Search Options

Both plugins support partialMatch and caseSensitive options. The main difference is how languages are configured:

typescript

// Local OCR - uses "lang" with Language enum array
import { screen, singleWord } from "@nut-tree/nut-js";
import { Language } from "@nut-tree/plugin-ocr";

const result = await screen.find(singleWord("order"), {
    providerData: {
        lang: [Language.English, Language.German],  // Array for multi-language
        partialMatch: true,
        caseSensitive: false
    }
});

typescript

// Azure OCR - uses "language" with single Language enum or string
import { screen, singleWord } from "@nut-tree/nut-js";
import { Language } from "@nut-tree/plugin-azure/ocr";

const result = await screen.find(singleWord("order"), {
    providerData: {
        language: Language.English,  // Single language only
        partialMatch: true,
        caseSensitive: false
    }
});
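The section does not spell out how partialMatch and caseSensitive are evaluated. One plausible reading is substring matching and case folding, illustrated below on plain strings. This is an assumption for illustration only, not the plugins' actual implementation; matchesQuery, MatchOptions, and the default values chosen here are hypothetical.

```typescript
// Illustration only: mirrors the option names from the examples above.
interface MatchOptions {
    partialMatch?: boolean;
    caseSensitive?: boolean;
}

// Hypothetical helper: compares a recognized word against a query.
// Defaults are arbitrary choices for this sketch, not the plugin defaults.
function matchesQuery(recognized: string, query: string, options: MatchOptions = {}): boolean {
    const { partialMatch = false, caseSensitive = false } = options;
    const text = caseSensitive ? recognized : recognized.toLowerCase();
    const needle = caseSensitive ? query : query.toLowerCase();
    // Partial match accepts the query anywhere inside the recognized text
    return partialMatch ? text.includes(needle) : text === needle;
}

console.log(matchesQuery("Reorder", "order", { partialMatch: true })); // prints true
console.log(matchesQuery("Reorder", "order"));                         // prints false
console.log(matchesQuery("Order", "order", { caseSensitive: true }));  // prints false
```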

Multi-word Text

Use textLine() to search for phrases or multi-word text.

typescript

import { screen, textLine } from "@nut-tree/nut-js";
import { Language } from "@nut-tree/plugin-ocr";

// Find a multi-word phrase
const result = await screen.find(textLine("Save Changes"), {
    providerData: {
        lang: [Language.English]
    }
});
console.log(`Found phrase at: ${result.left}, ${result.top}`);

singleWord vs textLine

Use singleWord() when searching for individual words. Use textLine() when searching for phrases or text with spaces.
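When the search text comes from a variable, the choice can be made programmatically. The sketch below follows the rule stated above (whitespace means a phrase). queryKindFor and TextQueryKind are hypothetical names; the function only returns a label so that it stays independent of the library.

```typescript
type TextQueryKind = "singleWord" | "textLine";

// Hypothetical helper: text containing inner whitespace is treated as a phrase.
// Leading and trailing whitespace is ignored for the decision.
function queryKindFor(text: string): TextQueryKind {
    return /\s/.test(text.trim()) ? "textLine" : "singleWord";
}

console.log(queryKindFor("Submit"));       // prints "singleWord"
console.log(queryKindFor("Save Changes")); // prints "textLine"
```

The returned label can then be mapped to the matching nut.js query function, singleWord(text) or textLine(text).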

Waiting for Text

Wait for specific text to appear. This is useful for detecting when loading has finished or when the UI has changed state.

typescript

import { screen, singleWord } from "@nut-tree/nut-js";
import { Language } from "@nut-tree/plugin-ocr";

// Wait for success message (timeout in ms, check interval in ms)
try {
    await screen.waitFor(singleWord("Complete"), 10000, 500, {
        providerData: {
            lang: [Language.English]
        }
    });
    console.log("Upload finished!");
} catch (error) {
    console.log("Upload did not complete in time");
}

// Wait for the "Continue" label to appear (OCR detects the text, not the enabled state)
await screen.waitFor(singleWord("Continue"), 5000, 500, {
    providerData: {
        lang: [Language.English]
    }
});
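To make the relationship between the timeout and interval arguments concrete, here is a generic polling loop. It is a sketch of the general technique, not the nut.js implementation of waitFor; pollUntil is a hypothetical name and the retry details are assumptions.

```typescript
// Hypothetical helper: retries an async action at a fixed interval until it
// succeeds or the timeout has elapsed, then rejects with the last error.
async function pollUntil<T>(
    attempt: () => Promise<T>,
    timeoutMs: number,
    intervalMs: number
): Promise<T> {
    const deadline = Date.now() + timeoutMs;
    let lastError: unknown;
    do {
        try {
            return await attempt(); // success ends the loop immediately
        } catch (error) {
            lastError = error;      // remember why the latest attempt failed
        }
        // Pause before the next attempt
        await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    } while (Date.now() < deadline);
    throw new Error(`Timed out after ${timeoutMs} ms: ${String(lastError)}`);
}
```

With a timeout of 10000 and an interval of 500, a loop like this makes roughly twenty attempts at most; the exact count depends on how long each attempt takes, which for OCR can be a significant share of the interval.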

Reading Text

Use screen.read() to extract all text from the screen or a specific region. This is useful for capturing content, verifying text, or processing screen output.

Basic Text Extraction

typescript

import { screen, getActiveWindow } from "@nut-tree/nut-js";
import { configure, Language, LanguageModelType, TextSplit, preloadLanguages } from "@nut-tree/plugin-ocr";

configure({
    dataPath: "./ocr-data",
    languageModelType: LanguageModelType.BEST
});

await preloadLanguages([Language.English]);

// Read all text from screen
const allText = await screen.read();
console.log(allText);

// Read text from a specific region (e.g., active window)
const activeWindow = await getActiveWindow();
const windowText = await screen.read({
    searchRegion: activeWindow.region
});
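When the extracted text is used for verification, exact string comparison tends to be brittle because OCR output often contains irregular spacing and line breaks. The sketch below normalizes whitespace and case before comparing. It assumes the read result is a single string (the default, unsplit form); normalizeWhitespace and containsText are hypothetical helpers.

```typescript
// Collapse runs of spaces, tabs, and line breaks into single spaces.
function normalizeWhitespace(text: string): string {
    return text.replace(/\s+/g, " ").trim();
}

// Hypothetical helper: case-insensitive, whitespace-tolerant containment check.
function containsText(ocrOutput: string, expected: string): boolean {
    const haystack = normalizeWhitespace(ocrOutput).toLowerCase();
    const needle = normalizeWhitespace(expected).toLowerCase();
    return haystack.includes(needle);
}

console.log(containsText("Upload\n  complete \n", "upload complete")); // prints true
console.log(containsText("Upload failed", "upload complete"));         // prints false
```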

Text Split Options

Control the granularity of extracted text with the split option. The available split levels differ between plugins.

typescript

// Local OCR - TextSplit options
import { screen, getActiveWindow } from "@nut-tree/nut-js";
import { TextSplit } from "@nut-tree/plugin-ocr";

// Available split levels:
// TextSplit.SYMBOL    - Single character level
// TextSplit.WORD      - Word level
// TextSplit.LINE      - Line level
// TextSplit.PARAGRAPH - Paragraph level
// TextSplit.BLOCK     - Block level
// TextSplit.NONE      - Single string (default)

// Window whose region limits the read
const activeWindow = await getActiveWindow();

const lines = await screen.read({
    searchRegion: activeWindow.region,
    split: TextSplit.LINE
});

// Returns array of line results with confidence scores
for (const line of lines) {
    console.log(`Text: ${line.text}, Confidence: ${line.confidence}`);
}

typescript

// Azure OCR - TextSplit options
import { screen, getActiveWindow } from "@nut-tree/nut-js";
import { TextSplit } from "@nut-tree/plugin-azure/ocr";

// Available split levels (fewer than local):
// TextSplit.WORD - Word level
// TextSplit.LINE - Line level
// TextSplit.NONE - Single block (default)

// Window whose region limits the read
const activeWindow = await getActiveWindow();

const words = await screen.read({
    searchRegion: activeWindow.region,
    split: TextSplit.WORD
});

// Returns array of word results
for (const word of words) {
    console.log(`Word: ${word.text}, Confidence: ${word.confidence}`);
}

Split Level Differences

Local OCR: SYMBOL, WORD, LINE, PARAGRAPH, BLOCK, NONE
Azure OCR: WORD, LINE, NONE
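Split results carry a confidence value alongside the text, which makes it possible to discard uncertain readings before acting on them. The sketch below relies only on the text and confidence fields shown in the examples above. TextResult and aboveConfidence are hypothetical names, and since this section does not state the confidence scale, the threshold is left to the caller and the sample values are made up.

```typescript
// Only the two fields used in the split examples above.
interface TextResult {
    text: string;
    confidence: number;
}

// Hypothetical helper: keep results whose confidence reaches the threshold.
function aboveConfidence<T extends TextResult>(results: readonly T[], minConfidence: number): T[] {
    return results.filter((result) => result.confidence >= minConfidence);
}

// Made-up sample data; the scale used here is an assumption.
const sample: TextResult[] = [
    { text: "Invoice", confidence: 96 },
    { text: "lnv0ice", confidence: 41 },
    { text: "Total", confidence: 88 },
];
console.log(aboveConfidence(sample, 80).map((r) => r.text).join(", ")); // prints "Invoice, Total"
```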

ReadTextConfig Options

Configure text extraction with additional options:

typescript

// Local OCR ReadTextConfig
import { screen, Region } from "@nut-tree/nut-js";
import { Language, TextSplit } from "@nut-tree/plugin-ocr";

const result = await screen.read({
    searchRegion: new Region(0, 0, 800, 600),  // Limit to region
    languages: [Language.English, Language.German],  // OCR languages
    split: TextSplit.LINE  // Split granularity
});

typescript

// Azure OCR ReadTextConfig (can override global config)
import { screen, Region } from "@nut-tree/nut-js";
import { Language, TextSplit } from "@nut-tree/plugin-azure/ocr";

const result = await screen.read({
    searchRegion: new Region(0, 0, 800, 600),
    split: TextSplit.LINE,
    // Can override global Azure config per-call:
    language: Language.German,  // or "de"
    readingOrder: "natural"
});

Best Practices

OCR Tips

Use regions to limit the search area for better speed and accuracy
OCR works best with clear, high-contrast text
Avoid searching for very short strings (1-2 characters)
Wait for animations to complete before reading text
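The first tip can be applied by deriving a smaller rectangle from a known one, such as the lower half of a window. The sketch below works on plain numbers. Rect, Fractions, and subRect are hypothetical names, and the resulting values would be passed to new Region(left, top, width, height) as in the ReadTextConfig example.

```typescript
// Plain rectangle with left, top, width, and height in pixels.
interface Rect {
    left: number;
    top: number;
    width: number;
    height: number;
}

// Fractions of the outer rectangle, each expected in the range 0..1.
interface Fractions {
    x: number;
    y: number;
    width: number;
    height: number;
}

// Hypothetical helper: carve a sub-rectangle out of `outer`.
function subRect(outer: Rect, part: Fractions): Rect {
    return {
        left: Math.round(outer.left + outer.width * part.x),
        top: Math.round(outer.top + outer.height * part.y),
        width: Math.round(outer.width * part.width),
        height: Math.round(outer.height * part.height),
    };
}

// Lower half of an 800x600 area, e.g. where a button row might sit
const lowerHalf = subRect(
    { left: 0, top: 0, width: 800, height: 600 },
    { x: 0, y: 0.5, width: 1, height: 0.5 }
);
console.log(lowerHalf); // prints { left: 0, top: 300, width: 800, height: 300 }
```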

OCR Limitations

Stylized fonts or icons may not be recognized correctly
Very small text may be difficult to read
Low contrast text reduces accuracy
OCR is slower than image search; prefer image search when reference images are available
