Text Search (OCR)
Find and read text on screen using optical character recognition.
Overview
Text search uses Optical Character Recognition (OCR) to find and read text directly from the screen. This is useful when you need to interact with text that might change, or when you don't have reference images available.
Plugin Required
Text search requires an OCR plugin: @nut-tree/plugin-ocr (local, Tesseract-based) or @nut-tree/plugin-azure (cloud, Azure AI Vision).
Find Text
Locate text strings on screen:
screen.find(singleWord("Submit"))
Read Text
Extract all text from the screen or a region:
screen.read()
Wait for Text
Wait until text appears on screen:
screen.waitFor(singleWord("Ready"))
Choosing an OCR Plugin
nut.js offers two OCR plugins with different trade-offs:
Local (Tesseract)
Runs offline, keeps data on the machine, and handles standard text well.
Package: @nut-tree/plugin-ocr
Azure (Cloud)
Better accuracy on complex or low-quality images; requires an Azure account.
Package: @nut-tree/plugin-azure
When to Use Which?
- Local OCR: Privacy-sensitive environments, offline use, standard UI text
- Azure OCR: Complex images, handwriting, low-quality screenshots, higher accuracy needs
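The guidance above can be captured as a small decision helper, for example in a project setup script. This is a hypothetical sketch: the `OcrNeeds` fields and the `choosePlugin` function are illustrative names, not part of nut.js; only the two package names come from this page.

```typescript
// Hypothetical description of what a project needs from OCR (not a nut.js type).
interface OcrNeeds {
  mustRunOffline: boolean; // no network access allowed
  privacySensitive: boolean; // screen content must not leave the machine
  difficultImages: boolean; // handwriting, complex or low-quality screenshots
}

type OcrPackage = "@nut-tree/plugin-ocr" | "@nut-tree/plugin-azure";

// Hypothetical helper encoding the "When to Use Which?" list:
// offline or privacy constraints rule out the cloud plugin,
// otherwise Azure is preferred only for difficult images.
function choosePlugin(needs: OcrNeeds): OcrPackage {
  if (needs.mustRunOffline || needs.privacySensitive) {
    return "@nut-tree/plugin-ocr";
  }
  return needs.difficultImages ? "@nut-tree/plugin-azure" : "@nut-tree/plugin-ocr";
}

// A privacy-sensitive project stays local even when images are difficult.
console.log(
  choosePlugin({ mustRunOffline: false, privacySensitive: true, difficultImages: true }),
);
```

The ordering of the checks is the design choice here: hard constraints (offline, privacy) are evaluated before the accuracy preference.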
Installation & Setup
Local OCR (Tesseract)
npm install @nut-tree/plugin-ocr
Configure the plugin with a path to store language models and choose a model type:
import { screen, singleWord } from "@nut-tree/nut-js";
import { configure, LanguageModelType, Language, preloadLanguages } from "@nut-tree/plugin-ocr";
// Configure OCR
configure({
dataPath: "./ocr-data", // Where to store language models
languageModelType: LanguageModelType.BEST // BEST, DEFAULT, or FAST
});
// Preload languages to avoid delay on first search
await preloadLanguages([Language.English]);
Model Types
- BEST: higher accuracy, slower processing
- DEFAULT: balanced accuracy and speed
- FAST: faster processing, lower accuracy
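To switch between speed and accuracy without editing code, one option is to read the model type name from an environment variable. This is a minimal sketch: `OCR_MODEL_TYPE` and `parseModelType` are hypothetical names, and the helper returns the member name as a plain string rather than the plugin's `LanguageModelType` enum.

```typescript
// The three model type names listed above, as plain strings.
const MODEL_TYPES = ["BEST", "DEFAULT", "FAST"] as const;
type ModelTypeName = (typeof MODEL_TYPES)[number];

// Hypothetical helper: parse a model type name (e.g. from an OCR_MODEL_TYPE
// environment variable). Unknown or missing values fall back to DEFAULT,
// the balanced option.
function parseModelType(raw: string | undefined): ModelTypeName {
  const normalized = (raw ?? "").trim().toUpperCase();
  return (MODEL_TYPES as readonly string[]).includes(normalized)
    ? (normalized as ModelTypeName)
    : "DEFAULT";
}

console.log(parseModelType("fast")); // FAST
console.log(parseModelType(undefined)); // DEFAULT
```

In a setup script you could then pass `LanguageModelType[parseModelType(process.env.OCR_MODEL_TYPE)]` to `configure`. This assumes `LanguageModelType` is a regular TypeScript enum whose member names are exactly BEST, DEFAULT and FAST.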
Azure OCR (Cloud)
npm install @nut-tree/plugin-azure
Configure it with your Azure AI Vision credentials:
import { screen, singleWord } from "@nut-tree/nut-js";
import { configure, Language } from "@nut-tree/plugin-azure/ocr";
// Configure with Azure credentials
configure({
apiKey: process.env.VISION_KEY, // Required
apiEndpoint: process.env.VISION_ENDPOINT, // Required
// Optional settings:
language: Language.English, // Language enum or string code
modelVersion: "latest", // OCR model version
readingOrder: "natural", // "basic" or "natural" (Latin languages)
checkResultInterval: 1000, // Polling interval in ms
checkResultRetryCount: 10 // Max polling attempts
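});
Azure Prerequisites
You need an Azure account with an Azure AI Vision resource. The example above reads that resource's key and endpoint from the VISION_KEY and VISION_ENDPOINT environment variables.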
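Because `apiKey` and `apiEndpoint` are required, it can help to fail fast with a readable error when either environment variable is missing, instead of passing `undefined` to the plugin. This is a minimal sketch: `readAzureVisionSettings` and `AzureVisionSettings` are hypothetical names, and the function accepts any environment-like record so it can be tested without a real environment.

```typescript
// Hypothetical type holding the two settings marked "Required" above.
interface AzureVisionSettings {
  apiKey: string;
  apiEndpoint: string;
}

// Hypothetical helper: read and validate the required values from an
// environment-like record (pass process.env in Node).
function readAzureVisionSettings(
  env: Record<string, string | undefined>,
): AzureVisionSettings {
  const required = ["VISION_KEY", "VISION_ENDPOINT"];
  // Treat unset and blank values alike as missing.
  const missing = required.filter((name) => (env[name] ?? "").trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing Azure AI Vision settings: ${missing.join(", ")}`);
  }
  return { apiKey: env.VISION_KEY as string, apiEndpoint: env.VISION_ENDPOINT as string };
}
```

The result can be spread into the plugin configuration, for example `configure({ ...readAzureVisionSettings(process.env), language: Language.English })`.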
Quick Reference
find (with text)
screen.find(singleWord(text), options)
Find a word on screen and return its location.
find (with text line)
screen.find(textLine(text), options)
Find a multi-word text line on screen.
findAll (with text)
screen.findAll(singleWord(text), options)
Find all occurrences of text on screen.
waitFor (with text)
screen.waitFor(singleWord(text), timeout, interval, options)
Wait until text appears on screen.
read
screen.read(config?: ReadTextConfig)
Extract all text from the screen or a specific region.
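The `timeout` and `interval` parameters of `waitFor` describe a polling loop: run the search, and if nothing is found, try again after the interval until the timeout has elapsed. The sketch below shows that general pattern as a generic helper, so the two parameters are easier to reason about. `pollUntil` is a hypothetical name, and this is not nut.js's internal implementation; `screen.waitFor` already does the polling for you.

```typescript
// Resolve after `ms` milliseconds.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical generic polling helper: call `attempt` until it returns a value
// other than undefined, waiting `intervalMs` between tries. Rejects if the next
// try would start after `timeoutMs` has elapsed.
async function pollUntil<T>(
  attempt: () => Promise<T | undefined>,
  timeoutMs: number,
  intervalMs: number,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await attempt();
    if (result !== undefined) {
      return result;
    }
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

For example, `pollUntil(() => screen.find(singleWord("Complete")).catch(() => undefined), 10000, 500)` would retry a text search every 500 ms for up to 10 seconds.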
Finding Text
Use singleWord() for single words or textLine() for multi-word text. Pass language options via providerData.
import { screen, mouse, singleWord, centerOf, straightTo, Button } from "@nut-tree/nut-js";
import { Language } from "@nut-tree/plugin-ocr";
// Find a single word on screen
const submitButton = await screen.find(singleWord("Submit"), {
providerData: {
lang: [Language.English]
}
});
console.log(`Found "Submit" at: ${submitButton.left}, ${submitButton.top}`);
// Click on the found text
await mouse.move(straightTo(centerOf(submitButton)));
await mouse.click(Button.LEFT);
Finding Multiple Occurrences
import { screen, singleWord } from "@nut-tree/nut-js";
import { Language } from "@nut-tree/plugin-ocr";
// Find all occurrences of a word
const allLinks = await screen.findAll(singleWord("Click"), {
providerData: {
lang: [Language.English]
}
});
console.log(`Found ${allLinks.length} "Click" texts`);
Search Options
Both plugins support partialMatch and caseSensitive options. The main difference is how languages are configured:
// Local OCR - uses "lang" with Language enum array
import { screen, singleWord } from "@nut-tree/nut-js";
import { Language } from "@nut-tree/plugin-ocr";
const result = await screen.find(singleWord("order"), {
providerData: {
lang: [Language.English, Language.German], // Array for multi-language
partialMatch: true,
caseSensitive: false
}
});

// Azure OCR - uses "language" with single Language enum or string
import { screen, singleWord } from "@nut-tree/nut-js";
import { Language } from "@nut-tree/plugin-azure/ocr";
const result = await screen.find(singleWord("order"), {
providerData: {
language: Language.English, // Single language only
partialMatch: true,
caseSensitive: false
}
});
Multi-word Text
Use textLine() to search for phrases or multi-word text.
import { screen, textLine } from "@nut-tree/nut-js";
import { Language } from "@nut-tree/plugin-ocr";
// Find a multi-word phrase
const result = await screen.find(textLine("Save Changes"), {
providerData: {
lang: [Language.English]
}
});
console.log(`Found phrase at: ${result.left}, ${result.top}`);
singleWord vs textLine
Use singleWord() when searching for individual words. Use textLine() when searching for phrases or text with spaces.
Waiting for Text
Wait for specific text to appear. This is useful for detecting when loading has finished or when the UI has changed state.
import { screen, singleWord } from "@nut-tree/nut-js";
import { Language } from "@nut-tree/plugin-ocr";
// Wait for success message (timeout in ms, check interval in ms)
try {
await screen.waitFor(singleWord("Complete"), 10000, 500, {
providerData: {
lang: [Language.English]
}
});
console.log("Upload finished!");
} catch (error) {
console.log("Upload did not complete in time");
}
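// Wait for a "Continue" label to appear (OCR sees text, not whether a button is enabled).
// Without try/catch, a timeout rejects and the error propagates to the caller.
await screen.waitFor(singleWord("Continue"), 5000, 500, {
providerData: {
lang: [Language.English]
}
});
Reading Text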
Use screen.read() to extract all text from the screen or a specific region. This is useful for capturing content, verifying text, or processing screen output.
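When verifying text, comparing `screen.read()` output exactly is brittle, because OCR output may contain line breaks and irregular spacing. This is a minimal sketch of a more tolerant check, assuming the default unsplit result, which this page describes as a single string. `normalizeWhitespace` and `containsText` are hypothetical helpers, not part of nut.js.

```typescript
// Hypothetical helper: collapse runs of whitespace (spaces, tabs, line breaks)
// into single spaces and trim the ends.
function normalizeWhitespace(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Hypothetical helper: check whether OCR output contains an expected phrase,
// ignoring layout whitespace and, unless caseSensitive is set, letter case.
function containsText(ocrOutput: string, expected: string, caseSensitive = false): boolean {
  let haystack = normalizeWhitespace(ocrOutput);
  let needle = normalizeWhitespace(expected);
  if (!caseSensitive) {
    haystack = haystack.toLowerCase();
    needle = needle.toLowerCase();
  }
  return haystack.includes(needle);
}

// A phrase split across two OCR lines still matches.
console.log(containsText("Your changes\nhave been  saved.", "changes have been saved")); // true
```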
Basic Text Extraction
import { screen, getActiveWindow } from "@nut-tree/nut-js";
import { configure, Language, LanguageModelType, preloadLanguages } from "@nut-tree/plugin-ocr";
configure({
dataPath: "./ocr-data",
languageModelType: LanguageModelType.BEST
});
await preloadLanguages([Language.English]);
// Read all text from screen
const allText = await screen.read();
console.log(allText);
// Read text from a specific region (e.g., active window)
const activeWindow = await getActiveWindow();
const windowText = await screen.read({
searchRegion: activeWindow.region
});
Text Split Options
Control the granularity of extracted text with the split option. The available split levels differ between plugins.
// Local OCR - TextSplit options
// activeWindow is the window obtained in the Basic Text Extraction example
import { TextSplit } from "@nut-tree/plugin-ocr";
// Available split levels:
// TextSplit.SYMBOL - Single character level
// TextSplit.WORD - Word level
// TextSplit.LINE - Line level
// TextSplit.PARAGRAPH - Paragraph level
// TextSplit.BLOCK - Block level
// TextSplit.NONE - Single string (default)
const lines = await screen.read({
searchRegion: activeWindow.region,
split: TextSplit.LINE
});
// Returns array of line results with confidence scores
for (const line of lines) {
console.log(`Text: ${line.text}, Confidence: ${line.confidence}`);
}

// Azure OCR - TextSplit options
import { TextSplit } from "@nut-tree/plugin-azure/ocr";
// Available split levels (fewer than local):
// TextSplit.WORD - Word level
// TextSplit.LINE - Line level
// TextSplit.NONE - Single block (default)
const words = await screen.read({
searchRegion: activeWindow.region,
split: TextSplit.WORD
});
// Returns array of word results
for (const word of words) {
console.log(`Word: ${word.text}, Confidence: ${word.confidence}`);
}
Split Level Differences
- Local OCR: SYMBOL, WORD, LINE, PARAGRAPH, BLOCK, NONE
- Azure OCR: WORD, LINE, NONE
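Both split examples print a text and a confidence score for each entry. A common follow-up is to drop low-confidence entries before using the text. This is a minimal sketch: `TextResult` only models the two fields used in the loops above, and `confidentText` is a hypothetical helper. This page does not state the confidence scale, so the threshold must match whatever range your plugin actually reports.

```typescript
// Minimal shape of a split read result, based on the fields used above.
interface TextResult {
  text: string;
  confidence: number;
}

// Hypothetical helper: keep entries at or above minConfidence and join their text.
function confidentText(results: TextResult[], minConfidence: number, separator = "\n"): string {
  return results
    .filter((r) => r.confidence >= minConfidence)
    .map((r) => r.text)
    .join(separator);
}

// Illustrative input; in practice this would be the array returned by screen.read().
const sample: TextResult[] = [
  { text: "Invoice #1042", confidence: 91 },
  { text: "~~::..", confidence: 12 },
  { text: "Total: 99.00", confidence: 87 },
];
// Prints "Invoice #1042" and "Total: 99.00" on two lines.
console.log(confidentText(sample, 60));
```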
ReadTextConfig Options
Configure text extraction with additional options:
// Local OCR ReadTextConfig
import { screen, Region } from "@nut-tree/nut-js";
import { Language, TextSplit } from "@nut-tree/plugin-ocr";
const result = await screen.read({
searchRegion: new Region(0, 0, 800, 600), // Limit to region
languages: [Language.English, Language.German], // OCR languages
split: TextSplit.LINE // Split granularity
});

// Azure OCR ReadTextConfig (can override global config)
import { screen, Region } from "@nut-tree/nut-js";
import { Language, TextSplit } from "@nut-tree/plugin-azure/ocr";
const result = await screen.read({
searchRegion: new Region(0, 0, 800, 600),
split: TextSplit.LINE,
// Can override global Azure config per-call:
language: Language.German, // or "de"
readingOrder: "natural"
});
Best Practices
OCR Tips
- Use regions to limit the search area for better speed and accuracy
- OCR works best with clear, high-contrast text
- Avoid searching for very short strings (1-2 characters)
- Wait for animations to complete before reading text
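One way to limit the search area is to derive a smaller region from a larger one, such as the top band of a window where a title or toolbar usually sits. This is a minimal sketch: `RegionLike` is a plain object with the left, top, width and height fields used on this page, and `subRegion` is a hypothetical helper. You could turn the result into a search region with `new Region(r.left, r.top, r.width, r.height)`, the same constructor used in the ReadTextConfig example.

```typescript
// Plain rectangle with the same fields as a nut.js Region.
interface RegionLike {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Hypothetical helper: return the part of `outer` given by fractions of its size,
// e.g. (0, 0, 1, 0.2) is the top 20% band. Fractions are clamped to [0, 1], and
// edges are rounded to whole pixels, so the result never extends past `outer`.
function subRegion(outer: RegionLike, fx: number, fy: number, fw: number, fh: number): RegionLike {
  const clamp = (v: number) => Math.min(Math.max(v, 0), 1);
  const left = outer.left + Math.round(clamp(fx) * outer.width);
  const top = outer.top + Math.round(clamp(fy) * outer.height);
  const right = outer.left + Math.round(clamp(fx + fw) * outer.width);
  const bottom = outer.top + Math.round(clamp(fy + fh) * outer.height);
  return { left, top, width: Math.max(right - left, 0), height: Math.max(bottom - top, 0) };
}

// Top 20% of an 800x600 window positioned at (100, 50).
console.log(subRegion({ left: 100, top: 50, width: 800, height: 600 }, 0, 0, 1, 0.2));
```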
OCR Limitations
- Stylized fonts or icons may not be recognized correctly
- Very small text may be difficult to read
- Low contrast text reduces accuracy
- OCR is slower than image search; prefer image search when reference images are available