Text Search
nut.js allows you to locate template images on your screen, but in some cases locating a certain text might be more useful and flexible.
Remark: Text search uses the exact same set of screen methods as image search, only with different query types. For a general understanding of the different screen methods, please also take a look at the image search tutorial.
Another remark: @nut-tree/plugin-ocr and @nut-tree/plugin-azure are very similar in terms of usage; they only differ in their configuration.
TextFinder Providers
To perform text search, we have to install an additional package that provides the actual implementation. Otherwise, all functions relying on text search will throw an error like Error: No TextFinder registered.
Currently, nut.js provides two TextFinder implementations: @nut-tree/plugin-ocr and @nut-tree/plugin-azure.
Attention: These are nut.js premium packages which require an active subscription. See the registry access tutorial to learn how to subscribe and access the private registry.
Text queries
Both plugins process text queries to search for text on screen. Currently, nut.js provides two different text queries:
singleWord: Searches for a single word.
textLine: Searches for a text line, so it's possible to search for multiple, concatenated words. E.g. textLine("How to use this plugin") would search for this very sentence.
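As a rough illustration of when to use which query type, the following sketch picks textLine for input that contains whitespace and singleWord otherwise. The helper name queryFor is hypothetical and not part of nut.js; it only assumes that the object passed in exposes singleWord and textLine as described above.

```javascript
// Hypothetical helper (not part of nut.js): choose a query type based on the text.
// `nut` is expected to expose `singleWord` and `textLine`,
// e.g. the object returned by require("@nut-tree/nut-js").
function queryFor(nut, text) {
    const trimmed = text.trim();
    // Text containing whitespace spans several words, so it needs a text line query.
    return /\s/.test(trimmed) ? nut.textLine(trimmed) : nut.singleWord(trimmed);
}
```

With the real library and a TextFinder plugin installed, the result could be passed straight to a screen method, for example screen.find(queryFor(require("@nut-tree/nut-js"), "How to use this plugin")).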
@nut-tree/plugin-ocr
npm i @nut-tree/plugin-ocr
In its simplest form, we only need to install the package and require it in our code to use it:
const {screen, singleWord} = require("@nut-tree/nut-js");
require("@nut-tree/plugin-ocr");

(async () => {
    try {
        const location = await screen.find(singleWord("nut.js"));
    } catch (e) {
        console.error(e);
    }
})();
This is all we need to perform offline text-search using defaults provided by the module.
But it wouldn't be a tutorial if we stopped here!
Configure the language model type
The @nut-tree/plugin-ocr provider package provides some config options, so let's require it in our code, e.g. index.js, to configure it:
const {screen, singleWord} = require("@nut-tree/nut-js");
const {configure} = require("@nut-tree/plugin-ocr");

(async () => {
    try {
        const location = await screen.find(singleWord("nut.js"));
    } catch (e) {
        console.error(e);
    }
})();
With configure we're able to customize two settings:
languageModelType: The language model type to use for OCR. Possible values are DEFAULT, BEST and FAST. Visit the Configuration section of the OCR plugin documentation for more information.
dataPath: The path where we store OCR models. This may be useful if you want to store the models in a specific location.
const {screen, singleWord} = require("@nut-tree/nut-js");
const {configure, LanguageModelType} = require("@nut-tree/plugin-ocr");

configure({
    languageModelType: LanguageModelType.BEST
});

(async () => {
    try {
        const location = await screen.find(singleWord("nut.js"));
    } catch (e) {
        console.error(e);
    }
})();
In this example we use LanguageModelType.BEST, which is slower than the other models but yields the most accurate results.
Specify OCR languages
The default configuration of the OCR plugin uses the English language. But if we want to use a different language, or multiple languages at once, we can specify them in the providerData object of the find function.
Remark: Other screen methods like findAll, waitFor or read also accept the providerData object.
const {screen, singleWord} = require("@nut-tree/nut-js");
const {configure, LanguageModelType, Language} = require("@nut-tree/plugin-ocr");

configure({
    languageModelType: LanguageModelType.BEST
});

(async () => {
    try {
        const location = await screen.find(singleWord("nut.js"), {
            providerData: {
                lang: [Language.English, Language.German],
            }
        });
    } catch (e) {
        console.error(e);
    }
})();
This way we can specify the languages we want to use for OCR. The Language enum is provided by the OCR plugin. You can find a list of all available languages in the Configuration section of the OCR plugin documentation.
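Since several screen methods accept the same providerData object, it can be convenient to build it once and reuse it. The sketch below shows one way to do that. The helper withOcrOptions is hypothetical and not part of nut.js or the OCR plugin, and the plain strings are placeholders standing in for members of the Language enum.

```javascript
// Hypothetical helper (not part of nut.js or the OCR plugin):
// builds the optional second argument for screen.find, screen.findAll, etc.
function withOcrOptions(languages, extra = {}) {
    return {
        providerData: {
            ...extra,
            lang: [...languages], // copy, so later changes to the input array don't leak in
        },
    };
}

// Placeholder values: in real code these would be members of the plugin's
// Language enum, e.g. Language.English and Language.German.
const searchOptions = withOcrOptions(["English", "German"]);
console.log(JSON.stringify(searchOptions));
```

The resulting object could then be passed unchanged to several calls, for example screen.find(singleWord("nut.js"), searchOptions) and screen.findAll(singleWord("nut.js"), searchOptions).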
Preload OCR languages
Multi-language support works by downloading required models on the fly. When using a new combination of LanguageModelType/Language for the first time, the plugin will automatically download and cache the model locally.
What if we want to avoid occasional loading times during execution? It's possible to preload models you know you'll use. This way it's possible to load required models at a defined point in time to make sure they're available when needed. If a model is already cached locally it won't be re-downloaded.
So let's make sure we have both English and German at our disposal:
const {screen, singleWord} = require("@nut-tree/nut-js");
const {configure, LanguageModelType, Language, preloadLanguages} = require("@nut-tree/plugin-ocr");

configure({
    languageModelType: LanguageModelType.BEST
});

(async () => {
    try {
        await preloadLanguages([Language.English, Language.German]);
        const location = await screen.find(singleWord("nut.js"), {
            providerData: {
                lang: [Language.English, Language.German],
            }
        });
    } catch (e) {
        console.error(e);
    }
})();
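To make the download-once behaviour more concrete, here is a minimal sketch of such a cache. It is not the plugin's implementation: createModelCache, the cache key format and the injected download function are all assumptions made purely for illustration.

```javascript
// Illustrative only: a download-once cache keyed by model type and language.
// `download` is any function returning a promise for the model data.
function createModelCache(download) {
    const cache = new Map();

    // Returns a promise for the model, calling `download` only on first use.
    function load(modelType, language) {
        const key = `${modelType}/${language}`;
        if (!cache.has(key)) {
            cache.set(key, download(modelType, language));
        }
        return cache.get(key);
    }

    // Preloading simply loads every requested language up front.
    function preload(modelType, languages) {
        return Promise.all(languages.map((language) => load(modelType, language)));
    }

    return {load, preload};
}
```

Under these assumptions, calling preload at startup moves the waiting time to a defined point, and every later load for the same combination resolves from the cache.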
Dealing with flawed results
OCR engines are not perfect and sometimes return slightly garbled results. Emojis are interpreted as characters, sometimes a space is lost and two words are joined, you name it. To deal with such inconsistencies, it's also possible to adjust two parameters via providerData:
partialMatch: When set to true, you'll still get a hit if the text returned by the OCR engine merely contains the search term, e.g. when a word is returned together with its trailing period.
caseSensitive: Toggles case sensitivity when looking for matches. This is another way to deal with possible inconsistencies in OCR results.
For our example, let's allow partial matches and disable case-sensitivity:
const {screen, singleWord} = require("@nut-tree/nut-js");
const {configure, LanguageModelType, Language, preloadLanguages} = require("@nut-tree/plugin-ocr");

configure({
    languageModelType: LanguageModelType.BEST
});

(async () => {
    try {
        await preloadLanguages([Language.English, Language.German]);
        const location = await screen.find(singleWord("nut.js"), {
            providerData: {
                lang: [Language.English, Language.German],
                partialMatch: true,
                caseSensitive: false,
            }
        });
    } catch (e) {
        console.error(e);
    }
})();
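The effect of these two flags can be pictured with a small, self-contained comparison function. This is only a sketch of the semantics described above, not the plugin's actual matching logic; the function name isMatch and the defaults chosen here are assumptions.

```javascript
// Illustrative only: one way partialMatch and caseSensitive could affect a comparison.
// The defaults used here are assumptions for this sketch, not the plugin's defaults.
function isMatch(ocrText, query, {partialMatch = false, caseSensitive = false} = {}) {
    const normalize = (s) => (caseSensitive ? s : s.toLowerCase());
    const found = normalize(ocrText);
    const wanted = normalize(query);
    // A partial match only requires the OCR text to contain the query.
    return partialMatch ? found.includes(wanted) : found === wanted;
}

console.log(isMatch("nut.js.", "nut.js"));                       // false: trailing period
console.log(isMatch("nut.js.", "nut.js", {partialMatch: true})); // true
console.log(isMatch("Nut.JS", "nut.js", {caseSensitive: true})); // false
console.log(isMatch("Nut.JS", "nut.js"));                        // true
```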
Custom OCR confidence value
One way to configure the minimum required confidence value for a match when performing on-screen search is the screen.config.confidence value. This property was introduced with the initial image search plugin, thus it was exclusively used for image search.
Now that there are additional things to search for on-screen, like text, this single confidence value becomes a bit limiting. In cases where we are using both image and text search we'd like to have a separate way to configure confidence values used for OCR-based searches.
After importing @nut-tree/plugin-ocr there's another property at our disposal to configure the confidence value required for text search:
screen.config.ocrConfidence
This value specifies the minimum confidence required for a text search result to be accepted, expressed as a fraction between 0 and 1 (e.g. 0.9 for 90%).
const {screen, singleWord} = require("@nut-tree/nut-js");
const {configure, LanguageModelType, Language, preloadLanguages} = require("@nut-tree/plugin-ocr");

configure({
    languageModelType: LanguageModelType.BEST
});

screen.config.ocrConfidence = 0.9;

(async () => {
    try {
        await preloadLanguages([Language.English, Language.German]);
        const location = await screen.find(singleWord("nut.js"), {
            providerData: {
                lang: [Language.English, Language.German],
                partialMatch: true,
                caseSensitive: true,
            }
        });
    } catch (e) {
        console.error(e);
    }
})();
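Conceptually, a confidence threshold acts as a filter on candidate results. The following sketch illustrates that idea only; the shape of the candidate objects, the helper name acceptMatches and the use of a greater-or-equal comparison are assumptions, not details taken from the plugin.

```javascript
// Illustrative only: keep OCR candidates whose confidence reaches the threshold.
// The {text, confidence} shape is an assumption made for this sketch.
function acceptMatches(candidates, minConfidence) {
    return candidates.filter((candidate) => candidate.confidence >= minConfidence);
}

const candidates = [
    {text: "nut.js", confidence: 0.95},
    {text: "nut.js", confidence: 0.85},
];

console.log(acceptMatches(candidates, 0.9).length); // 1
console.log(acceptMatches(candidates, 0.8).length); // 2
```

Raising the threshold trades missed matches for fewer false positives, which is why a separate value for text search can be useful next to the one used for image search.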
Full example
Let's take a look at a full example which brings all previously discussed pieces together. The following sample demonstrates a hypothetical scenario where we are trying to click a button which is labelled "Bestätigen" in German (that would be "Confirm" in English).
We set our languageModelType to the model which delivers the most accurate results, preload German language data, configure a custom OCR confidence value of 80% and run a case-insensitive search for a singleWord, allowing for partial matches.
const {centerOf, mouse, screen, singleWord, straightTo} = require("@nut-tree/nut-js");
const {configure, Language, LanguageModelType, preloadLanguages} = require("@nut-tree/plugin-ocr");

configure({
    languageModelType: LanguageModelType.BEST
});

(async () => {
    // Make sure German language data is available before searching
    await preloadLanguages([Language.German]);
    screen.config.ocrConfidence = 0.8;
    screen.config.autoHighlight = true;

    const location = await screen.find(singleWord("Bestätigen"), {
        providerData: {
            lang: [Language.German],
            partialMatch: true,
            caseSensitive: false
        }
    });

    // Move to the center of the located text and click it
    await mouse.move(straightTo(centerOf(location)));
    await mouse.leftClick();
})();