Text Search
nut.js allows you to locate template images on your screen, but in some cases locating certain text might be more useful and flexible.
Remark: Text search uses the exact same set of screen methods as image search, only with different query types. For a general understanding of the different screen methods, please also take a look at the image search tutorial.
Another remark: Both @nut-tree/plugin-ocr and @nut-tree/plugin-azure are very similar in terms of usage; they only differ in their configuration.
TextFinder Providers
To search for text, we have to install an additional package which provides the actual implementation to perform text search. Otherwise, all functions relying on text search will throw an error like Error: No TextFinder registered.
Currently, nut.js provides two TextFinder implementations:
- @nut-tree/plugin-ocr, which performs OCR locally and also works offline.
- @nut-tree/plugin-azure, which uses the Azure AI Vision OCR service and therefore requires a network connection.
Attention: These are nut.js premium packages which require an active subscription. See the registry access tutorial to learn how to subscribe and access the private registry.
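Once you have access to the private registry, both plugins install like any regular npm package, e.g. the local OCR plugin:
npm i @nut-tree/plugin-ocr
This tutorial focuses on @nut-tree/plugin-azure, which we'll install and configure below.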
Text queries
Both plugins process text queries to search for text on screen. Currently, nut.js provides two different text queries:
- singleWord: Searches for a single word.
- textLine: Searches for a text line, so it's possible to search for multiple, concatenated words. E.g. textLine("How to use this plugin") would search for this very sentence.
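As a quick sketch of both query types in action (assuming one of the TextFinder plugins introduced above is already installed, configured and registered):
const {screen, singleWord, textLine} = require("@nut-tree/nut-js");

(async () => {
    try {
        // Search for a single word on screen
        const word = await screen.find(singleWord("nut.js"));
        // Search for a whole line of text
        const line = await screen.find(textLine("How to use this plugin"));
    } catch (e) {
        console.error(e);
    }
})();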
Prerequisites
In order to use @nut-tree/plugin-azure, you need to have an Azure account and an Azure AI Vision OCR resource. You can use the free pricing tier (F0) to try the service, and upgrade later to a paid tier for production.
Once you have both things set up, you'll need a key and the endpoint of the resource you created to connect your application to the Azure AI Vision service:
- After your Azure Vision resource is deployed, select Go to resource.
- In the left navigation menu, select Keys and Endpoint.
- Copy one of the keys and the endpoint.
- Use them in your code, e.g. via environment variables.
@nut-tree/plugin-azure
npm i @nut-tree/plugin-azure
Configure credentials
Assuming you went through the Prerequisites step, let's load our credentials.
First, we'll create a .env file where we store our credentials obtained from Azure:
VISION_KEY=<YOUR_API_KEY>
VISION_ENDPOINT=<YOUR_API_ENDPOINT>
Next, we will install dotenv, one of the most widely used packages to work with .env files.
npm i dotenv
Now let's use it in our script to populate our environment from our .env file:
const {screen, singleWord} = require("@nut-tree/nut-js");
require('dotenv').config();
The @nut-tree/plugin-azure package provides a subpackage for OCR: @nut-tree/plugin-azure/ocr.
To get things set up, we need to import two things:
const {screen, singleWord} = require("@nut-tree/nut-js");
require('dotenv').config();
const {configure, useAzureVisionOCR} = require("@nut-tree/plugin-azure/ocr");
Since we're using dotenv, we can now simply reference our credentials via process.env. Following the configuration part, we'll have to call useAzureVisionOCR() to register the plugin, and we're good to go:
const {screen, singleWord} = require("@nut-tree/nut-js");
require('dotenv').config();
const {configure, useAzureVisionOCR} = require("@nut-tree/plugin-azure/ocr");

configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
});
useAzureVisionOCR();

(async () => {
    try {
        const location = await screen.find(singleWord("nut.js"));
    } catch (e) {
        console.error(e);
    }
})();
And that's basically it!
We've provided the minimum required configuration to use the Azure Vision OCR service. Since the Azure Vision OCR service will detect all languages present in an image automatically, we don't have to provide them explicitly, nor do we have to fetch any language data locally as we do with @nut-tree/plugin-ocr.
On the other hand, this plugin does not work offline, so it's up to you to decide which package to use.
Remark: Tests have shown that @nut-tree/plugin-azure yields more accurate results than @nut-tree/plugin-ocr with little to no additional configuration.
Specify OCR language
As we learned earlier, the Azure Vision OCR service will automatically extract text from images, even if it contains mixed languages. However, it is still possible to force a particular language for an OCR run. This language can be configured via the providerData object of the find function.
Remark: Other screen methods like findAll, waitFor or read also accept the providerData parameter (see the sketch further below).
const {screen, singleWord} = require("@nut-tree/nut-js");
require('dotenv').config();
// The Language enum is provided by the plugin as well
const {configure, useAzureVisionOCR, Language} = require("@nut-tree/plugin-azure/ocr");

configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
});
useAzureVisionOCR();

(async () => {
    try {
        const location = await screen.find(singleWord("Bestätigen"), {
            providerData: {
                language: Language.German,
            }
        });
    } catch (e) {
        console.error(e);
    }
})();
This way you can specify the language you want to use for OCR. The Language enum is provided by the plugin. You can find a list of all available languages in the Configuration section of the OCR plugin documentation.
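As mentioned in the remark above, the other screen methods accept the same providerData parameter. A minimal sketch (to be run inside an async function, with the plugin configured and registered as above; the waitFor timeout and update interval values are just placeholders):
// Find all occurrences of a word, forcing German OCR
const locations = await screen.findAll(singleWord("Bestätigen"), {
    providerData: {
        language: Language.German,
    }
});

// Wait up to 10 seconds for the word to appear, checking every second
const appeared = await screen.waitFor(singleWord("Bestätigen"), 10000, 1000, {
    providerData: {
        language: Language.German,
    }
});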
Alternatively, if you don't want to specify the language on every call to find, the configuration can be moved to the global plugin configuration:
const {screen, singleWord} = require("@nut-tree/nut-js");
require('dotenv').config();
const {configure, useAzureVisionOCR, Language} = require("@nut-tree/plugin-azure/ocr");

configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
    language: Language.German,
});
useAzureVisionOCR();

(async () => {
    try {
        const location = await screen.find(singleWord("Bestätigen"));
    } catch (e) {
        console.error(e);
    }
})();
Dealing with flawed results
OCR engines are not perfect and sometimes return slightly garbled results. Emojis are interpreted as characters, sometimes a space is lost and two words are joined, you name it. In order to deal with such inconsistencies, it's also possible to adjust two parameters via providerData:
- partialMatch: If a word returned by the OCR engine contains e.g. a trailing period or a similar artifact, setting partialMatch to true will still give you a hit, even if it's only a partial match.
- caseSensitive: Toggles case sensitivity when looking for matches. This is another way to deal with eventual inconsistencies in OCR results.
const {screen, singleWord} = require("@nut-tree/nut-js");
require('dotenv').config();
const {configure, useAzureVisionOCR, Language} = require("@nut-tree/plugin-azure/ocr");

configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
    language: Language.German,
});
useAzureVisionOCR();

(async () => {
    try {
        const location = await screen.find(singleWord("Bestätigen"), {
            providerData: {
                partialMatch: true,
                caseSensitive: true,
            }
        });
    } catch (e) {
        console.error(e);
    }
})();
Custom OCR confidence value
One way to configure the minimum required confidence value for a match when performing on-screen searches is the screen.config.confidence value. This property was introduced with the initial image search plugin, thus it was exclusively used for image search.
Now that there are additional things to search for on-screen, like text, this single confidence value becomes a bit limiting. In cases where you are using both image and text search you'd want a separate way to configure confidence values used for OCR based searches.
After importing @nut-tree/plugin-azure, you'll have another property at your disposal to configure the confidence value required for text search: screen.config.ocrConfidence. This value specifies the minimum confidence (e.g. 0.8 for 80%) required for a text search result to be accepted.
const {screen, singleWord} = require("@nut-tree/nut-js");
require('dotenv').config();
const {configure, useAzureVisionOCR, Language} = require("@nut-tree/plugin-azure/ocr");

configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
    language: Language.German,
});

// Require a minimum confidence of 80% for text search matches
screen.config.ocrConfidence = 0.8;
useAzureVisionOCR();

(async () => {
    try {
        const location = await screen.find(singleWord("Bestätigen"), {
            providerData: {
                partialMatch: true,
                caseSensitive: true,
            }
        });
    } catch (e) {
        console.error(e);
    }
})();
Full example
Let's take a look at a full example which brings all previously discussed pieces together. The following sample demonstrates a hypothetical scenario where we are trying to click a button labelled "Bestätigen" (which would be "Confirm" in English).
We configure our Azure credentials, set the OCR language to German, apply a custom OCR confidence value of 80%, enable automatic highlighting of matches, and run a case-insensitive search for a singleWord, allowing for partial matches.
const {centerOf, mouse, screen, singleWord, straightTo} = require("@nut-tree/nut-js");
require('dotenv').config();
const {configure, useAzureVisionOCR, Language} = require("@nut-tree/plugin-azure/ocr");

// Azure credentials and OCR language
configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
    language: Language.German,
});
useAzureVisionOCR();

// Require a minimum confidence of 80% and highlight matches automatically
screen.config.ocrConfidence = 0.8;
screen.config.autoHighlight = true;

(async () => {
    const location = await screen.find(singleWord("Bestätigen"), {
        providerData: {
            partialMatch: true,
            caseSensitive: false
        }
    });
    await mouse.move(straightTo(centerOf(location)));
    await mouse.leftClick();
})();