OCR plugins
@nut-tree/plugin-azure
Installation
npm i @nut-tree/plugin-azure
Buy
@nut-tree/plugin-azure is included in the Solo and Team plans.
Prerequisites
In order to use @nut-tree/plugin-azure, you need an Azure account and an Azure AI Vision OCR resource. You can use the free pricing tier (F0) to try the service and upgrade to a paid tier for production later.
Once you have both set up, you'll need a key and the endpoint of the resource you created to connect your application to the Azure AI Vision service:
- After your Azure Vision resource is deployed, select Go to resource.
- In the left navigation menu, select Keys and Endpoint.
- Copy one of the keys and the endpoint.
- Use them in your code, e.g. via environment variables.
Description
@nut-tree/plugin-azure is a nut.js plugin which integrates the Azure AI Vision OCR service. It provides an implementation of the TextFinderInterface to perform on-screen text search. Additionally, it extends the nut.js Screen class with the ability to extract text from screen regions.
Configuration
@nut-tree/plugin-azure is designed to provide multiple subpackages, each with its own configuration options. Currently, there is only one subpackage available, @nut-tree/plugin-azure/ocr, which provides on-screen text search.
You can either use the separate configuration methods of each subpackage, or use the configure() method of the main package to configure all subpackages at once.
interface AzurePluginConfiguration {
ocr: AzureVisionOCRApiConfiguration;
}
configure()
Configure the plugin by providing an AzurePluginConfiguration. This config object holds all configuration options for all subpackages.
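As a sketch, a full configuration object for configure() could be assembled as follows. The endpoint URL and key are placeholders, and the commented-out import mirrors the package layout described above:

```javascript
// Sketch: assembling an AzurePluginConfiguration for the main package's
// configure() method. Endpoint and key values below are placeholders.
// In a real project you would then call:
//   const { configure } = require("@nut-tree/plugin-azure");
//   configure(config);
const config = {
    ocr: {
        apiEndpoint: process.env.VISION_ENDPOINT ?? "https://<your-resource>.cognitiveservices.azure.com/",
        apiKey: process.env.VISION_KEY ?? "<your-api-key>",
    },
};

console.log(config.ocr.apiEndpoint);
```

Reading the key and endpoint from environment variables keeps credentials out of your source code.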
Azure AI Vision OCR
@nut-tree/plugin-azure/ocr uses the Azure AI Vision OCR service to perform OCR.
Configuration
The subpackage comes with the following configuration options:
interface AzureVisionOCRApiConfiguration {
apiEndpoint: string;
apiKey: string;
checkResultInterval?: number;
checkResultRetryCount?: number;
language?: string;
modelVersion?: string;
readingOrder?: AzureOcrServiceReadingOrder;
}
AzureVisionOCRApiConfiguration.apiKey
Your API key for the Azure AI Vision OCR resource. This is a required option to use the Azure AI Vision OCR service.
AzureVisionOCRApiConfiguration.apiEndpoint
The URL of your Azure AI Vision OCR resource. This is a required option to use the Azure AI Vision OCR service.
AzureVisionOCRApiConfiguration.modelVersion
@nut-tree/plugin-azure/ocr
allows you to explicitly specify which of the available models to use. If you don't specify a model version, the latest model will be used.
AzureVisionOCRApiConfiguration.language
@nut-tree/plugin-azure/ocr allows you to explicitly specify a single language to use for OCR. By default, the service extracts all text, including mixed languages; if you want to force usage of a single, specific language, set this option.
Available languages are:
export enum Language {
Afrikaans,
Albanian,
Angika,
Arabic,
Asturian,
AwadhiHindi,
Azerbaijani,
Bagheli,
Basque,
BelarusianCyrillic,
BelarusianLatin,
BhojpuriHindi,
Bislama,
Bodo,
Bosnian,
Brajbha,
Breton,
Bulgarian,
Bundeli,
Buryat,
Catalan,
Cebuano,
Chamling,
Chamorro,
Chhattisgarhi,
ChineseSimplified,
ChineseTraditional,
Cornish,
Corsican,
CrimeanTatar,
Croatian,
Czech,
Danish,
Dari,
Dhimal,
Dogri,
Dutch,
English,
Erzya,
Estonian,
Faroese,
Fijian,
Filipino,
Finnish,
French,
Friulian,
Gagauz,
Galician,
German,
Gilbertese,
Gondi,
Greenlandic,
Gurung,
HaitianCreole,
Halbi,
Hani,
Haryanvi,
Hawaiian,
Hindi,
HmongDaw,
Ho,
Hungarian,
Icelandic,
InariSami,
Indonesian,
Interlingua,
Inuktitut,
Irish,
Italian,
Japanese,
Jaunsari,
Javanese,
Kabuverdianu,
KachinLatin,
KangriDevanagiri,
KarachayBalkar,
KaraKalpakCyrillic,
KaraKalpakLatin,
Kashubian,
KazakhCyrillic,
KazakhLatin,
Khaling,
Khasi,
Kiche,
Korean,
Korku,
Koryak,
Kosraean,
Kumyk,
KurdishArabic,
KurdishLatin,
KurukhDevanagiri,
KyrgyzCyrillic,
Lakota,
Latin,
Lithuanian,
LowerSorbian,
LuleSami,
Luxembourgish,
MahasuPahari,
Malay,
Maltese,
Malto,
Manx,
Maori,
Marathi,
Mongolian,
MontenegrinCyrillic,
MontenegrinLatin,
Neapolitan,
Nepali,
Niuean,
Nogay,
NorthernSami,
Norwegian,
Occitan,
Ossetic,
Pashto,
Persian,
Polish,
Portuguese,
Punjabi,
Ripuarian,
Romanian,
Romansh,
Russian,
Sadri,
Samoan,
Sanskrit,
Santali,
Scots,
ScottishGaelic,
Serbian,
Sherpa,
Sirmauri,
SkoltSami,
Slovak,
Slovenian,
Somali,
SouthernSami,
Spanish,
Swahili,
Swedish,
Tajik,
Tatar,
Tetum,
Thangmi,
Tongan,
Turkish,
Turkmen,
Tuvan,
UpperSorbian,
Urdu,
Uyghur,
UzbekArabic,
UzbekCyrillic,
UzbekLatin,
Volapuk,
Walser,
Welsh,
WesternFrisian,
YucatecMaya,
Zhuang,
Zulu
}
AzureVisionOCRApiConfiguration.readingOrder
@nut-tree/plugin-azure/ocr allows you to explicitly specify the reading order to use for OCR. The default is AzureOcrServiceReadingOrder.BASIC, which uses a left-to-right reading order. AzureOcrServiceReadingOrder.NATURAL uses a more natural reading order, but is only available for Latin languages.
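A hedged sketch of an OCR subpackage configuration that pins both the language and the reading order. The enum stand-ins below are illustrative only; in real code you would import Language and AzureOcrServiceReadingOrder from @nut-tree/plugin-azure/ocr, and the underlying string values are an assumption here:

```javascript
// Illustrative stand-ins for the enums exported by "@nut-tree/plugin-azure/ocr".
// The concrete string values are assumptions for this sketch.
const AzureOcrServiceReadingOrder = { BASIC: "basic", NATURAL: "natural" };
const Language = { English: "en" };

// Force English-only OCR with natural reading order
// (NATURAL is only available for Latin languages).
const ocrConfig = {
    apiEndpoint: process.env.VISION_ENDPOINT ?? "https://<your-resource>.cognitiveservices.azure.com/",
    apiKey: process.env.VISION_KEY ?? "<your-api-key>",
    language: Language.English,
    readingOrder: AzureOcrServiceReadingOrder.NATURAL,
};

console.log(ocrConfig.language, ocrConfig.readingOrder);
```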
AzureVisionOCRApiConfiguration.checkResultInterval
@nut-tree/plugin-azure/ocr submits async jobs to the Azure AI Vision OCR service and then polls for the result. To avoid depleting your API quota, you can configure the polling interval.
AzureVisionOCRApiConfiguration.checkResultRetryCount
@nut-tree/plugin-azure/ocr submits async jobs to the Azure AI Vision OCR service and then polls for the result. To avoid depleting your API quota, you can configure a maximum number of polls.
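Together, these two options bound the total time spent waiting on a job: a poll every checkResultInterval milliseconds, at most checkResultRetryCount times, gives a worst-case polling budget of roughly interval × retry count. A small sketch of that trade-off (the concrete values are examples, not the plugin's documented defaults):

```javascript
// Polling budget sketch: how long will we wait for an OCR job at most?
// These concrete values are examples, not the plugin's defaults.
const checkResultInterval = 500;  // ms between polls
const checkResultRetryCount = 20; // maximum number of polls

const maxWaitMs = checkResultInterval * checkResultRetryCount;
console.log(`Worst case: ~${maxWaitMs / 1000}s of polling`); // ~10s
```

A longer interval with a higher retry count uses fewer API calls for slow jobs but reacts more slowly when results arrive early.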
Usage: On-screen text search
Let's dive right into an example:
const { centerOf, mouse, screen, singleWord, straightTo } = require("@nut-tree/nut-js");
const { configure } = require("@nut-tree/plugin-azure/ocr");

configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
});

(async () => {
    const location = await screen.find(singleWord("WebStorm"));
    await mouse.move(straightTo(centerOf(location)));
})();
As you can see, the minimal configuration for @nut-tree/plugin-azure/ocr only requires you to provide your Azure AI Vision OCR API key and endpoint, which are read from environment variables in this case.
That's all you need to search for text on your screen using text queries. The currently supported text queries are singleWord, which searches for a single word, and textLine, which searches for a whole line of text.
ProviderData
You can pass a ProviderData object to screen.find to override the configuration options for text search on a per-call basis.
The TextFinderConfig is defined as follows:
interface TextFinderConfig {
apiEndpoint?: string;
apiKey?: string;
caseSensitive?: boolean;
checkResultInterval?: number;
checkResultRetryCount?: number;
language?: string;
modelVersion?: string;
partialMatch?: boolean;
readingOrder?: AzureOcrServiceReadingOrder;
}
As you can see, you're able to override the global configuration options for @nut-tree/plugin-azure/ocr on a per-call basis. This allows you to use different endpoints, languages or models for different calls to screen.find.
You can also tweak some text search related options on a per-call basis:
TextFinderConfig.caseSensitive
@nut-tree/plugin-azure/ocr will perform case-insensitive text search by default. Toggle this flag to enable case-sensitive text search.
TextFinderConfig.partialMatch
@nut-tree/plugin-azure/ocr will search for an exact match by default. Toggle this flag to enable partial text matches.
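Putting the per-call options together, a TextFinderConfig override for a single search could look like the sketch below. The call shape in the comment assumes nut.js' providerData search parameter; treat it as an assumption if your nut.js version differs:

```javascript
// Per-call overrides for a single text search: case-sensitive matching,
// partial matches allowed. The modelVersion value is a placeholder.
const textFinderOverrides = {
    caseSensitive: true,
    partialMatch: true,
    modelVersion: "latest", // placeholder value
};

// In a real project (assumed call shape, via nut.js' providerData parameter):
//   const { screen, singleWord } = require("@nut-tree/nut-js");
//   await screen.find(singleWord("WebStorm"), { providerData: textFinderOverrides });

console.log(textFinderOverrides);
```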
Usage: On-screen text extraction
Just as with on-screen text search, we'll start with an example:
const { getActiveWindow, screen } = require("@nut-tree/nut-js");
const { configure, TextSplit, useAzureVisionOCR } = require("@nut-tree/plugin-azure/ocr");

configure({
    apiKey: process.env.VISION_KEY,
    apiEndpoint: process.env.VISION_ENDPOINT,
});

useAzureVisionOCR();

const activeWindowRegion = async () => {
    const activeWindow = await getActiveWindow();
    return activeWindow.region;
};

(async () => {
    const text = await screen.read({ searchRegion: activeWindowRegion(), split: TextSplit.WORD });
    console.log(text);
})();
screen.read uses the same configuration as screen.find.
Additionally, screen.read supports a set of configuration options for text extraction, passed via ReadTextConfig:
interface ReadTextConfig {
apiEndpoint?: string;
apiKey?: string;
checkResultInterval?: number;
checkResultRetryCount?: number;
language?: Language;
modelVersion?: string;
readingOrder?: AzureOcrServiceReadingOrder;
searchRegion?: Region | Promise<Region>;
split?: TextSplit;
}
As you can see, you're able to override the configuration options for text extraction on a per-call basis. This allows you to use different endpoints, languages or models for different calls to screen.read.
Additionally, you can pass a searchRegion to screen.read to limit the screen area to extract text from.
The split option allows you to configure the level of detail for text extraction. With the default, TextSplit.NONE, a single block of text containing all extracted text is returned.
TextSplit
TextSplit is an enum that defines how the extracted text should be split:
enum TextSplit {
WORD,
LINE,
NONE
}
This allows you to configure the level of detail for text extraction: TextSplit.LINE splits the result at line level, TextSplit.WORD at word level.
The default value is TextSplit.NONE, which returns the extracted text as a single block of text.
Depending on the configured text split, the result of screen.read is one of the following types:
interface WordOCRResult {
text: string;
confidence: number;
}
interface LineOCRResult {
text: string;
confidence: number;
words: WordOCRResult[];
}
interface BlockOCRResult {
text: string;
confidence: number;
lines: LineOCRResult[];
}
Which OCR package should I choose?
As always in software, it depends :)
The obvious difference is that @nut-tree/plugin-ocr is a standalone package, while @nut-tree/plugin-azure is a wrapper around the Azure Cognitive Services API. This means that with @nut-tree/plugin-ocr you can use the OCR functionality without involving any third-party service, while @nut-tree/plugin-azure requires a (free) Azure subscription.
With @nut-tree/plugin-ocr, OCR is performed locally on your machine, so no data is sent to any third-party service. This might be a requirement for some use cases. On the other hand, @nut-tree/plugin-azure offers a more powerful OCR engine, which performs better on complex images and achieves higher accuracy, even on low-quality images.