---
source_path: "models/types/lvcsr.md"
canonical_url: "https://doc.sensory.com/tnl/7.9/models/types/lvcsr/"
---

# LVCSR _(TrulyNatural only)_

These recognizers use a phonetic acoustic model and an FST vocabulary decoder.
They are suitable for small to medium vocabulary tasks, but not for
unconstrained audio transcription.

These models have [task-type](https://doc.sensory.com/tnl/7.9/api/setting-keys/configuration.md#task-type)` == `[lvcsr](https://doc.sensory.com/tnl/7.9/api/setting-keys/values.md#lvcsr) and filenames that
by convention match `lvcsr-*.snsr`.

You can create LVCSR recognizers with [VoiceHub](https://doc.sensory.com/tnl/7.9/reference/voicehub.md#voicehub) or by
[specifying a grammar](https://doc.sensory.com/tnl/7.9/reference/grammar.md#grammar-based-recognition) with a build-capable[^1] model.

LVCSR recognizers include support for decoding with statistical language models,
but Sensory does not distribute the tools used to create these[^2]. Language models can
provide improved accuracy for constrained target domains. _For transcription-type
tasks, an STT model is a better fit._

The Sensory FST decoder supports hybrid models that contain both grammar-based and language model components.

**Also see these related items:** [LVCSR models](https://doc.sensory.com/tnl/7.9/models/index.md#lvcsr-models) included in this distribution.

[^1]: LVCSR models created by [VoiceHub](https://doc.sensory.com/tnl/7.9/reference/voicehub.md#voicehub) include build components only if the grammar references
at least one user-defined class, such as `~dynamic-1`. If the grammar contains no unresolved classes,
VoiceHub removes the build components to reduce model file size and RAM use.
[^2]: Contact your [sales representative](https://doc.sensory.com/tnl/7.9/contact.md#sales) if you would like to explore using a custom language model
for your application.

## Operation

```mermaid
flowchart TD
    start((start))
    fetch[/samples from ->audio-pcm/]
    audio(^sample-count)
    process
    partial(^result-partial)
    intent(^nlu-intent)
    slot(^nlu-slot)
    result(^result)
    nlu{NLU<br>match?}
    start --> fetch
    fetch --> audio
    audio --> process
    process --> fetch
    process -->|hypothesis| partial
    partial --> fetch
    process -->|VAD endpoint<br>or STREAM_END| nlu
    nlu -->|yes| intent
    nlu -->|no| result
    intent --> slot
    slot --> result
    slot -->|more| intent
    result --> fetch
```

Recognition flow.

1. Read audio data from [->audio-pcm](https://doc.sensory.com/tnl/7.9/api/setting-keys/runtime.md#-audio-pcm).
2. Invoke [^sample-count](https://doc.sensory.com/tnl/7.9/api/setting-keys/events.md#sample-count-event).
3. Invoke [^result-partial](https://doc.sensory.com/tnl/7.9/api/setting-keys/events.md#result-partial) with interim recognition hypotheses
   every [partial-result-interval](https://doc.sensory.com/tnl/7.9/api/setting-keys/configuration.md#partial-result-interval) ms.
4. Continue processing until [STREAM_END](https://doc.sensory.com/tnl/7.9/api/inference.md#rc_stream_end) occurs on [->audio-pcm](https://doc.sensory.com/tnl/7.9/api/setting-keys/runtime.md#-audio-pcm),
   one of the event handlers returns a code other than [OK](https://doc.sensory.com/tnl/7.9/api/inference.md#rc_ok), or
   an external [VAD](https://doc.sensory.com/tnl/7.9/api/setting-keys/values.md#vad) detects a speech endpoint.
5. If NLU is configured, invoke [^nlu-intent](https://doc.sensory.com/tnl/7.9/api/setting-keys/events.md#nlu-intent) and [^nlu-slot](https://doc.sensory.com/tnl/7.9/api/setting-keys/events.md#nlu-slot) for each
   top-level result that matches.
6. Invoke [^result](https://doc.sensory.com/tnl/7.9/api/setting-keys/events.md#result) with the final recognition hypothesis.
7. Resume processing from step 1.
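
The event order in the steps above can be mimicked with a toy loop. This is a minimal sketch in plain Python, not SDK code: `run_flow`, the `(samples, word)` chunk tuples, and the chunk-based `partial_every` counter are hypothetical stand-ins (the real [partial-result-interval](https://doc.sensory.com/tnl/7.9/api/setting-keys/configuration.md#partial-result-interval) is in ms), and the NLU events are left out for brevity.

```python
from typing import Callable, Iterable, List, Optional, Tuple

OK = "OK"  # stand-in for the SDK's OK return code

# (sample count, word decoded during this chunk or None); purely illustrative
Chunk = Tuple[int, Optional[str]]


def run_flow(chunks: Iterable[Chunk],
             on_event: Callable[[str, object], str],
             partial_every: int = 2) -> None:
    """Toy model of the LVCSR event order; hypothetical, not the SDK."""
    words: List[str] = []
    total_samples = 0
    for index, (samples, word) in enumerate(chunks, start=1):
        total_samples += samples                      # read audio
        if on_event("^sample-count", total_samples) != OK:
            return                                    # a handler asked to stop
        if word is not None:
            words.append(word)
        if index % partial_every == 0:                # interim hypothesis
            if on_event("^result-partial", " ".join(words)) != OK:
                return
    # Running out of chunks plays the role of STREAM_END in this sketch.
    on_event("^result", " ".join(words))


events = []


def record(name, payload):
    """Event handler that logs every event and lets processing continue."""
    events.append((name, payload))
    return OK


run_flow([(160, "armadillo"), (160, "play"), (160, None)], record)
for name, payload in events:
    print(name, payload)
```

The final `^result` event only appears once the input is exhausted, which is why live audio needs a VAD to supply the endpoint.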

**Note:**

LVCSR recognizers do **not** produce a final recognition hypothesis until they
run out of audio samples to process, or an external VAD detects a speech
endpoint.

With live audio, use these recognizers with a VAD template such as
[tpl-vad-lvcsr](https://doc.sensory.com/tnl/7.9/models/index.md#tpl-vad-lvcsr), [tpl-opt-spot-vad-lvcsr](https://doc.sensory.com/tnl/7.9/models/index.md#tpl-opt-spot-vad-lvcsr), or [tpl-spot-vad-lvcsr](https://doc.sensory.com/tnl/7.9/models/index.md#tpl-spot-vad-lvcsr).

## Settings

**Available events:** [^nlu-intent](https://doc.sensory.com/tnl/7.9/api/setting-keys/events.md#nlu-intent), [^nlu-slot](https://doc.sensory.com/tnl/7.9/api/setting-keys/events.md#nlu-slot), [^result](https://doc.sensory.com/tnl/7.9/api/setting-keys/events.md#result), [^result-partial](https://doc.sensory.com/tnl/7.9/api/setting-keys/events.md#result-partial), [^sample-count](https://doc.sensory.com/tnl/7.9/api/setting-keys/events.md#sample-count-event)

**Available iterators:** _none_

**Available results:** [audio-stream](https://doc.sensory.com/tnl/7.9/api/setting-keys/results.md#audio-stream), [audio-stream-first](https://doc.sensory.com/tnl/7.9/api/setting-keys/results.md#audio-stream-first), [audio-stream-last](https://doc.sensory.com/tnl/7.9/api/setting-keys/results.md#audio-stream-last)

**Available runtime settings:** [->audio-pcm](https://doc.sensory.com/tnl/7.9/api/setting-keys/runtime.md#-audio-pcm), [audio-stream-from](https://doc.sensory.com/tnl/7.9/api/setting-keys/runtime.md#audio-stream-from), [audio-stream-to](https://doc.sensory.com/tnl/7.9/api/setting-keys/runtime.md#audio-stream-to), [grammar-stream](https://doc.sensory.com/tnl/7.9/api/setting-keys/runtime.md#grammar-stream), [phrases-stream](https://doc.sensory.com/tnl/7.9/api/setting-keys/runtime.md#phrases-stream)

**Available configuration settings:** [ac-prune-top-k](https://doc.sensory.com/tnl/7.9/api/setting-keys/configuration.md#ac-prune-top-k), [audio-stream-size](https://doc.sensory.com/tnl/7.9/api/setting-keys/configuration.md#audio-stream-size), [complete-only](https://doc.sensory.com/tnl/7.9/api/setting-keys/configuration.md#complete-only), [partial-result-interval](https://doc.sensory.com/tnl/7.9/api/setting-keys/configuration.md#partial-result-interval), [ram-limit](https://doc.sensory.com/tnl/7.9/api/setting-keys/configuration.md#ram-limit), [samples-per-second](https://doc.sensory.com/tnl/7.9/api/setting-keys/configuration.md#samples-per-second), [search.frame-nota](https://doc.sensory.com/tnl/7.9/api/setting-keys/configuration.md#searchframe-nota), [show-silence](https://doc.sensory.com/tnl/7.9/api/setting-keys/configuration.md#show-silence)

**Available values:** [lvcsr](https://doc.sensory.com/tnl/7.9/api/setting-keys/values.md#lvcsr)

**Also see these related items:** [live-spot.c](https://doc.sensory.com/tnl/7.9/api/sample/c/live-spot.md#live-spot-code), [snsr-eval.c](https://doc.sensory.com/tnl/7.9/api/sample/c/snsr-eval.md#snsr-eval-code), [PhraseSpot.java](https://doc.sensory.com/tnl/7.9/api/sample/android/enroll-trigger.md#et-code), [segmentSpottedAudio.java](https://doc.sensory.com/tnl/7.9/api/sample/java/segmentSpottedAudio.md#segmentspottedaudio-code)

## Notes

Sensory optimizes hybrid models that include a background component solely to detect speech that is not in
the specified grammar. These models report an [nlu-intent-name](https://doc.sensory.com/tnl/7.9/api/setting-keys/results.md#nlu-intent-name) of `background` when they detect
out-of-grammar utterances. Do not use the out-of-grammar recognition [text](https://doc.sensory.com/tnl/7.9/api/setting-keys/results.md#text) result,
as it will have a high word error rate. Consider using [STT](https://doc.sensory.com/tnl/7.9/models/types/stt.md#stt-type) for transcription tasks instead.

[phrases-stream](https://doc.sensory.com/tnl/7.9/api/setting-keys/runtime.md#phrases-stream) provides a convenient way to specify a recognition vocabulary from an exhaustive
list of alternative utterances.

## LVCSR grammar-based recognition

Sensory's LVCSR models use [grammars](https://doc.sensory.com/tnl/7.9/reference/grammar.md#grammar-syntax) to constrain the
possible utterances they can recognize. Focusing on a limited set of words and
structures defined in these grammars improves recognition speed and accuracy at
the expense of recognizing arbitrary input.

You can create a custom recognizer by specifying a fixed grammar during development if
the recognition vocabulary is entirely known, or at runtime if it is not. You can also
use a hybrid approach and build the invariant parts during development, and delay
adding [variable parts](https://doc.sensory.com/tnl/7.9/models/types/lvcsr.md#grammar-classes) (such as a list of favorite TV channels) until runtime.
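
For the runtime part, an application usually starts from a plain list of strings. The sketch below turns such a list into grammar text, assuming the `g = a | b | c;` alternation form used by the sample grammars on this page. `class_grammar` is a hypothetical helper, and the lowercase normalization and character whitelist are conservative assumptions rather than documented SDK rules.

```python
import re
from typing import Iterable

# Assumption: allow only plain words, so that grammar operators such as
# | ; ( ) { } $ ~ cannot leak into the generated text.
_PLAIN_PHRASE = re.compile(r"[a-z0-9' ]+")


def class_grammar(phrases: Iterable[str]) -> str:
    """Return a one-rule grammar `g = a | b | c;` for a runtime phrase list."""
    cleaned = []
    for phrase in phrases:
        text = " ".join(phrase.lower().split())   # normalize case and spacing
        if not _PLAIN_PHRASE.fullmatch(text):
            raise ValueError(f"unsupported characters in phrase: {phrase!r}")
        if text not in cleaned:                   # drop duplicates, keep order
            cleaned.append(text)
    if not cleaned:
        raise ValueError("at least one phrase is required")
    return "g = " + " | ".join(cleaned) + ";"


print(class_grammar(["Target", "WinCo", "Susan's  house"]))
```

The function only produces text; handing that text to a recognizer is done through the SDK or the command-line tools.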

See [Grammar-based recognition](https://doc.sensory.com/tnl/7.9/reference/grammar.md#grammar-based-recognition) for the shared grammar syntax, operator
precedence, NLU markup, and special symbols supported by grammar-based LVCSR and
STT recognizers.

### Creating a recognizer

Create a grammar-based recognizer using the [command-line tools](https://doc.sensory.com/tnl/7.9/tools/index.md#command-line-tools).
This example uses _data/grammars/enrollments.txt_ which contains a sample grammar specification for
the enrollment recordings in _data/enrollments/_.

To create a custom recognizer using this grammar with [snsr-edit](https://doc.sensory.com/tnl/7.9/tools/snsr-edit.md#snsr-edit),
specify an LVCSR model that supports building and [grammar-stream](https://doc.sensory.com/tnl/7.9/api/setting-keys/runtime.md#grammar-stream).

**`data/grammars/enrollments.txt`**

```
# LVCSR grammar specification for test utterances in data/enrollments/
#
# In a tpl-spot-vad-lvcsr pipeline the prefix would be consumed by the spotter.
prefix = armadillo | jackalope | terminator;

# List of known utterances in the *-c.wav files.
sentence =
 18 percent of 643 |
 call the nearest target |
 how far away is winco |
 play more songs by this artist |
 record a video |
 start a timer for 20 minutes |
 i'm running low on gas |
 cancel all my meetings on friday |
 directions to susan's house |
 do i have any new texts |
 open my calendar to next week |
 set an alarm for 6 am tomorrow;

# Match the prefix and zero or one of the sentences.
# <s> and </s> are sentence start and end markers that
# match silence and small amounts of extraneous speech.
g = <s> $prefix $sentence? </s>;
```

```console
% cd $HOME/Sensory/TrulyNaturalSDK/7.9.0-pre.0+19.ged1a5d37de

% bin/snsr-edit -vv -t model/lvcsr-build-enUS-14.0.2-5MB.snsr \
    -f grammar-stream data/grammars/enrollments.txt \
    -o lvcsr-enrollments.snsr
Loading "model/lvcsr-build-enUS-14.0.2-5MB.snsr" as the template model.
Loading "data/grammars/enrollments.txt" into setting "grammar-stream".
Saved edited model to "lvcsr-enrollments.snsr".
```

Run the new model with [snsr-eval](https://doc.sensory.com/tnl/7.9/tools/snsr-eval.md#snsr-eval):

```console
% bin/snsr-eval -t lvcsr-enrollments.snsr \
    -s partial-result-interval=0 \ # (1)!
    data/enrollments/armadillo-1-3-c.wav
   165   2745 armadillo play more songs by this artist
```

1. [partial-result-interval](https://doc.sensory.com/tnl/7.9/api/setting-keys/configuration.md#partial-result-interval)` = 0` shows only the final recognition hypothesis.

For small grammars such as this, the build time is negligible. [snsr-eval](https://doc.sensory.com/tnl/7.9/tools/snsr-eval.md#snsr-eval) can
build and run the recognizer in a single operation:

```console
% bin/snsr-eval -t model/lvcsr-build-enUS-14.0.2-5MB.snsr \
    -f grammar-stream data/grammars/enrollments.txt \
    -s partial-result-interval=0 \
    data/enrollments/armadillo-1-3-c.wav
   165   2745 armadillo play more songs by this artist
```
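
If you script evaluations, the result lines printed above are easy to pick apart. This is a minimal sketch, assuming the first two columns are the start and end of the recognized segment (taken here to be milliseconds) followed by the recognized text. `parse_result` and `Result` are hypothetical names, not part of the SDK.

```python
import re
from typing import NamedTuple, Optional

# Two integer columns followed by the recognized text.
_RESULT_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S.*)$")


class Result(NamedTuple):
    start: int   # assumed: segment start
    end: int     # assumed: segment end
    text: str    # recognized words


def parse_result(line: str) -> Optional[Result]:
    """Parse one snsr-eval result line; return None for any other line."""
    match = _RESULT_LINE.match(line)
    if match is None:
        return None
    start, end, text = match.groups()
    return Result(int(start), int(end), text.strip())


line = "   165   2745 armadillo play more songs by this artist"
print(parse_result(line))
```

Lines that do not match, such as the echoed command, yield `None` and can be skipped.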

### Classes

A symbol that starts with the tilde `~` sigil specifies a [recognition class](https://doc.sensory.com/tnl/7.9/reference/grammar.md#grammar-syntax-class).
Class recognizers have their own grammar specifications, separate from the top-level
grammar. The behavior of a class-based recognizer is similar to that specified
by a rule. Classes, however, can be updated without recompiling the rest of the grammar,
and all references to a class use the same recognizer. This can reduce the recognizer size
and improve build speed.

This example uses a modified enrollment grammar which references two toy
classes: `~number` and `~place`:

**`enrollments-class.txt`**

```
# LVCSR grammar specification for test utterances in data/enrollments/
# This references two class sub-recognizers: ~number and ~place
#
# In a tpl-spot-vad-lvcsr pipeline the prefix would be consumed by the spotter.
prefix = armadillo | jackalope | terminator;

# List of known utterances in the *-c.wav files.
sentence =
 ~number percent of ~number |
 call the nearest ~place |
 how far away is ~place |
 play more songs by this artist |
 record a video |
 start a timer for ~number minutes |
 i'm running low on gas |
 cancel all my meetings on friday |
 directions to ~place |
 do i have any new texts |
 open my calendar to next week |
 set an alarm for ~number am tomorrow;

# Match the prefix and zero or one of the sentences.
# <s> and </s> are sentence start and end markers that
# match silence and small amounts of extraneous speech.
g = <s> $prefix $sentence? </s>;
```

**`place.txt`**

```
# Example place name class recognizer.

g = target | winco | susan's house;
```

The `~number` and `~place` classes referenced in _enrollments-class.txt_
create two new dynamic settings for these classes: `grammar-stream.number` and
`grammar-stream.place`. Specify these to create a complete recognizer:

```console
% snsr-edit -v -t model/lvcsr-build-enUS-14.0.2-5MB.snsr \
    -f grammar-stream enrollments-class.txt \
    -f grammar-stream.place place.txt \
    -g grammar-stream.number "g = 18 | 643 | 20 | 6;" \ # (1)!
    -o lvcsr-enrollments-class.snsr
Output written to "lvcsr-enrollments-class.snsr".
```

1. [snsr-edit](https://doc.sensory.com/tnl/7.9/tools/snsr-edit.md#snsr-edit)'s `-g` option sets the [grammar-stream](https://doc.sensory.com/tnl/7.9/api/setting-keys/runtime.md#grammar-stream)`.number` stream to a string argument. A file can also be used for the number grammar.
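
When the class contents come from application data, it can be convenient to assemble this command in a script. The sketch below only builds the argument list from the `-t`, `-f`, `-g`, and `-o` options shown on this page. `snsr_edit_command` is a hypothetical helper, and the `grammar-stream.place` entry assumes `-f` accepts a class setting in the same way it accepts `grammar-stream`.

```python
import shlex
from typing import Dict, List


def snsr_edit_command(template: str, output: str,
                      file_settings: Dict[str, str],
                      string_settings: Dict[str, str]) -> List[str]:
    """Assemble an snsr-edit argument list (hypothetical helper)."""
    argv = ["snsr-edit", "-t", template]
    for setting, path in file_settings.items():      # -f: value read from a file
        argv += ["-f", setting, path]
    for setting, text in string_settings.items():    # -g: value given as a string
        argv += ["-g", setting, text]
    argv += ["-o", output]
    return argv


argv = snsr_edit_command(
    "model/lvcsr-build-enUS-14.0.2-5MB.snsr",
    "lvcsr-enrollments-class.snsr",
    file_settings={"grammar-stream": "enrollments-class.txt",
                   "grammar-stream.place": "place.txt"},
    string_settings={"grammar-stream.number": "g = 18 | 643 | 20 | 6;"},
)
# Print a shell-quoted form; use subprocess.run(argv, check=True) to execute it.
print(shlex.join(argv))
```

Passing the list directly to `subprocess.run` avoids shell quoting problems with the `|` and `;` characters in the grammar string.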

Run the recognizer:

```console
% snsr-eval -v -t lvcsr-enrollments-class.snsr \
    -s partial-result-interval=0 \
    data/enrollments/armadillo-1-0-c.wav
   375   3150 (1.863e-08) armadillo 18 percent of 643
```
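
To see which utterances a class-based sentence covers, you can expand the class references by hand. The following is a toy string expansion in plain Python, not a description of how the FST decoder works: `expand` and the `CLASSES` table are hypothetical, with the alternatives copied from `place.txt` and the `grammar-stream.number` string above.

```python
import re
from itertools import product
from typing import Dict, List

# Class alternatives taken from the examples in this section.
CLASSES: Dict[str, List[str]] = {
    "number": ["18", "643", "20", "6"],
    "place": ["target", "winco", "susan's house"],
}


def expand(pattern: str, classes: Dict[str, List[str]]) -> List[str]:
    """Replace each ~name reference with every alternative of that class."""
    # re.split with one capture group alternates literal text and class names.
    parts = re.split(r"~([A-Za-z0-9_.-]+)", pattern)
    choices = [classes[part] if i % 2 else [part] for i, part in enumerate(parts)]
    return ["".join(combo) for combo in product(*choices)]


utterances = expand("~number percent of ~number", CLASSES)
print(len(utterances))
print("18 percent of 643" in utterances)
```

Both `~number` references draw from the same list, which mirrors the point that all references to a class share one definition.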

### Class libraries

TrulyNatural 6.15.0 introduced support for pre-built binary class repositories.
These contain classes built from frequently used grammar fragments such as dates, times, and numbers.

Class libraries are supported by LVCSR models only. Load binary class
repositories into the same [Session](https://doc.sensory.com/tnl/7.9/api/inference.md#session) as an LVCSR model to add this capability
to the model. If a grammar references a class that's not explicitly defined,
the class name is looked up in the provided class library or libraries. System
class libraries provided by Sensory use a prefix of `s.` for all class names.

See [lvcsr-lib-enUS-14.0.2.snsr](https://doc.sensory.com/tnl/7.9/models/index.md#lvcsr-lib-enUS) for a description of the classes used below.

**`class-lib.txt`**

```
# Example recognizer with classes from a class library
call = call {number ~s.phone-number};
emergency = ~s.call-emergency;
timer = {timer ~s.timer-phrases};
commands = {call} | {emergency} | $timer;
g = <s> $commands </s>;
```

This example uses live audio, so it needs [snsr-eval](https://doc.sensory.com/tnl/7.9/tools/snsr-eval.md#snsr-eval)'s `-a` flag
to add a [VAD](https://doc.sensory.com/tnl/7.9/models/tpl/tpl-vad-lvcsr.md#tpl-vad-lvcsr-type) to find the end of each utterance and signal
the recognizer to produce a final hypothesis.

```console
% snsr-eval -a -t model/lvcsr-build-enUS-14.0.2-5MB.snsr \
    -t model/lvcsr-lib-enUS-14.0.2.snsr \
    -f grammar-stream class-lib.txt \
    -s partial-result-interval=0

# Say: Call 1 800 555 1212
NLU intent: call (0) = call one eight hundred five five five one two one two
NLU entity:   number (0) = one eight hundred five five five one two one two
  3360   6855 call one eight hundred five five five one two one two

# Say: Set a timer for 31 minutes.
NLU intent: timer (0) = set a timer for thirty one minutes
 14610  16770 set a timer for thirty one minutes

# Say: Call the fire department.
NLU intent: emergency (0) = call the fire department
 24540  25890 call the fire department
```
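
For quick experiments you can collect the NLU lines from a transcript like the one above into a structure. This is a minimal sketch that assumes the line layout shown in that transcript. `parse_nlu` is a hypothetical helper, and the parenthesized number is captured but not interpreted. In an application, handle the [^nlu-intent](https://doc.sensory.com/tnl/7.9/api/setting-keys/events.md#nlu-intent) and [^nlu-slot](https://doc.sensory.com/tnl/7.9/api/setting-keys/events.md#nlu-slot) events rather than parsing text.

```python
import re
from typing import Dict, Iterable, List

# "NLU intent: <name> (<n>) = <text>" or "NLU entity: <name> (<n>) = <text>"
_NLU_LINE = re.compile(r"^NLU (intent|entity):\s+(\S+)\s+\((\d+)\)\s+=\s+(.*)$")


def parse_nlu(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Group each entity line under the intent line that precedes it."""
    intents: List[Dict[str, object]] = []
    for line in lines:
        match = _NLU_LINE.match(line.strip())
        if match is None:
            continue                       # prompts and result lines
        kind, name, _number, value = match.groups()
        if kind == "intent":
            intents.append({"intent": name, "text": value, "entities": {}})
        elif intents:
            intents[-1]["entities"][name] = value
    return intents


transcript = """\
NLU intent: call (0) = call one eight hundred five five five one two one two
NLU entity:   number (0) = one eight hundred five five five one two one two
  3360   6855 call one eight hundred five five five one two one two
NLU intent: timer (0) = set a timer for thirty one minutes
 14610  16770 set a timer for thirty one minutes
"""

intents = parse_nlu(transcript.splitlines())
for item in intents:
    print(item["intent"], item["entities"])
```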

<!-- tab: c -->

**C/C++**

Configuring class-based recognition with the C API:
```c
SnsrSession s;

snsrNew(&s);
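/* Load the VAD template, then place the build-capable LVCSR model in slot 0. */
snsrLoad(s,   snsrStreamFromFileName("model/tpl-vad-lvcsr-3.17.0.snsr", "r"));
snsrSetStream(s, SNSR_SLOT_0,
              snsrStreamFromFileName("model/lvcsr-build-enUS-14.0.2-5MB.snsr", "r"));
/* Load the class library into the same session as the LVCSR model. */
snsrLoad(s,   snsrStreamFromFileName("model/lvcsr-lib-enUS-14.0.2.snsr", "r"));
/* Supply the grammar that references the s.* library classes. */
snsrSetStream(s, SNSR_GRAMMAR_STREAM,
              snsrStreamFromFileName("class-lib.txt", "r"));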
if (snsrRC(s) != SNSR_RC_OK) {
    fprintf(stderr, "ERROR: %s\n", snsrErrorDetail(s));
    return snsrRC(s);
}
```
<!-- /tab -->

<!-- tab: java -->

**Java**

Configuring class-based recognition with the Java API:
```java
SnsrSession s = new SnsrSession();
try {
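    // Load the VAD template, then place the build-capable LVCSR model in slot 0.
    s.load(SnsrStream.fromFileName("model/tpl-vad-lvcsr-3.17.0.snsr", "r"));
    s.setStream(Snsr.SLOT_0,
                SnsrStream.fromFileName("model/lvcsr-build-enUS-14.0.2-5MB.snsr", "r"));
    // Load the class library into the same session as the LVCSR model.
    s.load(SnsrStream.fromFileName("model/lvcsr-lib-enUS-14.0.2.snsr", "r"));
    // Supply the grammar that references the s.* library classes.
    s.setStream(Snsr.GRAMMAR_STREAM,
                SnsrStream.fromFileName("class-lib.txt", "r"));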
} catch (IOException e) {
    e.printStackTrace();
    return s.rC();
}
```
<!-- /tab -->

<!-- tab: py -->

**Python**

Configuring class-based recognition with the Python API:
```python
try:
    with snsr.Session() as s:
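        # Load the VAD template, then place the build-capable LVCSR model in slot 0.
        s.load("model/tpl-vad-lvcsr-3.17.0.snsr")
        s.set_stream(
            snsr.SLOT_0,
            snsr.Stream.from_filename("model/lvcsr-build-enUS-14.0.2-5MB.snsr", "r"),
        )
        # Load the class library into the same session as the LVCSR model.
        s.load("model/lvcsr-lib-enUS-14.0.2.snsr")
        # Supply the grammar that references the s.* library classes.
        s.set_stream(
            snsr.GRAMMAR_STREAM,
            snsr.Stream.from_filename("class-lib.txt", "r"),
        )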
except snsr.Error as e:
    print(f"ERROR: {e.message}")
```
<!-- /tab -->

<!-- Abbreviation definitions from includes/abbreviations.md -->
*[API]: Application Programming Interface
*[FST]: Finite-State Transducer
*[LVCSR]: Large Vocabulary Continuous Speech Recognition model, feed-forward neural net acoustic model with FST decoder
*[NLU]: Natural Language Understanding model
*[RAM]: Random Access Memory
*[STT]: Speech To Text: transformers with language model and CTC decoding
*[TNL]: TrulyNatural, Sensory's large-vocabulary speech recognition technology
*[VAD]: Voice Activity Detector
