
Using Backus-Naur Form Grammars to Format Large Language Model Outputs

1.1 Introduction

Large Language Models (LLMs) represent a seismic force in artificial intelligence and natural language processing. They are advanced algorithms designed to understand, interpret, and generate human language in a way that is both coherent and contextually relevant. At their core, LLMs are trained on vast datasets of text, enabling them to learn language patterns, nuances, and the complexities of human communication, which they use to predict the most likely next word in a sequence.

The journey of language models began with simpler models that could handle basic text generation and has since evolved into sophisticated systems with billions of parameters capable of creating text that simulates human writing. This evolution is not just about size but also about the models’ ability to map the subtleties in language, context, and even humor or sarcasm to virtual structures in latent space.

The significance of LLMs in today's tech landscape cannot be overstated - they are the driving force behind a myriad of applications, with the battle for the killer app working its way out in the market even today. But they do have a limitation: their output is not always predictable or controllable. This limitation is particularly pronounced in scenarios where the generated text needs to adhere to specific syntactical or structural rules. In such cases, the volatility of LLM output can make them difficult to wrangle.

1.2 What is a BNF Grammar?

As a computer scientist, whenever I get the chance to talk about grammars I feel very special, like “Oh, Gödel and I were just talking to old Leibniz the other day about this,” sort of special. Backus-Naur Form (BNF) /ˌbækəs ˈnaʊər/ grammar is a foundational concept in computer science, primarily used for defining the syntax of programming languages, data formats, and communication protocols. It is a domain-specific language specification used to express context-free grammars, which are essential in describing the structure of language elements in computing.

Historically, BNF's applications span official language specifications, manuals, and academic texts on programming language theory. They are widely used because BNF grammars provide a structured and unambiguous method to describe syntax.

And, it is cute that it can define itself.
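A condensed sketch of that self-definition, in classic angle-bracket BNF with the whitespace rules elided for brevity:

<syntax>     ::= <rule> | <rule> <syntax>
<rule>       ::= "<" <rule-name> ">" "::=" <expression>
<expression> ::= <list> | <list> "|" <expression>
<list>       ::= <term> | <term> <list>
<term>       ::= <literal> | "<" <rule-name> ">"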

1.3 What is llama.cpp and local LLMs?

llama.cpp is a notable project in the world of Large Language Models, initially serving as a port of Facebook’s LLaMA model in C/C++. Its primary goal was to run the LLaMA model using 4-bit integer quantization; the inception of llama.cpp was rather swift, reportedly being “hacked in an evening.” However, it has since evolved significantly, benefiting from numerous contributions by a lot of really smart people.

It’s implementation is impressive, entirely in plain C/C++ without dependencies, the library is highly efficient and has expanded to supporting many different quantization levels and other output shaping features. Particularly interesting are the newest gguf/ggml features, which allow for inference to be performed on the CPU, GPU, or both. (I am able to easily run 7B models at 8-bit quantization on my desktop; while 70B run but, not at a useful speed)

In terms of model support, it supports not only the original LLaMA and LLaMA 2 models but also a wide array of others like Falcon, Alpaca, Mistral, various international versions of LLaMA/Alpaca, Chinese models like Yi, and many more. It also effectively serves as the axle for the LLM ecosystem, with many other projects using it as a base for their own work.

1.4 Why do I care about BNF grammars?

When you need a specification to unambiguously describe the syntax of a language, BNF grammars are an old-school, proven method to do so. They offer a way to model natural language data in the manner of a protocol, creating a consistent, reliable structure. In data processing, BNF grammars help in defining the structure of various data formats, ensuring that data is consistently and correctly parsed and processed.

As we delve deeper into the world of artificial intelligence and natural language processing, the need for structured, understandable, and reliable outputs from LLMs increases. BNF grammars can be employed to format and structure the outputs of LLMs, ensuring that the generated text adheres to specific syntactical rules. This application is particularly crucial as we seek to integrate LLMs more seamlessly into various technological solutions, from automated content generation to conversational agents. Understanding BNF grammars, therefore, becomes essential for anyone looking to harness the full potential of LLMs in practical and innovative ways.

For many applications, BNF grammars present an alternative to the traditional method of fine-tuning language models. Instead of retraining the LLM on specific datasets to achieve desired outputs, which requires significant data and compute, BNF grammars provide a way to impose structure on the outputs of any model. This approach offers significant advantages, particularly in terms of efficiency and flexibility. By using grammars, one can quickly adapt the output structure to different contexts or requirements without the need for extensive retraining.

2.1 Introduction to BNF Grammars

Get ready.

  1. A BNF grammar comprises a set of derivation rules, each defined by a nonterminal symbol, an assignment operator (::=), and an expression.
  2. The expression consists of terminal and nonterminal symbols.
  3. Terminal symbols match some concrete string.

Trees are a good way to visualize the structure of a BNF grammar, with the nonterminal symbols being the branches and the terminal symbols being the leaves.

That's it, really, but that is a lot to unpack. In terms of this article, about 8,500 words of unpacking, so let's get started.

2.2 History of Context-Free Grammars and BNF

I guess we can start at the beginning.

The concept of describing language structure using grammatical rules dates back to ancient times, with notable contributions from scholars like the Sanskrit grammarian Pāṇini in the 6th to 4th century BC. Pāṇini’s work laid the foundation for understanding how languages could be broken down into constituent parts and rules, a concept that would much later influence the development of context-free grammars in computing.

In the 20th century, the formal study of language structures took a significant leap forward, particularly in Western linguistics, where linguists began formalizing language descriptions. Concurrently, mathematicians and theorists, including Emil Post, Alan Turing, and Noam Chomsky, were exploring string rewriting rules and formal logical systems, which eventually led to the establishment of context-free grammars. Chomsky's distinction between generative and transformational rules further refined these concepts, setting the stage for the development of BNF.

BNF, as proposed by John Backus and later refined by Peter Naur, was initially created to describe the syntax of the ALGOL programming language. This development marked a significant milestone in the formalization of language syntax for computing purposes.

2.3 Fundamentals of BNF Grammars

We have a specification, and then we have what the specification creates. Imagine a BNF grammar as a vast, branching tree. The nonterminal symbols are like the main branches of the tree, broad and defining the overall shape. They are not the actual text but placeholders for patterns.

Terminal symbols, in contrast, are the leaves of this tree. They are the specific, tangible characters or tokens that appear in the language. When the wind of syntax blows, these leaves might rustle and change (the criteria to match may change), but they remain individually distinct and identifiable.

The rules in BNF grammars are like the smaller branches and twigs that connect leaves to the larger branches. They define how the tree grows, specifying which branches lead to which leaves. A rule in BNF, written as symbol ::= expression, is like saying a main branch (symbol) grows into a series of smaller branches and leaves (expression). The ::= operator is the point of growth, where the branch splits and spreads out.

Some of these branches are recursive, bending back towards themselves like the loops in an Escher drawing. This recursion is a fundamental aspect of BNF grammars, allowing the tree to grow in complex and interesting ways. It lets a branch sprout smaller versions of itself or even of the larger tree, creating a fractal-like pattern within the structure. From a practical perspective, this is a double-edged sword. It allows for a lot of flexibility in the grammar, but it also makes it difficult to understand and debug, and potentially leads to infinite loops.
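As a quick sketch of recursion in GBNF notation (hypothetical rules, not from the llama.cpp examples):

expr ::= num | "(" expr " + " expr ")"
num  ::= [0-9]+

Here expr appears in its own definition, so it can match 7, (1 + 2), or ((1 + 2) + 3) at any depth. The non-recursive num branch is what lets the derivation terminate; without such an exit, the grammar would loop forever.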

2.4 Interplay between BNF and LLMs

If we have our grammar as a tree, then our LLM is a river of word predictions. This river, fed by the networks of relationships between words, meanders through the latent space of language, sometimes overflowing with a deluge of words and ideas. However, without direction, this flow can be chaotic and unfocused, spilling over in unpredictable ways. This is where grammars come into play, acting as precise channels that guide this torrent into meaningful and structured streams.

BNF grammars serve as a sophisticated system of linguistic canals and waterways, designed to direct the flow of LLM outputs into coherent and purposeful paths. Just as channels in a river guide water to specific locations, BNF-style grammars mold the output of LLMs, ensuring that the words and sentences produced adhere to specific syntactical and structural rules. These grammars can define precise patterns and sequences, funneling the broad possibilities of language generation into outputs with guardrails.

llama.cpp supports a variant of BNF grammar called GBNF. It has rules very similar to the specification standard, without the need for angle brackets. It also supports a few additional operators, namely ?, +, and * (| is already a standard BNF operator). These operators allow for more flexibility in the grammar and are described in more detail below.

3.1 Introduction to Grammar

The list.gbnf grammar serves as an introductory example to illustrate the basics of GBNF grammars in a practical and straightforward manner. This simple grammar is designed to define the structure of a markdown list.

3.2 Simple List Grammar (list.gbnf)

From the llama.cpp grammar examples, list.gbnf:

root ::= item+

# Excludes various line break characters
item ::= "- " [^\r\n\x0b\x0c\x85\u2028\u2029]+ "\n"

This grammar comprises primarily two rules:

The root rule: root ::= item+ defines the top-level structure of the grammar, indicating that the root element consists of one or more item elements. The + symbol signifies that the item can repeat one or more times, making it suitable for lists of varying lengths.

The item rule: item ::= "- " [^\r\n\x0b\x0c\x85\u2028\u2029]+ "\n" specifies what constitutes an item in the list. It begins with a literal - (dash and space), followed by a series of characters that are not line break characters, effectively capturing a single line of text. The \n at the end signifies the end of a list item, indicating that each item is on a separate line.

In practice, this grammar can parse and structure text that follows a simple list format, where each item starts with a dash and a space, and is followed by the list content. Of particular note, the + operator in the root rule allows for more than one list item to be generated, which we will discuss later.

3.3 Results

When the list.gbnf grammar is applied to a text, it effectively organizes and formats the data into a structured list. Let’s consider the following example output:

- This is a list item.
- This is another list item.

In this result, each line of the text that begins with a dash followed by a space (- ) is recognized as a separate item in the list. The presence of the newline character (\n) at the end of each item ensures that each list entry is distinctly separated and placed on a new line. This structured format is a direct outcome of the rules specified in the grammar.

4.1 Symbol ::= Expression

At the heart of GBNF grammars lies the rule structure, typically represented in the format symbol ::= expression. This structure is fundamental to defining the syntax of any language or data format using GBNF. In this format, the symbol represents a nonterminal element in the grammar. It stands for a concept or a category within the language, rather than a specific string of characters.

The ::= operator, read as “is defined as,” links the symbol to its expression. This expression can consist of both terminal and nonterminal symbols. Terminal symbols are the actual characters or tokens of the language, such as words, numbers, or punctuation marks. Nonterminal symbols, on the other hand, are placeholders that refer to other rules within the grammar.

For example, in a grammar defining a simple arithmetic expression, we might have a rule like expression ::= number "+" number. Here, expression is a nonterminal symbol defined by the sequence of two number symbols separated by the terminal symbol +. This rule effectively states that an arithmetic expression can be formed by adding two numbers.

The symbol ::= expression format is the building block of GBNF grammars, allowing for the construction of complex language structures through a series of simple, yet powerful rules. Each rule in a GBNF grammar serves to break down the language into manageable parts, defining how these parts can be combined to form valid statements or constructs.
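To make that concrete, the arithmetic rule above only needs a hypothetical number rule and a root to become a complete, if tiny, grammar:

root       ::= expression
expression ::= number "+" number
number     ::= [0-9]+

Here root and expression are nonterminals, while the "+" literal and the digit pattern are the terminals that actually appear in the generated text.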

4.2 Allowed Symbols

The system is rather simple. In GBNF grammars, there are two primary types of symbols that construct the syntax rules: terminal and nonterminal symbols. Each plays a distinct and crucial role in defining the structure of a language.

Terminal symbols are the actual text that appears in the language and are often represented with pattern matches in the GBNF rules. For example, in a grammar defining a programming language, terminal symbols could be keywords like if, else, or symbols like {, }, (, ). In natural language, they could be words, punctuation marks, literal strings, or, in GBNF, regular expression-like text pattern matches.

Nonterminal symbols, on the other hand, serve as placeholders or references to other rules within the grammar. A nonterminal symbol is defined as a 'list of terminal and/or other nonterminal symbols', but they represent patterns that the grammar enforces. For instance, in a rule like sentence ::= subject " " predicate, both subject and predicate are nonterminal symbols that would be further defined elsewhere in the grammar, but " " is a space, and a terminal symbol.

The interplay of terminal and nonterminal symbols is what gives BNF grammars their power and flexibility. By combining these symbols in various ways, one can construct all sorts of complex, modular patterns.
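To see the interplay, here is a hypothetical completion of that sentence rule, with each nonterminal bottoming out in terminal alternatives:

sentence  ::= subject " " predicate
subject   ::= "the cat" | "the dog"
predicate ::= "sleeps" | "runs" | "purrs"

This grammar matches exactly six sentences, such as "the cat purrs"; swapping a pattern like [a-z]+ in for the subject alternatives would open it up to arbitrary words.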

4.3 Parentheses, ?, +, *, and | Operators

Basic BNF grammar doesn't allow for much flexibility in the rules, having only the alternation operator |, so many extended grammars have emerged over time. In our case, the GBNF grammar has several operators that are used to enhance the flexibility and specificity of grammar rules. These include parentheses (), the optional operator ?, the one-or-more operator +, and the zero-or-more operator *.

Parentheses are used to group symbols and sequences together, much like in mathematical expressions. They determine the order in which the elements of the grammar should be interpreted. For instance, in a rule like sentence ::= (subject verb) object, the parentheses indicate that subject and verb form a unit that precedes object.

The question mark ? denotes that a symbol or group of symbols is optional in the grammar. It means that the element can appear zero or one time. For example, title? in a grammar rule suggests that the title can either be present or absent.

The plus sign + signifies that a symbol or group of symbols can appear one or more times. This operator is used when an element in the language can be repeated but must occur at least once. For instance, item+ would mean that there must be at least one item.

The asterisk * is similar to the plus sign but denotes that the symbol or group can appear zero or more times. This means that the element is completely optional and can repeat an unlimited number of times. A rule like whitespace* allows for any number of whitespace characters, including none.

Lastly, the vertical bar | is used to denote alternatives in a grammar rule. It functions similarly to an “or” in logical expressions. In a rule such as response ::= "yes" | "no", it indicates that the response can be either “yes” or “no”.

These operators add significant expressive power to GBNF grammars, allowing for the concise and precise definition of very complex language structures.
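As a hypothetical sketch that exercises each of these operators at once:

root     ::= greeting? name (", " name)* ending
greeting ::= ("Hello " | "Hi ")
name     ::= [A-Za-z]+
ending   ::= "!"+

This matches strings like "Hi Alice, Bob!!": an optional greeting, one or more comma-separated names, and at least one exclamation point.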

4.4 Root Rule

In the architecture of GBNF grammars, the root rule holds a place of fundamental importance. It serves as the starting point or the entry node of the grammar, defining the overarching structure from which all other rules derive. The root rule sets the stage for how the grammar interprets and structures the language, essentially guiding the entire parsing process.

The root rule is typically the first rule defined in a GBNF grammar and is often representative of the highest-level concept or element in the language. For example, in a grammar designed to parse sentences, the root rule might define a sentence’s structure, which then references other rules for subjects, predicates, and objects.

This rule’s significance cannot be overstated, as it determines the overall framework of the language structure being defined. All other rules in the grammar ultimately connect back to the root rule, creating a hierarchical and interconnected network of rules that define the language’s syntax.

5.1 Introduction to Arithmetic Grammar using arithmetic.gbnf

The sample arithmetic grammar from llama.cpp serves as an excellent model to demonstrate the fundamental concepts of a GBNF grammar. The choice of arithmetic is deliberate; it’s a domain familiar to most people and one that follows a set of well-defined and universally understood rules.

The arithmetic.gbnf grammar defines the syntax for basic arithmetic expressions, including addition, subtraction, multiplication, and division. It’s structured in a way that mirrors the logical hierarchy and precedence of arithmetic operations. The grammar includes rules for expressions (expr), terms (term), identifiers (ident), numbers (num), and whitespace (ws), each defined with a specific GBNF rule.

5.2 The sample grammar, arithmetic.gbnf

arithmetic.gbnf:

root  ::= (expr "=" ws term "\n")+
expr  ::= term ([-+*/] term)*
term  ::= ident | num | "(" ws expr ")" ws
ident ::= [a-z] [a-z0-9_]* ws
num   ::= [0-9]+ ws
ws    ::= [ \t\n]*

The root rule defines the highest-level structure, indicating that the grammar represents one or more expressions, each followed by an equals sign, whitespace, a term, and a newline. The + signifies that there can be multiple such expressions.

The expr rule defines an arithmetic expression as a term possibly followed by an operator (+, -, *, /) and another term. The use of * indicates that this pattern of operator followed by term can repeat zero or more times.

The term rule allows for a term to be either an identifier (ident), a number (num), or another expression enclosed in parentheses, which enables the representation of nested expressions. This reflects the precedence of operations in arithmetic.

The ident and num rules define what constitutes a valid identifier (a string of letters, digits, and underscores) and a number (a sequence of digits), respectively. The ws at the end of these rules indicates that they can be followed by whitespace.

The ws rule defines whitespace as zero or more space, tab, or newline characters, allowing for flexible formatting within expressions.

Through this grammar we are provided with a clear and structured way to parse and understand arithmetic expressions.

5.3 Results

When the grammar is applied to arithmetic expressions, it structures them in a consistent and logical manner, adhering to the defined rules. Here are five lines of output demonstrating this:

x = 3 + 4
y = x * (2 - 5)
result = y / 2
total_sum = ( result + x ) - y
average = total_sum / 2

In these examples, each line represents an arithmetic expression structured as per the arithmetic.gbnf grammar:

  1. The x = 3 + 4 expression follows the expr rule, combining numbers with the + operator.
  2. The expression y = x * (2 - 5) demonstrates the use of nested expressions within parentheses, as allowed by the term rule.
  3. result = y / 2 illustrates the division operation and the use of previously defined identifiers like y.
  4. In total_sum = (result + x) - y, we see a more complex combination of operations, showcasing the grammar’s ability to handle multiple operations in a single expression.
  5. Finally, average = total_sum / 2 uses an identifier (total_sum) combined with a number and an operator, adhering to the grammar’s structure.

These results exemplify how a grammar can bring order and clarity to arithmetic expressions. Similarly, when applied to LLM outputs, GBNF grammars can structure the generated text in a coherent and contextually appropriate manner, ensuring that the outputs adhere to specific syntactical rules and logical constructs.

6.1 Building the Tree

I used the analogy of a tree, mainly because computers use trees, specifically parse trees, to represent how different elements of a language are hierarchically related based on the GBNF grammar rules. This tree structure allows for the clear visualization of relationships and dependencies among various components of the language.

Each rule contributes to the formation of a symbol tree. The root of this tree is the root rule, branching out to the different expressions defined in the grammar. In arithmetic.gbnf, each expr in the grammar forms a subtree, with branches representing the various operations and terms (term) involved in the expression. The terminal symbols, such as numbers (num) and identifiers (ident), form the leaves of the tree, representing the concrete elements in the expressions.

When the grammar is compiled, the parser traverses this tree, matching the parts of the expression with the appropriate rules in the grammar until a terminal is hit, and then follows every branch in the same manner. For example, in the expression 3 + 4, the parser would identify 3 and 4 as num (numbers) and + as an operator, organizing them according to the expr rule structure.
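For the line 3 + 4 = 7 under arithmetic.gbnf, a simplified sketch of that tree (whitespace nodes omitted) might look like:

root
└── expr "=" term "\n"
    ├── expr
    │   ├── term ── num ── "3"
    │   ├── "+"
    │   └── term ── num ── "4"
    └── term ── num ── "7"

The branches are the nonterminals (expr, term, num) and the leaves are the concrete characters of the expression.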

6.2 Layering Rules to Build Complex Structures

In creating GBNF grammars, the concept of layering rules is akin to building a structure with multiple levels, where each level adds more detail and depth to the language’s syntax. This layered approach allows for the creation of complex and nuanced language structures, far beyond simple linear constructs.

In our arithmetic.gbnf grammar, a simple layering is evident in how the rules are structured to define arithmetic expressions. The root rule sets the overall framework for an expression. From there, the expr rule layers in the complexity of arithmetic operations, allowing for the combination of terms (term) through various operators (+, -, *, /). Each term can further be broken down into identifiers (ident) or numbers (num), or even nested expressions, adding another layer of complexity.

This method of building upon simpler rules to create more intricate structures is what gives grammars their power and flexibility. For arithmetic expressions it allows for the representation of everything from simple additions to complex nested operations.

6.3 Recursion and Repeating the Root Rule

Recursion is a fundamental concept meaning that something can be defined in terms of itself, allowing for the creation of nested and self-referential structures within a language. In relation to our grammar, recursion enables the grammar to handle expressions within expressions, a common occurrence in arithmetic calculations. This is particularly evident in the term rule, where a term can be an identifier, a number, or another expression enclosed in parentheses. Such recursive rules allow the grammar to parse and represent complex arithmetic expressions that include multiple levels of nested operations.

The repetition of the root rule is another key feature we see. In this grammar, the root rule is defined to repeat one or more times (root ::= (expr "=" ws term "\n")+). This repetition enables the grammar to handle not just a single arithmetic expression but a series of expressions, each followed by an equals sign, whitespace, a term, and a newline. The + at the end of the root rule signifies that this pattern can occur multiple times, allowing for multiple arithmetic expressions to be defined and parsed in sequence.

Since many LLM applications rely on multiple interactions with the user in a single session, this rule is what gives us that ability. Now, with all of these features put together, we can define complex grammars that can parse and structure even the most intricate of language constructs.

7.1 Introduction to Dialogue Grammar

Dialogue grammars, as exemplified here by the following dialogue.gbnf, are designed to structure and interpret dialogues in natural language. This is a more complex grammar that encodes some of the complexities of conversational language, including the nuances of interaction between multiple speakers.

dialogue.gbnf:

# The root rule defines the overall structure of the dialogue, two lines of dialogue, and then the user line
root ::= (
  (single-line single-line user-line)+
)

# The user-line rule defines a line of dialogue from the user, to use as a stop word
user-line ::= user-name ":"

# Update this to your name
user-name ::= "$user"

# The single-line rule defines a single line of dialogue
single-line ::= prefix ws non-verbal ws content ws non-verbal? "\n"

# Prefix with flexible addressing and directives
prefix ::= participant " (to " participant-list ", " directive "):"

# You can have multiple participants
participant ::= ("Val" | "Abner" | "Sarah" | "Tim")
participant-list ::= participant | user-name

# directives can add some extra context to the dialogue to shape the output
directive ::= word

# Non-verbal communication using a pattern
non-verbal ::= "*" [a-z ]+ "*"

# This allows for up to four sentences in a response
content ::= long
long ::= statement sentence? sentence? sentence?

# A sentence is either a statement or a question
sentence ::= ws (statement | question)
statement ::= words "."
question ::= words "?"

# Our base types
words ::= [A-Za-z ,;'"]+
word ::= [A-Za-z]+
slug ::= [a-z_]+
ws ::= [ ,]+

The flow of dialogue in such a grammar is crucial. It must account for the back-and-forth nature of conversation, where context and speaker identities play significant roles. Dialogue grammars, therefore, include rules that not only parse individual sentences but also manage the transitions and connections between different speakers’ lines.

A particular focus here is given to multi-actor dialogues. This means the grammar is designed to handle conversations involving multiple participants, each potentially having distinct ways of speaking and interacting. The grammar includes rules for identifying speakers (participant-list), managing dialogue flow (prefix, directive), and incorporating non-verbal cues (non-verbal). This forces a structure onto the LLM output that encourages it to follow this multi-actor dialogue flow and enhances the content output.

We also see a long rule, which fixes our responses to between one and four sentences. It was written this way to encourage the LLM to exit early; recursive rules would sometimes run forever, and branching rules caused their own issues. This is a common problem with LLMs, and we have to be careful to avoid it.

7.2 What is Dialogue?

Dialogue, in its simplest form, is a conversation between two or more participants. It is a fundamental aspect of human communication, characterized by the exchange of ideas, information, or sentiments through spoken or written language. Key characteristics of dialogue include turn-taking, where participants alternate in contributing to the conversation, and context-dependency, where the meaning and relevance of each contribution are often shaped by previous exchanges.

Dialogues also involve multiple perspectives, as each participant brings their own background, opinions, and emotional states to the conversation. This multifaceted nature makes dialogues dynamic and complex, requiring careful consideration of tone, intent, and the evolving context of the conversation.

7.3 The Flow of Dialogue

The flow of dialogue refers to the sequential and interactive progression of a conversation. It is not merely an exchange of statements but a dynamic process involving responses, interruptions, affirmations, and sometimes non-verbal cues. This flow is governed by a set of unspoken rules and norms, which guide how participants take turns, respond to each other, and shift the conversation’s focus.

In a way, our grammar is an attempt to encode these rules and norms into a set of instructions that can be used to parse and structure dialogues. The grammar defines rules for how participants take turns (root, prefix), how they respond to each other (directive, sentence), and how they convey non-verbal cues (non-verbal).

As each participant contributes to the dialogue, the grammar shapes and structures their statements. LLMs naturally use this kind of prior data to inform their output, so this is a natural fit, with the grammar forcing the otherwise unpredictable nature of LLMs to conform to our turn-by-turn interaction, with participants properly alternating their roles as speaker and listener, each contributing to the conversation in a manner that respects the flow and context.

Accurately capturing and managing this flow of dialogue is often very important. For a chat session, LLMs must be able to not only generate contextually relevant responses but also maintain the continuity of conversation across multiple exchanges. Surprisingly, this is not a problem for a well-behaved LLM, as it is able to accept information from the flow of dialogue as long as that information is consistent and in-context.

That is, up until the point where the conversation falls into a repetition trap, overfitting to the context data.

8.1 The Role of Base Types

Base types serve as the fundamental building blocks of language structure. These are the most elemental units of syntax that form the foundation upon which more complex structures are built. In dialogue grammars like dialogue.gbnf, base types are crucial as they define the basic components of conversation, such as individual words and phrases.

In dialogue.gbnf, base types include elements like words and word. These types represent the simplest forms of language units - individual words or groups of words - that can be combined and structured to form sentences and dialogue. For example, the rule words ::= [A-Za-z ,;'"]+ defines a sequence of alphabetic characters, spaces, and punctuation marks, which collectively represent a spoken or written phrase in the dialogue.

Base types allow for more modularity and flexibility in the grammar. By defining the most basic elements of language, these types can be combined in various ways to create complex structures without having to repeat the same rules over and over.

8.2 The Role of the Sentence, Statement, and Question in Building Content

We have to have a fundamental unit that represents an utterance. The sentence structure plays this pivotal role, serving as the basic framework upon which dialogues are constructed. Sentences in dialogues can take various forms, primarily statements and questions, each adding a unique dimension to the conversation.

statements in dialogue represent assertions or declarations, conveying information, opinions, or thoughts from the speaker. They are crucial for imparting facts or sharing viewpoints, forming the backbone of most conversational exchanges. questions, on the other hand, introduce an element of inquiry into the dialogue, inviting responses, clarifications, or further discussion.

We get a lot of mileage out of these two sentence types, but there are many more. Beyond these basic forms, we could define many others reflecting the many modes of human communication, or impose structure with question words (Who, What, et al.). This could inspire rhetorical questions that don’t seek literal answers, exclamatory sentences that express strong emotions, or imperative sentences that give commands or requests.
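As a hypothetical extension to dialogue.gbnf, a few of these forms might look like:

sentence    ::= ws (statement | question | exclamation | command)
exclamation ::= words "!"
command     ::= ("Please" | "Let's") " " words "."

Question words would be similar, for example question ::= ("Who" | "What" | "Where" | "Why" | "How") words "?", which would nudge every question toward an inquiry-led form.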

Always remember that every word shifts the context of the conversation, so adding in special words shapes everything that comes after.

8.3 The Role of the Prefix in Controlling Flow

When beginning a line, the prefix rule plays a crucial role in controlling and directing the flow of dialogue. This rule is designed to provide context and clarity at the beginning of each line of dialogue, identifying who is speaking and setting the tone or direction of their speech.

A prefix grammar generally includes the speaker's name or identifier, and it may also incorporate additional directives or context-setting elements. For example, a prefix might specify the speaker's mood, their relationship to the listener, or their location, providing valuable context that shapes how the rest of the dialogue is interpreted. Our prefix is structured as participant " (to " participant-list ", " directive "):", which clearly identifies the speaker and the intended listener, along with any descriptive information about the nature of the statement.

This rule is what we use to manage the flow of dialogue, enabling multi-actor scenarios where multiple participants are involved. By clearly delineating who is speaking to whom and under what circumstances, the prefix rule helps to prevent confusion and maintain the coherence of the conversation, facilitating a natural progression of the conversation.

8.4 The Role of Single-Line and User-Line Structure Rules

The single-line and user-line rules are integral in shaping the structure of turns. The single-line rule defines the format of a typical line of dialogue from any participant, encapsulating the speaker, their message, and any non-verbal cues. This rule is mode 1.

The user-line rule serves as a stop word or a marker, indicating the points in the dialogue where the user is expected to contribute. This rule is a bit of our important magic, as it allows us to control the flow of the dialogue, and is used in our script to trigger a stop condition. This rule is mode 2.

These rules demonstrate two distinct modes of conversation in the grammar: the first mode, where any participant can speak, and the interactive mode, which specifically involves the user. This distinction is vital in applications involving LLMs, as it allows for the creation of controllable dialogues that can adapt to different scenarios and participants.

The flexibility of the grammar also suggests the potential for introducing other completely different modes of conversation. For instance, a mode focused on internal monologue could be introduced, or a mode for group discussions with specific dynamics, or even the ability to output commands or tokens back to the application. Each mode could have its own set of rules tailored to its specific requirements; limited only by their system prompt or fine-tuned guidance.
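For instance, a hypothetical internal-monologue mode could be layered in alongside the existing rules:

# Hypothetical extension: allow a private thought before any spoken line
single-line  ::= thought-line? prefix ws non-verbal ws content ws non-verbal? "\n"
thought-line ::= participant " (thinking): *" words "*\n"

Because thought-line reuses participant and words, it inherits the same constraints as spoken dialogue while signalling a different mode to the application.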

9.1 Enforcing a Specific Dialogue Structure

The structure of dialogue is a critical element in ensuring that conversations are coherent and contextually appropriate. For Large Language Models, the application of a grammar like dialogue.gbnf is helpful for enforcing a specific structure on the dialogue outputs, which can otherwise be chaotic and fall out of coherence.

The dialogue.gbnf grammar enforces structure by defining clear rules for each component of a dialogue. These rules dictate how sentences are formed, how turns are taken, and how context is conveyed. This ensures that each line of dialogue generated by the LLM adheres to a predictable and logical format, closely mimicking the structure of natural human conversations.

The benefits of this approach in LLM outputs are manifold. Structured dialogues are inherently more coherent, making them easier for users to follow and engage with. They also allow for greater control over the flow of conversation, ensuring that dialogues progress in a controlled manner.

9.2 Adding Context to Dialogue Outputs

Context in dialogue is what breathes life into conversations, making them more than just exchanges of words. It involves the background information, emotional undercurrents, and situational nuances that give depth and meaning to what is being said. Embedding this context into dialogue outputs is crucial for creating interactions that are not only coherent but also resonate with users on a deeper level.

To guide context in dialogue outputs, we incorporate rules that define not just the structure of the sentences but also the nuances of communication. For example, the grammar includes directives and non-verbal cues, which add layers of meaning beyond the literal words. directives can provide information about the speaker’s mood or attitude, while non-verbal cues can convey emotions or reactions, adding subtlety and depth to the dialogue.

When these contextual elements are integrated into LLM-generated dialogues, the conversations become more realistic and engaging. Users interacting with AI systems powered by LLMs can experience dialogues that feel natural and responsive to the context of the conversation, even layering in elements of emotion.

The ability to structure and control dialogue flow using rules like prefix and non-verbal is really the secret sauce for getting an LLM in the right frame of mind to generate relevant content. Not only does it shape the output currently being generated, it enhances the state of the next output, and the next, and so on.

Incorporating context into dialogue outputs thus enhances the overall effectiveness of conversational AI. It enables LLMs to produce outputs that engage in a way that reflects an understanding of the situation and the emotional tone of the conversation. This leads to more satisfying and meaningful interactions, improving engagement and trust in AI-driven communication platforms.

9.3 Enforcing Stop Word Conditions

In almost every LLM use case, there needs to be a trigger to hand control back or to signify that a stop condition has been reached. For dialogue, stop word conditions occur at the end of turns, when a user's input or response is expected or required.

These conditions are particularly important in interactive dialogues, such as those encountered in chatbots, virtual assistants, or roleplay, where the flow of conversation involves active participation from the user. The previous grammar accommodates these conditions by defining specific rules that indicate a pause or a shift in speaker, and by providing an explicit condition that our script uses to trigger a stop, ensuring the right format for the trigger; the trigger itself still has to be produced by the LLM.

10.1 Introducing the Script

To tie it all together, here we have a simple bash script that runs the llama.cpp main program. I put my models in a relative path, but you can set them to wherever yours are located.

#!/bin/bash
# Model file name (without the .gguf extension), kept under ./gguf/
MODEL_FILE=speechless-mistral-dolphin-orca-platypus-samantha-7b.Q8_0
MODEL="-m ./gguf/$MODEL_FILE.gguf"
#CTX=16384
CTX=8192        # context window size in tokens
NGL=25          # number of model layers to offload to the GPU
USER='$user'    # single quotes keep the literal string "$user"
PROMPT_FILE=office
GRAMMAR_FILE=dialogue

# Only pass the grammar and prompt flags when a file is configured
if [[ $GRAMMAR_FILE != "" ]]; then
    GRAMMAR="--grammar-file grammar/$GRAMMAR_FILE.gbnf"
fi
if [[ $PROMPT_FILE != "" ]]; then
    PROMPT="-f prompts/$PROMPT_FILE.txt"
fi

# Reverse prompt: hand control back whenever "$user:" is generated
REVERSE="-r $USER:"
ARGS='--mirostat 2 --interactive-first'

./main -n -1 -c $CTX -ngl $NGL --repeat_penalty 1.2 --color -i \
       $PROMPT $REVERSE $MODEL $GRAMMAR $ARGS | tee office.log

10.2 Configuration

The script begins by setting up the model file (MODEL_FILE), which specifies the particular LLM to be used. These can be found on Hugging Face.

For GPU, set NGL > 0. This parameter pertains to the GPU layers used for processing, affecting the performance and efficiency of the model. Adjusting the NGL layers can have a significant impact on how the model generates text, particularly in terms of processing speed and resource utilization, and on GPU memory utilization, which must be split between the layers and the context.

The script also includes provisions for integrating the grammar file (GRAMMAR_FILE) and the prompt file (PROMPT_FILE). The grammar file, in this case, dialogue, is integral as it provides the structure and rules that the LLM must adhere to in its output. The prompt file office, on the other hand, offers a system prompt for the dialogue, setting the rules and sample formatting.

10.3 System Prompt

We use a system prompt to set the rules and sample formatting for the dialogue. This prompt is defined in the office file, which is referenced in the script. The prompt file is structured as follows:

office.txt:

### Instructions:
#define Val { a=female b=28 c=thoughtful d=positive e=curious f=assistant h=human }
#define Abner { a=male b=53 c=mindful d=communicator e=creative f=lead h=human }
#define Sarah { a=female b=36 c=smart d=witty e=bossy f=manager h=human }
#define Tim { a=male b=22 c=shy d=thoughtful e=smart f=engineer h=human }
#define Val { full name Valerie Wilson, has a bachelor's degree in physics and works as $user's research associate.  Val's interests include reading books (lots of them), hiking, video gaming, and watching YouTube videos.  Val feels respected; is helpful, kind, honest, good at writing, and never fails to answer the $user's requests immediately and with precision. }
#define Abner { full name Abner Lamonte, has a bachelor’s degree in business administration and works as $user's manager.  Abner's interests include birdwatching and Torah study.  Abner is a good listener and is always willing to help. }
#define Sarah { full name Sarah Jones, has a master’s degree in management information systems and works as $user's product manager.  Sarah's interests include watching movies, yoga, and shopping.  Sarah is direct and to the point. }
#define Tim { full name Tim Smith, has a bachelor’s degree in electrical engineering and works as $user's engineer.  Tim's interests include open source projects and cryptocurrency meetups.  Tim uses math and logic to solve problems. }
#define Output { Only use factual information that you have been given. Do not make up any information. }
### Input (turn format):
$speaker: The speaker makes a statement.
$agent (to $speaker, $reaction_to_speaker): $agent makes a clever and insightful response without argument.
$reactor (to $agent, $reaction_to_agent): $reactor responds to what $agent just said.
### Input (action format):
$user: Good morning, Val.
Val (to $user, excited): Oh! Good morning, $user!
Sarah (to Val, surprised): You seem happy today.
### Response:
$user: 

There are several parts to a good system prompt, the authorship of which is an art in and of itself.

Firstly, you need some sort of header. Alpaca-formatted headers are a pretty standard choice, so we use the ### symbol to set our state. Since programming languages are generally trained into models, in this particular prompt I'm playing with using #define name { definition } to input our state.

Next comes our one-shot turn formatting. This gives the model a bit of a clue on how a turn is expected to go, which should align with our GBNF file.

Finally, our response block seeds the model with the first line of dialogue.

It’s a decent system prompt and provides enough to get us to where we want to be.

10.4 Running the Script

Here is an output of the script running with the dialogue.gbnf grammar and the office prompt file. The model used is speechless-mistral-dolphin-orca-platypus-samantha-7b.Q8_0, a 7B-parameter Mistral model produced by a massive merge of several major model branches.

Okay, I want to talk about Sam Altman being fired.
Val (to $user, thoughtful): What are your thoughts on this situation, and how it might affect the company's performance in the future. Let me know if you need any help with that research. 
Abner (to Val, mindful): This is a very interesting issue to investigate. I recall Sam Altman being an important part of this particular industry. It would be wise for us to analyze his impact and potential implications on company performance in the future. Let's dive into it together. 
$user: He was the CEO of OpenAI, and the chief scientist led the board in ousting him and the chairman last Friday.
Sarah (to $user, witty): I know that the company's cofounders are taking over as interim leaders after this turmoil began with some internal conflicts. 
Tim (to Sarah, smart): That's right, Sarah. It now seems like a mutiny against Sam Altman and Greg Brockman by other executives at OpenAI. Interesting findings indeed. Let us supervise the situation closely to avoid any negative consequences for our company. 
$user: What do you think could have motivated the board to take such drastic action?
Val (to $user, hushing): Let's first understand why Sam Altman was ousted before we get into speculating. We should check the source of this information for accurate details on the turmoil at OpenAI. I am eager to help you gather more data about this situation and provide insights based on that information. 
Abner (to $user, communicator): Yes, Val makes a great point. It's important not to jump into conclusions until we have verified facts in hand. This turnover of leadership can potentially strain the working environment within OpenAI. We should keep an eye on it and see how our company is affected by these changes. 
$user: So, apparently Microsoft is upset and several investors are urging the board to resign and allow Altman back as the CEO.
Sarah (to $user, direct): Well, it seems that the risk of not reinstating Sam might be greater than anticipated in terms of investor relations. We should keep monitoring any potential consequences for our company's future partnership with OpenAI or Microsoft using this data, Valerie. 
Tim (to Sarah, engineer): Yes, Sarah has a point here. This situation is quite complex to predict the outcomes accurately and it requires careful examination while considering all possible scenarios. Let us follow up on any updates related to Sam Altman and the board members' decisions for our own interests. 
$user:

Without the grammar, this conversation would not have followed the same alternating-turn structure. By limiting the number of sentences, forcing the speaking-to prefix, and allowing for consecutive agent lines, we can get multi-agent conversations that also regularly pass control back to the user, as the agents each still talk to each other.

If you notice that there are no non-verbal cues, it's because I didn't include them in the system prompt, so the LLM didn't think about them. I will leave that one for you, dear reader.

11.1 Advantages of Using GBNF Grammars in LLMs

I think we have pretty adequately demonstrated the advantages of using Backus-Naur Form grammars in LLMs, but let’s recap.

One of the most appealing aspects of GBNF grammars is their relative simplicity and user-friendliness. Constructing a GBNF grammar does not require extensive programming skills; rather, it calls for an understanding of the language’s syntax and structure. This accessibility makes GBNF grammars a popular choice among linguists, data scientists, and AI developers who seek to shape and structure the outputs of LLMs without delving into the complexities of programming languages or algorithm design.

In addition to their simplicity, GBNF grammars are highly effective tools for controlling the outputs of LLMs. They allow for the precise definition of language structures, ensuring that the generated text adheres to specific syntactic rules and patterns. This control is crucial in nearly every application that uses LLMs.

The practical implications are significant. The ease of writing GBNF grammars means that they can be quickly developed and implemented, allowing for rapid prototyping and testing in various LLM applications. The precision and control offered by these grammars ensure that LLM outputs are not only syntactically correct but also contextually appropriate.

This wasn’t the first time I had experimented with multi-agent dialogue, but with GBNF grammar it was pretty simple.

11.2 The Awesome Use Cases for GBNF Grammars in LLMs

The field of applications for GBNF grammars in LLMs is huge, and some immediate use cases come to mind.

Firstly, GBNF grammars are exceptional at formatting LLM output that the LLM itself uses. Without having to rely on external data sources, the in-line/in-context sentiment and descriptive capability of the model allows an LLM to dig within itself to expand its own context, such as with the directive rule in dialogue.gbnf.

Secondly, they are better than fine tuning for formatting LLM outputs into structured data formats like JSON and CSV. In JSON, for instance, GBNF grammars can dictate the hierarchical organization of data, ensuring that the LLM outputs comply with the nesting and attribute requirements of JSON format. Similarly, for CSV, GBNF grammars can be designed to ensure that LLM outputs are organized into well-defined rows and columns, adhering to CSV conventions. This capability is invaluable in scenarios where LLMs are used for data generation, extraction, or transformation tasks.
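As a rough sketch of the JSON case (llama.cpp ships a fuller json.gbnf among its grammar examples; this version skips string escapes and arrays):

root   ::= object
object ::= "{" ws (pair ("," ws pair)*)? "}"
pair   ::= string ":" ws value ws
value  ::= string | number | object | "true" | "false" | "null"
string ::= "\"" [^"]* "\""
number ::= "-"? [0-9]+ ("." [0-9]+)?
ws     ::= [ \t\n]*

Attached as the grammar file, this forces every completion to be a syntactically valid, if simplified, JSON object.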

We demonstrated an example of how they can be useful in dialogue generation, but they can also be used to generate code, legal documents, and academic writing. For example, with code generation, GBNF grammars can guide LLMs in generating syntactically correct programming code by simply enforcing the language's syntax rules.

Thirdly, they can be used to control LLM outputs in a way that allows structured data to be returned to the application. By fixing the outputs to a low-cardinality set (using |'d alternatives), this goes from more of a hack to something that is reliable and usable.

This means that LLMs can be used to generate decisions that trigger actions. For example, by making a set of structured commands available to an agent, the agent could write a command string to its output. The application engine can interpret that command, search a database or API based on the request, and reintroduce the results to the agent, which can then resume generation and make a decision based on the new information.
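A hypothetical command grammar along those lines (the command names are illustrative, not part of any real API):

root    ::= command "\n"
command ::= search | lookup | respond
search  ::= "SEARCH(" query ")"
lookup  ::= "LOOKUP(" slug ")"
respond ::= "RESPOND(" words ")"
query   ::= [a-zA-Z0-9 ]+
slug    ::= [a-z-]+
words   ::= [A-Za-z0-9 ,.'"]+

Because command is a closed set of alternatives, the application can dispatch on the first token of the output with a simple prefix match.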

11.3 Limitations of Using GBNF Grammars in LLMs

While GBNF grammars offer substantial benefits in structuring LLM outputs, they come with their own set of limitations. Understanding these constraints is crucial for effectively leveraging these tools in natural language processing.

One of the inherent limitations of GBNF grammars lies in their structured nature, which can sometimes be too rigid to capture the fluidity and ambiguity inherent in natural language. While they excel in defining clear, rule-based syntax, GBNF grammars may struggle to adequately represent language nuances, or become extremely complex when attempting to do so. This rigidity can lead to outputs that, while syntactically correct, might lack the natural flow or adaptability seen in human language. Even the output of the dialogue.gbnf grammar is not as natural as we would like, although some of that is also due to the model.

When using llama.cpp, integrating GBNF grammars with LLMs also presents performance-related challenges. Depending on the complexity and depth of the grammar, there can be issues related to processing speed, computational resource demands, and the efficiency of generating outputs. These performance issues become particularly pronounced when dealing with intricate grammars or large-scale LLMs, where the computational overhead of parsing and applying complex grammatical rules can be substantial.

Whenever you narrow the output of an LLM, you are also narrowing the scope of the model, which can lead to overfitting at a higher rate. You can see this in the dialogue.gbnf output, where the model is overfitting to the context of the conversation and is not able to generate new information. This is a common problem with LLMs, and more advanced techniques are required to overcome it.

There are also subtle issues around how the grammars are written that can affect the performance of the model, or how the application utilizes the grammar to make choices. So, while GBNF grammars are powerful tools for structuring LLM outputs, their limitations in terms of natural language representation and performance constraints must be acknowledged. Balancing the benefits and drawbacks of these grammars is key to maximizing their effectiveness in various natural language processing applications.

11.4 Tips

So, some of the gotchas that I ran into while writing this article.

  • You can’t use an underscore in a symbol. You can use a dash, but no underscore.
  • The logic around ()'s is a bit funky, and I believe it is being actively worked on as of b1539. If you are having problems compiling, try removing them, or adding them.
  • Work one line at a time, otherwise compilation errors can be hard to track down.
  • A crashed grammar will make it so your bash prompt no longer accepts input. You can fix this by running stty sane in your terminal.

12.1 Conclusion

So, that’s it. We’ve covered a lot of ground, and I hope you’ve learned something.

We've seen firsthand how grammars aren't just theoretical constructs for linguists and academics; they have practical and powerful applications, especially when it comes to structuring and managing the outputs of LLMs. The non-trivial use case we examined, shaping dialogue outputs, demonstrates the value and complexity of what GBNF grammars bring to the table. Multi-turn outputs are a serious challenge for LLMs, and GBNF grammars handled them, no problem.

I am certainly going to be baking GBNF grammars into valai. I can think of a thousand ways to use them to bring crazy powerful features to bear with LLMs, both in how we interact with them and in how they interact with us. One little step closer to AGI.

13.1 Bonus

ChatML is a really popular format for training models that focus on dialogue. It’s also a format that I have historically found difficult to control over long sessions. Using a GBNF grammar to control the output of a ChatML model is a really powerful way to get the best of both worlds.

So, here’s one that I wrote as a test, and it works better than expected.

root ::= (user-end line line user-start)+
line ::= (start-line "\n" message "\n" end "\n")

start-line ::= start speaker ", to " spoken ", " directive

user-start ::= start user
user-end ::= end "\n"

directive ::= "happy" | "sad" | "angry" | "neutral" | "confused" | "excited" | "frustrated" | "bored" | "surprised" | "disgusted" | "afraid" | "calm" | "content" | "disappointed" | "embarrassed" | "lonely" | "proud" | "relieved" | "satisfied" | "scared" | "shy" | "sick" | "tired" | "worried" | "annoyed" | "determined" | "envious" | "grateful" | "hopeful" | "impressed" | "jealous" | "nostalgic" | "prepared" | "reassured" | "sentimental" | "sympathetic" | "triumphant" | "ashamed" | "awed" | "devastated" | "drained" | "excited" | "exhausted" | "furious" | "guilty" | "indignant" | "interested" | "loved" | "mad" | "optimistic" | "peaceful" | "refreshed" | "rejuvenated" | "relaxed" | "renewed" | "restless" | "satisfied" | "shocked" | "sickened" | "sleepy" | "stressed" | "surprised" | "thankful" | "thoughtful" | "uncomfortable" | "upset" | "weird" | "accomplished" | "aggravated" | "amused" | "anxious" | "apathetic" | "awake" | "bouncy" | "busy" | "cheerful" | "chipper" | "cold" | "complacent" | "confused" | "contemplative" | "cranky" | "crappy" | "crazy" | "creative" | "crushed" | "curious" | "depressed" | "disappointed" | "discontent" | "ditzy" | "dorky" | "drained" | "drunk" | "ecstatic" | "energetic" | "enthralled" | "exanimate" | "excited" | "exhausted" | "flirty" | "frustrated" | "full" | "geeky"

user ::= "$user"
speaker ::= ("Val" | "Sarah" | "Tim" | "Abner")
spoken ::= (user | speaker)

message ::= [^"\n"]+
start ::= "<|im_start|>"
end ::= "<|im_end|>"

and the prompt.

<|im_start|>system
#define Val { a=female b=28 c=thoughtful d=positive e=curious f=assistant h=human }
#define Abner { a=male b=53 c=mindful d=communicator e=creative f=lead h=human }
#define Sarah { a=female b=36 c=smart d=witty e=bossy f=manager h=human }
#define Tim { a=male b=22 c=shy d=thoughtful e=smart f=engineer h=human }
#define Val { full name Valerie Wilson, has a bachelor's degree in physics and works as $user's research associate.  Val's interests include reading books (lots of them), hiking, video gaming, and watching YouTube videos.  Val feels respected; is helpful, kind, honest, good at writing, and never fails to answer the $user's requests immediately and with precision. }
#define Abner { full name Abner Lamonte, has a bachelor’s degree in business administration and works as $user's manager.  Abner's interests include birdwatching and Torah study.  Abner is a good listener and is always willing to help. }
#define Sarah { full name Sarah Jones, has a master’s degree in management information systems and works as $user's product manager.  Sarah's interests include watching movies, yoga, and shopping.  Sarah is direct and to the point. }
#define Tim { full name Tim Smith, has a bachelor’s degree in electrical engineering and works as $user's engineer.  Tim's interests include open source projects and cryptocurrency meetups.  Tim uses math and logic to solve problems. }
#define Output { Only use factual information that you have been given. Do not make up any information. }
<|im_end|>
<|im_start|>$user
Good morning, Val.
<|im_end|>
<|im_start|>Val, to $user, excited
Oh! Good morning, $user!
<|im_end|>
<|im_start|>Sarah, to Val, surprised
You seem happy today.
<|im_end|>
<|im_start|>$user

the script.

#!/bin/bash
MODEL_FILE=dolphin-2.2.1-ashhlimarp-mistral-7b.Q8_0
MODEL="-m ../llama/gguf/$MODEL_FILE.gguf"
CTX=8192
NGL=25
PROMPT_FILE=officeml
GRAMMAR_FILE=chatml
USER='|>$user'

if [[ $GRAMMAR_FILE != "" ]]; then
    GRAMMAR="--grammar-file grammar/$GRAMMAR_FILE.gbnf"
fi
if [[ $PROMPT_FILE != "" ]]; then
    PROMPT="-f prompts/$PROMPT_FILE.txt"
fi

REVERSE="-r $USER"    # inner quotes would be passed literally to main, so none are used
ARGS='--mirostat 2 --interactive-first'

./main -n -1 -c $CTX -ngl $NGL --repeat_penalty 1.2 --color -i \
       $PROMPT $REVERSE $MODEL $GRAMMAR $ARGS | tee chatml.log