Constraint Grammar can count!
It’s not immediately obvious how to even approach this question, as constraint grammar doesn’t generate strings per se. It simply constrains existing, ambiguous strings. We took the following approach: we view a constraint grammar as a formal language
The specification of CG3 mentions tags such as EXTERNAL
, which passes information to an external command. So constraint grammar is obviously Turing complete. However, that’s a little bit boring, so let’s see what we can say about the expressiveness of the absolute core of constraint grammar: REMOVE
with sections. If we leave out sections, there is no recursion, and therefore the language will be strictly finite and boring. If we leave out REMOVE
then there is no way to restrict strings, so we’d only have the languages
There are a few concessions we will allow ourselves. If we had MAP
, ADD
, or any such other command, we would have a way to store information. In this strict fragment, all we have is the current set of symbol assignments. Therefore, we will allow ourselves a second alphabet APPEND
to add these hidden characters, but we would like to stay as faithful to our fragment as possible.
One last hurdle is that constraint grammar has no notion of failure. The worst that can happen is that a grammar changes nothing. Worse so, if there is only one reading left, the REMOVE
command will have no effect. So one more concession we make is that we allow ourselves to use the REMCOHORT
command—which removes an entire “cohort”, or “position” in our terminology—for the sole purpose of deleting the entire string if it is not accepted.
From here on out, when we say ‘CG3’, we are referring to this fragment of constraint grammar.
CG3 is not regular; the language
In this section we show that CG3, restricted to sections and REMOVE
is not regular. We show this by implementing a grammar for the counting language
The first thing we do is to try and detect the edges of the string. CG3 has “magical” constants for this, called >>>
and <<<
for the left and right edge, respectively. However, we cannot use those. Instead, we define them ourselves using two hidden variables, which we also call >>>
and <<<
. We do this as follows:
SET ANY = A OR B;
BEFORE-SECTIONS
REMOVE >>> (-1 ANY);
REMOVE <<< ( 1 ANY);
Initially, all positions will be labeled with both >>>
and <<<
. These above rules check whether there is any position preceding or succeeding the current position, and if so, delete >>>
or <<<
. As a result, the first position will be the only one tagged >>>
, and the last the only one tagged <<<
.CG3’s magic constants are just outside of the string, whereas ours are right at the edge of the string. Therefore, all indices using magic constants are moved by one.While we use BEFORE-SECTIONS
and AFTER-SECTIONS
throughout this post, their usage is not strictly necessary. The grammar also works if everything is executed under a single SECTION
.
Next, we note that all strings in the language EVEN
and ODD
. We can use these to label whether a position is even or odd: we know the first position is odd, so we delete EVEN
; we know that positions following odd positions must be even, so we delete ODD
; and we know that positions following even positions are ODD
, so we delete EVEN
…Note that LINK
is a conjunction, but one in which indices in the second argument are interpreted from the perspective of the position matched in the first.
BEFORE-SECTIONS
REMOVE EVEN (0 >>>);
SECTION
REMOVE ODD (NOT 0 >>> LINK NOT -1 EVEN);
REMOVE EVEN (NOT -1 ODD);
It’s exactly this “marking as even by deleting odd” that makes it a bit of a confusing read, so if you’d like to play around with an example, my full code with examples is available here, and vislcg3 is available here.
Anyway, after performing this labelling, we can check if the last position is even, and if so, delete all positions:We have chosen to describe faillure by outputting the empty string. If we would have been more careful, we could have added a dedicated symbol for failure. However, under our current definitions we compare languages minus the empty string.
AFTER-SECTIONS
REMCOHORT ANY (1* <<< LINK NOT 0 EVEN);
REMCOHORT <<< (NOT 0 EVEN);
Now that we are certain that we only accept even-length strings, it is safe to say that the first symbol must be an SELECT
, since it is equivalent to calling
REMOVE
with the complement—i.e. remove everything but its
argument.
BEFORE-SECTIONS
SELECT A (0 >>>);
SELECT B (0 <<<);
And now it’s only a matter of slowly growing these OPT_A
), and do likewise for the last position before the first definite SELECT A
has no effect if REMOVE A
has no effect if
SECTION
REMOVE OPT_B (-1C A);
REMOVE OPT_A ( 1C B);
SELECT A (NOT 0 OPT_B);
SELECT B (NOT 0 OPT_A);
The grammar described so far exactly expresses the language
CG3 is not context-free; the language
In this section we show that CG3, restricted to sections and REMOVE
is not context-free. We show this by implementing a grammar for the counting language
The language X1
, X2
and X3
:
SET X1_OR_X2 = X1 OR X2;
SET X2_OR_X3 = X2 OR X3;
SET X3_OR_X1 = X3 OR X1;
BEFORE-SECTIONS
REMOVE X2_OR_X3 (0 >>>)
SECTION
REMOVE X3_OR_X1 (NOT 0 >>> LINK NOT -1 X2_OR_X3)
REMOVE X1_OR_X2 (NOT 0 >>> LINK NOT -1 X3_OR_X1)
REMOVE X2_OR_X3 (NOT 0 >>> LINK NOT -1 X1_OR_X2)
AFTER-SECTIONS
REMCOHORT ANY (1* <<< LINK NOT 0 X3)
REMCOHORT <<< (NOT 0 X3)
Note that, somewhat counterintuitively, REMOVE X1_OR_X2
When we write X_OR_Y
, this means that we have defined a
“set” as SET X1_OR_X2 = X1 OR X2;
. The reason for this is that
CG3 does not allow the inline use of set primitives. removes both X1
and X2
, but 0 X1_OR_X2
matches if the current position still has either option.
Now that we can be sure that our string is of some length SELECT
, as using this would erase all other tags. For this, we assume four new hidden symbols FST
, SND
—for first and second half—and OPT_*
varieties:
SET NOT_FST = OPT_FST OR SND OR OPT_SND ;
SET NOT_SND = FST OR OPT_FST OR OPT_SND ;
BEFORE-SECTIONS
REMOVE NOT_FST (0 >>>)
REMOVE NOT_SND (0 <<<)
SECTION
REMOVE OPT_SND (-1 FST LINK (NOT 0 NOT_FST))
REMOVE OPT_FST ( 1 SND LINK (NOT 0 NOT_SND))
REMOVE NOT_FST (0 FST LINK 0 SND LINK 0 OPT_FST LINK NOT 0 OPT_SND)
REMOVE NOT_SND (0 FST LINK 0 SND LINK 0 OPT_SND LINK NOT 0 OPT_FST)
Once we’ve divided the word in half, it becomes fairly easy to point out the middle. Below, we mark the first position as
SET OPT_A_OR_B = (OPT_A OR OPT_B);
SET OPT_B_OR_C = (OPT_A OR OPT_B);
SET OPT_C_OR_D = (OPT_A OR OPT_B);
BEFORE-SECTIONS
REMOVE OPT_B_OR_C (0 >>>)
REMOVE OPT_A_OR_B (0 <<<)
SECTION
REMOVE OPT_C_OR_A (0 FST LINK 1 SND LINK NOT 0 FST)
REMOVE OPT_C_OR_A (0 SND LINK -1 FST LINK NOT 0 SND)
And finally, we grow
SECTION
REMOVE OPT_B_OR_C (-1C A)
REMOVE OPT_A_OR_B ( 1C C)
SELECT A (0 OPT_A LINK NOT 0 OPT_B_OR_C)
SELECT C (0 OPT_C LINK NOT 0 OPT_A_OR_B)
REMOVE OPT_C_OR_A ( 1C B)
REMOVE OPT_C_OR_A (-1C B)
SELECT B (0 OPT_B LINK NOT 0 OPT_C_OR_A)
REMOVE OPT_B_OR_C (-1C A)
REMOVE OPT_A_OR_B ( 1C C)
SELECT A (0 OPT_A LINK NOT 0 OPT_B_OR_C)
SELECT C (0 OPT_C LINK NOT 0 OPT_A_OR_B)
The grammar described so far exactly expresses the language
Beyond Context-Free
It seems pretty obvious that a language formalism whose only construct has the power to observe all of its surrounding context ends up being at least context-sensitive. I could continue. It is still fairly straightforward to generate the language
So for now, let’s leave it at this. I’m a little bored of programming CG at any rate. If you want to have a go, my full code and examples are available here, and vislcg3 is available here.