PhonoApps

How to Derive!

Derive! is an interactive tool designed to apply phonological rules to a given input, loosely based on sed. It runs in-browser, but can be saved onto your computer (using “Save Page As”), after which online access is no longer required. Derive! comes with its own feature table and a list of sample transformation rules. Users have the option of customizing both the feature table and any transformation rule, or creating their own from scratch. With some practice, the software is not difficult to operate, and it is meant to be of use to anyone with an interest in phonology, regardless of qualification. The target audience is beginning to intermediate phonology students.

How it Works

Derive! takes an underlying representation and applies a series of transformation rules to it to produce a surface representation as its output. Transformation rules have the form φ → ψ / λ___ρ, which means that a string in φ is transformed into ψ whenever it is directly between a string in λ and a string in ρ. Only φ and ψ are required for a rule to apply; either λ or ρ, or both, may be left blank. These definitions are often based on phonological features, which are defined by default for International Phonetic Alphabet characters, although a custom feature table can be used to allow the use of any set of single-character segments.

Rules are applied in order, with the output of a preceding rule serving as the input of the following one, and the first rule receiving its input from the UR textbox at the top of the page. It is possible to reorder rules by pressing the ↑ and ↓ buttons, which place a rule at the top or bottom of the list respectively. Rules can also be removed by pressing the “—” button. New blank rules can be added to the end of the list using the floating button at the bottom, or stock rules can be added from the sample rule list below. All rules are open to editing, regardless of origin. Editing a rule is as simple as replacing φ, ψ, λ, and ρ with the desired definitions.
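The ordering described above can be sketched in Python. This is only an illustration and not Derive!'s own code: the names Rule, apply_rule, and derive are hypothetical, rules are restricted to literal strings (no feature classes), and Python's re module stands in for the matching engine.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical literal-only rule: phi -> psi / lam ___ rho."""
    phi: str
    psi: str
    lam: str = ""
    rho: str = ""


def apply_rule(rule: Rule, form: str) -> str:
    # Lookbehind and lookahead keep the contexts out of the replaced span.
    pattern = ""
    if rule.lam:
        pattern += "(?<=" + re.escape(rule.lam) + ")"
    pattern += re.escape(rule.phi)
    if rule.rho:
        pattern += "(?=" + re.escape(rule.rho) + ")"
    return re.sub(pattern, lambda _m: rule.psi, form)


def derive(ur: str, rules: List[Rule]) -> List[str]:
    """Return the UR followed by the output of each rule, in order."""
    steps = [ur]
    for rule in rules:
        # Each rule reads the output of the rule before it.
        steps.append(apply_rule(rule, steps[-1]))
    return steps


voicing = Rule("t", "d", lam="a")   # t -> d / a ___
raising = Rule("a", "e", rho="d")   # a -> e / ___ d
print(derive("pat", [voicing, raising]))
print(derive("pat", [raising, voicing]))
```

With voicing first, raising finds the d it needs; with the order reversed, raising has nothing to apply to, so the two orders give different surface forms.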

Derive! defines the sets φ, λ, and ρ using regular expressions in either SPE-like (default) or POSIX syntax (see below for differences between the two), both of which support the use of feature classes. The substitution ψ is defined as a template consisting of literal segments, references to parts of the match and contexts, and feature projections, which apply features (with the states either specified or sampled from the match and contexts) to the match, literal text, or parts of the contexts.

If a rule is syntactically valid, the text boxes will be green and the rule will be displayed below (in either SPE or traditional mathematical notation, depending on which syntax is active). If the rule is invalid, the text boxes will turn red and the rule will be skipped. The output of each rule is displayed below it, with replacements highlighted (red if the match was deleted without replacement, green if the match was replaced by itself in a vacuous application, and blue otherwise). The derivational table shows each rule and its output in a format suitable for quick reference or printing, without any of the editing controls.

When Derive! applies a rule, it initiates a search from the beginning of the string for a match of λφρ (if multiple matches begin in the same place, the longest one wins) and then replaces φ with the result of applying ψ to the match. The search for a match then begins again at the end of the replaced text (or one character over if the match consisted of only a right context) and the procedure is repeated until no more matches are found and the end of the string is reached. In this way, the rule applies multiple times from left to right, although it never applies to its own output. In order to produce a cascading effect (such as the sample Voicing Assimilation rule, where a whole row of segments gets a feature from the last one), a quantifier can be used in φ so that the rule can match a multi-segment string (see below for necessary syntax).
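The scanning procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not Derive!'s implementation: apply_rule is a hypothetical name, and φ, λ, and ρ are given as finite collections of literal strings rather than regular expressions.

```python
from typing import Callable, Collection, List


def apply_rule(
    form: str,
    phi: Collection[str],
    psi: Callable[[str], str],
    lam: Collection[str] = ("",),
    rho: Collection[str] = ("",),
) -> str:
    out: List[str] = []
    copied = 0  # index in `form` up to which text has been emitted
    pos = 0     # index in `form` where the next search begins
    while pos <= len(form):
        found = None
        for start in range(pos, len(form) + 1):
            candidates = [
                (l, p, r)
                for l in lam for p in phi for r in rho
                if form.startswith(l + p + r, start)
            ]
            if candidates:
                # Several matches begin here: the longest one wins.
                found = (start, max(candidates, key=lambda c: len("".join(c))))
                break
        if found is None:
            break
        start, (l, p, _r) = found
        match_start = start + len(l)
        match_end = match_start + len(p)
        out.append(form[copied:match_start])
        out.append(psi(p))  # only the phi part is replaced
        copied = match_end
        # Resume after the replaced text, or one character over if nothing
        # was consumed (a match made of a right context only).
        pos = match_end if match_end > start else match_end + 1
    out.append(form[copied:])
    return "".join(out)


# a -> b / b ___ does not cascade, because the rule never sees its own output.
print(apply_rule("baaa", {"a"}, lambda m: "b", lam={"b"}))
# Letting phi match a run of a's (standing in for a quantifier) cascades.
print(apply_rule("baaa", {"a", "aa", "aaa"}, lambda m: "b" * len(m), lam={"b"}))
```

The search always runs over the original string, and output is collected separately, which is why a single-segment φ changes only the first a in the first call.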

Note that the SPE-like syntax of Derive! rules is similar, but not identical, to SPE. One difference is related to the so-called alpha notation in feature-changing rules. Derive! does not use or recognize Greek letters. Instead, reference to specific segments must use Arabic numerals. For instance, [3continuant] refers to the value of [continuant] for the third element in the rule; the numbering of elements is revealed by hovering over the elements. See the example rules on the main page. Another difference concerns metathesis, where the segment order can be referred to by numbers. See the Metathesis rule on the main page.

SPE-like (Default) Regular Expression Syntax

Regular expressions are a way of defining sets of strings by building them up from the base cases of literal segments and feature classes using the operations of set union, string concatenation, and repetition. Expressions are entered in a serialized version of SPE syntax and displayed below in the full 2D syntax. We say that a regular expression R matches the string s if s is contained in the set defined by R. For more detailed examples, add and inspect some of the sample rules.

Base cases

Each of the following regular expressions matches the empty string or strings consisting of a single segment.

Entered                 Displayed               Description
blank                                           Matches the empty string, as is frequently required in left and right contexts.
a                       a                       A segment (or any other character that is not one of the metacharacters \()[]{},*+∅#ΣXCV!₀₁₂₃₄₅₆₇₈₉⁰¹²³⁴⁵⁶⁷⁸⁹) matches itself. To match one of the metacharacters, precede it with a backslash.
#                       #                       Matches a word boundary (either a space between words or the beginning or end of the line).
[+coronal -sonorant]    [+coronal −sonorant]    A space-separated list of features surrounded by [ and ] matches any segment in the class defined by those features. Features not found in the table are ignored.
X                       Σ                       Matches any character except for word boundaries.
C                       C                       Matches any consonant. Equivalent to [-syllabic].
V                       V                       Matches any vowel. Equivalent to [+syllabic].
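As an illustration of how a feature class picks out segments, here is a minimal Python sketch. The FEATURES table is a tiny made-up example and not Derive!'s default table, and the function names are hypothetical.

```python
from typing import Dict, List

# Illustrative feature values only; the real table is much larger.
FEATURES: Dict[str, Dict[str, str]] = {
    "p": {"syllabic": "-", "sonorant": "-", "coronal": "-", "voice": "-"},
    "t": {"syllabic": "-", "sonorant": "-", "coronal": "+", "voice": "-"},
    "d": {"syllabic": "-", "sonorant": "-", "coronal": "+", "voice": "+"},
    "n": {"syllabic": "-", "sonorant": "+", "coronal": "+", "voice": "+"},
    "a": {"syllabic": "+", "sonorant": "+", "coronal": "-", "voice": "+"},
}


def parse_class(spec: str) -> Dict[str, str]:
    """Turn '+coronal -sonorant' into {'coronal': '+', 'sonorant': '-'}."""
    wanted = {}
    for item in spec.split():
        state, name = item[0], item[1:]
        # Accept both the ASCII hyphen and the typographic minus sign.
        wanted[name] = "-" if state in "-−" else state
    return wanted


def natural_class(spec: str) -> List[str]:
    """List the segments of FEATURES that belong to the class `spec`."""
    known = {name for feats in FEATURES.values() for name in feats}
    # Features not found in the table are ignored.
    wanted = {n: s for n, s in parse_class(spec).items() if n in known}
    return [
        seg for seg, feats in FEATURES.items()
        if all(feats[n] == s for n, s in wanted.items())
    ]


print(natural_class("+coronal -sonorant"))
print(natural_class("-syllabic"))  # what C stands for
print(natural_class("+syllabic"))  # what V stands for
```

Under this table, C and V fall out of the syllabic feature alone, which is why a custom table needs that feature for them to work.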

Operations

Let R and S be regular expressions and let m and n be numbers.

Entered                 Displayed               Description
RS or {RS}              (RS)                    Concatenation: Matches a string matched by R followed by a string matched by S. Brackets are required if an operation other than union or further concatenation is to be applied to the result.
{R,S}                   (R|S)                   Set union: Matches a string matched by either R or S. If R or S is a concatenation, it does not require extra brackets.
Rₘ or Rₘⁿ               Rₘ or Rₘⁿ               Matches concatenations of between m and n strings matched by R. m must be formed from the subscript digits ₀₁₂₃₄₅₆₇₈₉ and n must be made of the superscript digits ⁰¹²³⁴⁵⁶⁷⁸⁹. If n is omitted, the maximum number of repeats is unlimited.
(R)                     (R)                     Matches the empty string or a string matched by R. Equivalent to R₀¹.
R* or R₀                R₀                      Matches a run of zero or more strings matching R.
R+ or R₁                R₁                      Matches a run of one or more strings matching R.
R!                      Rₙ                      Marks a sub-expression so that the text it matches may be referred to in the substitution template. The central match, contexts, and all marked sub-expressions are automatically numbered from left to right, and these numbers may be used in substitution templates to refer back to them. Hover over an expression to view its number.

POSIX Regular Expression Syntax

By changing an option in the configuration section, it is possible to enter and display rules using POSIX Extended Regular Expression syntax, the same syntax as used in grep -E, with two differences. The first significant difference is that character classes are not the standard POSIX ones (alpha, digit, etc.) and are instead space-separated lists of features from the feature table. In addition, the metacharacter . does not match spaces.

The syntax is defined as follows. More complex examples of the syntax in action can be found by selecting any of the sample rules (the regexes appear as editable textboxes).

Base cases

Each of the following regexes matches the empty string or strings consisting of a single segment.

POSIX                       Traditional             Description
blank                       ε                       An empty regex matches the empty string, as is frequently required in left and right contexts.
a                           a                       A segment (or any other character which is not otherwise defined here) matches itself.
space                       #                       Matches a word boundary (either a space between words or the beginning or end of the line).
.                           Σ                       Matches any character other than a space.
[abc]                       {a,b,c}                 A group of characters in square brackets matches any of the bracketed characters.
[[:+coronal -sonorant:]]    [+coronal −sonorant]    A space-separated list of features surrounded by [[: and :]] matches any segment in the class defined by those features. The usual POSIX character classes do not work unless the feature table used happens to define them (the default one does not). Features not found in the table are ignored.
\C                          C                       Matches any consonant. Equivalent to [[:-syllabic:]].
\V                          V                       Matches any vowel. Equivalent to [[:+syllabic:]].

Combinations

Regexes can be combined using the following operations to match longer strings. For the purposes of this table, let R and S be regular expressions and let m and n be numbers. Parentheses may be omitted if the result is unambiguous with operator precedence as ordered in this table.

POSIX       Traditional   Description
(R)         Rₙ            Parenthesizes a sub-expression (to be followed by quantifiers) and marks it so that the text it matches may be referred to in the substitution template. The central match, contexts, and all marked sub-expressions are automatically numbered from left to right, and these numbers (displayed as subscripts) may be used in substitution templates to refer back to them.
RS          (RS)          Concatenation: Matches a string matched by R followed by a string matched by S.
R|S         (R|S)         Set union: Matches a string matched by either R or S.
R?          R?            Matches the empty string or a string matched by R. R must either be a base case or parenthesized.
R*          R*            Matches a run of zero or more strings matching R. R must either be a base case or parenthesized.
R+          R+            Matches a run of one or more strings matching R. R must either be a base case or parenthesized.
R{n}        Rⁿ            Matches a run of exactly n strings matched by R. R must either be a base case or parenthesized.
R{n,}       R[n,)         Matches a run of at least n strings matched by R. R must either be a base case or parenthesized.
R{m,n}      R[m,n]        Matches a run of between m and n strings matched by R. R must either be a base case or parenthesized.
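The quantifiers in this table can be tried out on plain segments with Python's re module, which accepts the same operators. This is only a stand-in for experimentation: Python's re is not a POSIX engine, it does not understand Derive!'s feature classes or \C and \V, and the matches helper is a hypothetical name. Whole-string matching is used so that differences in how the engines choose among partial matches do not come into play.

```python
import re


def matches(regex: str, s: str) -> bool:
    """True if the whole string s is in the set defined by regex."""
    return re.fullmatch(regex, s) is not None


print(matches("(ta){2}", "tata"))   # exactly two repetitions of "ta"
print(matches("t{2,}", "t"))        # fewer than two t's
print(matches("a{1,2}", "aaa"))     # more than two a's
print(matches("(p|b)a?", "p"))      # the optional a may be absent
print(matches("[ptk]a+", "kaaa"))   # one of p, t, k followed by a run of a's
```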

Substitution Template Syntax

A substitution template is a string consisting of segment characters, expression references, and feature assimilations. Segments are entered literally and insert themselves.

Expression references are entered as expression numbers preceded by a backslash (such as \2) and insert the text matched by the referenced expression. These are useful for transposition rules.

Feature assimilations are of the forms [feats] or [S|feats], where S is either a segment or an expression number, and feats is a space-separated list of features (if S is omitted, the assimilation operates on the central match). These find the closest segment to S in the class defined by feats, with distance defined as the number of differing features, which essentially applies feats to S. If S is the number of an expression which matched multiple characters, this is done for each character. In addition to the +, −, and 0 states, the state of a feature in feats can be replaced by an expression number, in which case the state of that feature will be copied from the first character of the text matched by the referenced expression.

As an example, the Voicing Assimilation rule [−sonorant]+₂ → [2|3voice] / ₁___[−sonorant]₃ matches runs of two or more [−sonorant] segments and applies the voice state of the last one (the right context) to the others (the central match).
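The closest-segment search and the voicing example can be sketched in Python. Everything here is an assumption made for illustration: the TABLE values are a small made-up feature table, project and voicing_assimilation are hypothetical names, ties are broken by table order, and a segment is left unchanged when the target class is empty. Derive!'s actual behaviour in those corner cases is not documented here.

```python
from typing import Dict

# Illustrative obstruent features only, not Derive!'s default table.
TABLE: Dict[str, Dict[str, str]] = {
    "p": {"sonorant": "-", "voice": "-", "coronal": "-", "continuant": "-"},
    "b": {"sonorant": "-", "voice": "+", "coronal": "-", "continuant": "-"},
    "t": {"sonorant": "-", "voice": "-", "coronal": "+", "continuant": "-"},
    "d": {"sonorant": "-", "voice": "+", "coronal": "+", "continuant": "-"},
    "s": {"sonorant": "-", "voice": "-", "coronal": "+", "continuant": "+"},
    "z": {"sonorant": "-", "voice": "+", "coronal": "+", "continuant": "+"},
}


def project(segment: str, feats: Dict[str, str]) -> str:
    """Closest segment to `segment` within the class defined by `feats`."""
    source = TABLE[segment]
    candidates = [
        c for c, f in TABLE.items()
        if all(f[name] == state for name, state in feats.items())
    ]
    if not candidates:
        return segment  # assumption: nothing to project onto
    # Distance is the number of features whose states differ.
    return min(candidates, key=lambda c: sum(TABLE[c][n] != source[n] for n in source))


def voicing_assimilation(form: str) -> str:
    """Give every obstruent in a run the voice state of the run's last one."""
    out = list(form)
    i = 0
    while i < len(form):
        j = i
        while j < len(form) and TABLE.get(form[j], {}).get("sonorant") == "-":
            j += 1
        if j - i >= 2:  # central match form[i:j-1], right context form[j-1]
            voice = TABLE[form[j - 1]]["voice"]
            for k in range(i, j - 1):
                out[k] = project(form[k], {"voice": voice})
        i = max(j, i + 1)
    return "".join(out)


print(project("s", {"voice": "+"}))
print(voicing_assimilation("asda"))
print(voicing_assimilation("adsta"))
```

Segments missing from TABLE (such as a here) are simply treated as non-obstruents, so they end a run without being changed.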

How to Customize

The customizable feature table decides which segment receives which feature, and can be edited in CSV format using the marked textbox. The columns of the table represent different segments, while the rows different features, with the first row and column consisting of headers. The intersection of a segment column and feature row should contain + if the segment has that feature, - if the segment does not, or 0 if the distinction does not apply. By substituting one of these three values into a cell, users have the option of changing the feature affinity of segments. Altering the number of features is as easy as deleting or adding a row to the table. Users also have the option of creating their own tables from scratch and simply pasting them into the textbox.

Derive! will ignore any row (and the feature it defines) containing a number of cells different from that of the header row. Therefore, it is important to count cells carefully. Derive! will revert to the default table if the header row, which defines the segment names (truncated to their first character), is empty or contains duplicate entries. This will also happen if the feature table does not contain any valid rows defining features. For the regex classes C and V to work, the feature table must contain a feature called syllabic, with consonants having the − state and vowels having the + state.
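The validation rules above can be sketched in Python with the standard csv module. This is a hedged reading of the rules and not Derive!'s own parser: parse_feature_table and DEFAULT_TABLE are hypothetical names, and the sketch assumes the first cell of the header row is the corner cell above the feature names.

```python
import csv
import io
from typing import Dict

Table = Dict[str, Dict[str, str]]  # feature name -> {segment: state}

# Placeholder standing in for the built-in default table.
DEFAULT_TABLE: Table = {"syllabic": {"a": "+", "t": "-"}}


def parse_feature_table(text: str, default: Table = DEFAULT_TABLE) -> Table:
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return default
    header = rows[0]
    # Segment names are truncated to their first character.
    segments = [name[:1] for name in header[1:]]
    if not segments or len(set(segments)) != len(segments):
        return default  # empty header or duplicate segment names
    table: Table = {}
    for row in rows[1:]:
        if len(row) != len(header):
            continue  # wrong number of cells: the row is ignored
        table[row[0]] = dict(zip(segments, row[1:]))
    # A table with no valid feature rows also falls back to the default.
    return table if table else default


text = ",p,b,a\nsyllabic,-,-,+\nvoice,-,+\nsonorant,-,-,+\n"
print(parse_feature_table(text))
```

In this input the voice row has one cell too few, so only syllabic and sonorant survive.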

Rules are likewise entirely customizable and users can either make use of the example rules provided, open to editing, or create their own from scratch. Users can alternate between SPE-like (default) and POSIX syntax by clicking the corresponding syntax name in the customization window. If you are converting from SPE to POSIX syntax, sub-expressions may be renumbered if any parentheses need to be added due to precedence rules (since they serve dual purposes).

How To Cite

You can cite this tool as follows:

Steel, George and Peter Jurgec (2017). Derive!: An online tool for rule derivation in phonology. Toronto: University of Toronto.

This version of Derive! was developed by George Steel and Peter Jurgec.

The following students have also contributed to the development of Derive!: