April 29, 2026 · Gökhan Oğuz

Teaching English the Turkish Way

The Turkish side of Sesla has had a sentence benchmark for a while. Hundreds of real toddler utterances tapped on the board, with the expected sentence next to each one. Every time I touch the morphology engine, I run the bench and see exactly which kid stopped sounding right. English never had that. Today it does.

295 EN Bench Cases
986 Tests Passing
7 Engine Fixes
3 Known Gaps Left

What a "real sentence" looks like in AAC

A child taps icons. The app speaks a sentence. The icons are bare lemmas: I, want, eat, cookie. The job of the engine is to turn that into "I want to eat a cookie" without sounding like a phrasebook from 1962. No POS tagger, no neural net, no 200MB language model on a kid's tablet. Just a small Dart file and a lot of curated rules.

So the bench is a list of (input tokens, expected sentence) pairs, grouped by what they exercise: simple present, third person singular, past via the marker "yesterday", future via "tomorrow", questions, negation, infinitive "to", copula, plurals, modals. A few examples from the file:

['I', 'want', 'cookie']                  -> "I want cookie"
['he', 'eat', 'apple']                   -> "he eats apple"
['yesterday', 'I', 'go', 'park']         -> "yesterday I went to the park"
['tomorrow', 'mom', 'come']              -> "tomorrow mom will come"
['I', 'want', 'eat', 'cookie']           -> "I want to eat a cookie"
['she', 'happy']                         -> "she is happy"
['two', 'apple']                         -> "two apples"

Articles get skipped on purpose. AAC convention is to keep tiles bare and let the speech read naturally. "Cookie" is fine; nobody is grading the kid on whether they tapped "a".
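A rough sketch of what such a bench can look like, written in Python for brevity rather than the engine's Dart. The names `BenchCase`, `run_bench` and `naive_engine` are made up for illustration, and the group labels are guesses at how the rows might be tagged; only the token lists and expected sentences come from the examples above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class BenchCase:
    group: str          # what the case exercises, e.g. "copula"
    tokens: List[str]   # bare lemmas, in the order they were tapped
    expected: str       # the sentence the app should speak


# A few rows taken from the examples above; group names are illustrative.
BENCH = [
    BenchCase("simple present", ["I", "want", "cookie"], "I want cookie"),
    BenchCase("third person", ["he", "eat", "apple"], "he eats apple"),
    BenchCase("copula", ["she", "happy"], "she is happy"),
    BenchCase("plural", ["two", "apple"], "two apples"),
]


def run_bench(
    engine: Callable[[List[str]], str], cases: List[BenchCase] = BENCH
) -> List[Tuple[BenchCase, str]]:
    """Run every case through the engine and return the failures."""
    failures = []
    for case in cases:
        actual = engine(case.tokens)
        if actual != case.expected:
            failures.append((case, actual))
    return failures


def naive_engine(tokens: List[str]) -> str:
    # Stand-in engine with no grammar at all: it just joins the tiles.
    return " ".join(tokens)


if __name__ == "__main__":
    for case, actual in run_bench(naive_engine):
        print(f"[{case.group}] {case.tokens} -> {actual!r}, expected {case.expected!r}")
```

Returning the failing rows, rather than a pass/fail count, is what makes it possible to see exactly which sentence stopped sounding right after a change.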

There is one place the engine quietly puts articles back, though. Movement verbs (go, come, walk, run, ride) followed by a place tile feel weird without a destination marker. "Yesterday I went park" sounds like a telegram. So when a known place noun follows a movement verb, the engine inserts the right glue:

['I', 'go', 'park']         -> "I go to the park"
['I', 'go', 'school']       -> "I go to school"
['I', 'go', 'home']         -> "I go home"
['I', 'go', 'bed']          -> "I go to bed"
['she', 'go', 'doctor']     -> "she goes to the doctor"

Three flavours: places that take "to the" (park, store, beach, bathroom, doctor, zoo), places that take a bare "to" (school, bed, work, church), and the "home" group that takes neither (home, outside, upstairs). English really is this idiosyncratic, and AAC users notice.
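The three groups lend themselves to plain lookup sets. Here is a minimal Python sketch of that idea (not the engine's Dart code). The word lists are the ones named above; the function name `add_destination_glue` and the choice to work on the lemma list before any inflection are assumptions.

```python
MOVEMENT_VERBS = {"go", "come", "walk", "run", "ride"}
TO_THE = {"park", "store", "beach", "bathroom", "doctor", "zoo"}
BARE_TO = {"school", "bed", "work", "church"}
NO_GLUE = {"home", "outside", "upstairs"}


def add_destination_glue(tokens):
    """Insert 'to' or 'to the' between a movement verb and a known place."""
    out = []
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1] if i > 0 else None
        if prev in MOVEMENT_VERBS:
            if tok in TO_THE:
                out.extend(["to", "the"])
            elif tok in BARE_TO:
                out.append("to")
            elif tok in NO_GLUE:
                pass  # "go home", "go outside": nothing to insert
            # Unknown words after a movement verb are left alone.
        out.append(tok)
    return out


if __name__ == "__main__":
    for tiles in (["I", "go", "park"], ["I", "go", "school"], ["I", "go", "home"]):
        print(tiles, "->", " ".join(add_destination_glue(tiles)))
```

Because the sketch only looks at the lemma immediately before the place, it inserts glue after "go" but not after "want", which matches the intent that articles otherwise stay out.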

The negation problem and the trick from Turkish

Negation is where AAC gets interesting. The kid has no "don't" tile, no "doesn't" tile, no "am not" tile, and they're not going to get one. Asking a four-year-old to construct a contraction is bad design.

Turkish solved this elegantly: there's a single "Hayır" tile (the word for "no"), and tapping it at the end of a sentence flips the meaning. "Ben yemek istiyorum hayır" becomes "Ben yemek istemiyorum". One tile gives full negation, and the engine picks the right suffix on the right verb.

English deserves the same. So a trailing "no" at the end of any English sentence now triggers the negation rewrite. The engine looks at what comes before it and picks the right contraction:

['I', 'want', 'eat', 'cookie', 'no']  -> "I don't want to eat a cookie"
['he', 'eat', 'apple', 'no']          -> "he doesn't eat apple"
['I', 'can', 'go', 'no']              -> "I can't go"
['I', 'will', 'go', 'no']             -> "I won't go"
['she', 'hungry', 'no']               -> "she is not hungry"

Three different rewrite paths underneath. If there's a modal, contract it. If there's a regular verb, inject do-support with the right tense and agreement. If it's just a subject and an adjective, insert "is not" or "am not". From the kid's perspective, it's one tile that means "actually, no". From the engine's perspective, it's a small state machine.
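A simplified Python sketch of those three paths (the real engine is Dart and certainly handles more). It assumes the subject is the first tile and the modal, verb or adjective is the second, covers the present tense only (so no "didn't"), and uses a toy adjective list. All names here are hypothetical.

```python
MODAL_NEGATIONS = {"can": "can't", "will": "won't"}
THIRD_SINGULAR = {"he", "she", "it"}
ADJECTIVES = {"hungry", "happy", "tired"}  # toy lexicon for the sketch
BE_FORMS = {"I": "am", "you": "are", "we": "are", "they": "are"}


def negate(tokens):
    """Rewrite '<subject> <head> ... no' into its negated form."""
    if len(tokens) < 3 or tokens[-1] != "no":
        return list(tokens)  # no trailing marker: nothing to do
    subject, head, *rest = tokens[:-1]

    # Path 1: a modal gets contracted.
    if head in MODAL_NEGATIONS:
        return [subject, MODAL_NEGATIONS[head], *rest]

    # Path 2: subject + adjective gets a negated copula.
    if head in ADJECTIVES:
        be = BE_FORMS.get(subject, "is")
        return [subject, be, "not", head, *rest]

    # Path 3: a regular verb gets do-support; the verb itself stays bare.
    aux = "doesn't" if subject in THIRD_SINGULAR else "don't"
    return [subject, aux, head, *rest]


if __name__ == "__main__":
    for tiles in (["he", "eat", "apple", "no"], ["I", "can", "go", "no"], ["she", "hungry", "no"]):
        print(tiles, "->", " ".join(negate(tiles)))
```

Note that do-support takes over agreement: the auxiliary becomes "doesn't" and the main verb stays "eat", which is why negation has to run before, or instead of, the usual third person inflection.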

Asking another model to grade my homework

With the bench passing, I asked a different AI model to peer review the engine. Different model, fresh eyes. The prompt was basically: "here's the file, here's the bench, find what I missed."

It came back with seven real bugs. Not nitpicks. Things like:

The fix for the first two was the same, and pleasingly small. Instead of deciding tense for the whole sentence, the engine now carries a "force the next verb to stay bare" flag, set whenever it sees a modal or an explicit "to". The flag clears after one verb. Both bugs gone, no rewrite.
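The flag is easy to picture as a single boolean carried through the token loop. The Python sketch below shows only that mechanism, using third person singular agreement as the inflection that must not leak onto a verb after a modal or "to". The word lists and function names are invented for the example, and it is not the engine's actual Dart code.

```python
MODALS = {"can", "will", "must", "should"}
VERBS = {"eat", "go", "want", "play"}  # toy lexicon for the sketch
THIRD_SINGULAR = {"he", "she", "it"}


def third_person(verb):
    # Regular third person singular: go -> goes, eat -> eats.
    if verb.endswith(("o", "s", "sh", "ch", "x")):
        return verb + "es"
    return verb + "s"


def inflect_present(tokens):
    """Inflect verbs for a third person subject, except right after a modal or 'to'."""
    out = []
    keep_next_verb_bare = False
    third = bool(tokens) and tokens[0] in THIRD_SINGULAR
    for tok in tokens:
        if tok in MODALS or tok == "to":
            keep_next_verb_bare = True
            out.append(tok)
        elif tok in VERBS:
            if keep_next_verb_bare:
                out.append(tok)              # stays a bare lemma
                keep_next_verb_bare = False  # the flag clears after one verb
            else:
                out.append(third_person(tok) if third else tok)
        else:
            out.append(tok)
    return out


if __name__ == "__main__":
    print(" ".join(inflect_present(["he", "can", "eat", "apple"])))
    print(" ".join(inflect_present(["she", "want", "to", "eat"])))
```

Scoping the flag to a single verb keeps the decision local: the verb before "to" still inflects normally, and only the one after it is protected.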

The "help" fix was a one-line lookahead: treat it as a noun only when the next thing is the end of the sentence, a punctuation mark, the word "please", or the "no" negation marker. Otherwise it's a verb and gets "to" in front.
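In Python terms (again a sketch, not the Dart original), the lookahead could look like this. The noun conditions follow the rule as stated; restricting the inserted "to" to positions right after "want" or "need" is an extra assumption made to keep the example small.

```python
PUNCTUATION = {".", ",", "!", "?"}
NOUN_FOLLOWERS = {"please", "no"}
TO_TAKING_VERBS = {"want", "need"}  # assumption: where an infinitive "to" applies


def help_is_noun(tokens, i):
    """True when the 'help' at index i should be read as a noun."""
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    return nxt is None or nxt in PUNCTUATION or nxt in NOUN_FOLLOWERS


def mark_help(tokens):
    """Put 'to' in front of 'help' when the lookahead says it is a verb."""
    out = []
    for i, tok in enumerate(tokens):
        is_verb_help = tok == "help" and not help_is_noun(tokens, i)
        if is_verb_help and i > 0 and tokens[i - 1] in TO_TAKING_VERBS:
            out.append("to")
        out.append(tok)
    return out


if __name__ == "__main__":
    print(" ".join(mark_help(["I", "want", "help"])))
    print(" ".join(mark_help(["I", "want", "help", "mom"])))
```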

Plurals got a tiny allowlist of bridge words (size and color adjectives, basically) that don't reset the pending plural. Apples are once again plural in the presence of "red".
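A small Python sketch of the pending-plural idea, under a few assumptions: a number word or "many" starts the pending state, the noun list is a toy lexicon, and pluralizing is just adding "s". The names and word lists are illustrative, not taken from the Dart engine.

```python
QUANTIFIERS = {"two", "three", "four", "many"}
BRIDGE_WORDS = {"big", "small", "little", "red", "blue", "green"}
NOUNS = {"apple", "cookie", "ball"}  # toy lexicon for the sketch


def pluralize(tokens):
    """Pluralize the noun after a quantifier, looking past bridge adjectives."""
    out = []
    pending_plural = False
    for tok in tokens:
        if tok in QUANTIFIERS:
            pending_plural = True
            out.append(tok)
        elif pending_plural and tok in BRIDGE_WORDS:
            out.append(tok)          # allowlisted: the plural stays pending
        elif pending_plural and tok in NOUNS:
            out.append(tok + "s")    # regular plural only
            pending_plural = False
        else:
            pending_plural = False   # anything else resets the state
            out.append(tok)
    return out


if __name__ == "__main__":
    print(" ".join(pluralize(["two", "red", "apple"])))
```

Using an allowlist rather than skipping every unknown word means an unexpected tile still resets the state, so a stray quantifier cannot pluralize a noun much later in the sentence.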

What didn't make the cut

Two things from the review I deliberately skipped. "Can" doesn't map to "could" in the past, because in AAC the kid almost never means the conditional; they mean ability. And the do-support questions ("what do you want?") need a real subject inversion pass that's bigger than a one-liner. Both logged.

A benchmark with hundreds of real sentences turns the engine from "looks right when I type it" into "actually right when a child taps it." Doubly so when the test cases come from someone other than the person writing the rules.

How the bench feels day to day

The Turkish bench has saved me from shipping at least four embarrassing regressions over the past month. The English one already paid for itself on the first peer review pass. New verb? Add a row to the bench. New tense marker? Add ten rows. Feature ships when the bench stays green.

Total damage today: 45 new test cases, 7 engine changes, one merged commit, zero regressions across the rest of the suite. The English engine is finally on the same shelf as the Turkish one. Not yet at parity. But on the same shelf.

Tomorrow

Possessives. "My", "your", "his", "her". Turkish handles these through suffix agreement and gets it right almost by accident. English needs a tiny rule for the third person and not much else, but the AAC angle is interesting: the boards have "mom" and "my mom" as separate tiles right now, and they probably shouldn't be.