Tuesday, January 26, 2010

The squawking Squaw King was stabbed in a stab bed

Yesterday I tweeted: "Realized that 'assisted' is 'ass is ted'. Are there other non-compound words in English which consist entirely of other words?" People replied with "is land" and "cut lass".

Naturally, I couldn't resist writing a small amount of code to find other words that are made up entirely of sequences of words. Using a short program and a 57,000-word English dictionary of common words, I had the answer: 12,870 words. That means about 23% of the words in that dictionary have this property.

Of course, many are rather boring because they are just compound words. But others are more fun:

I have secretions of secret ions and a seepage (see page 21), but sematically my sematic ally says I am fatalistic and fatal is tic bite. But the fellow fell, ow! And asked, do we seal ants with sealants? I went to the palace to see my pal (ace) and said "Serge! Ants". He called for "Sergeants!". But with an antelope the ant elope.

I smelt an aroma: the rapist! Yet, it was just the aromatherapist.

You can get the full list here.

Update: Here's the code

# ----------------------------------------------------------------------------
# Small program to find words that consist entirely of other words
# concatenated. An example is 'fatalistic' which is 'fatal is tic'
# Written by John Graham-Cumming
# ----------------------------------------------------------------------------

use strict;
use warnings;

# The first argument to the program is the filename of a dictionary of
# words, this dictionary will be searched for words consisting of word
# sequences. It should be simply one word per line.
# It is loaded into the %words hash.

my $dict = $ARGV[0];
my %words;

if ( open my $fh, '<', $dict ) {
  while (<$fh>) {
    chomp;    # strip the newline so keys match the substr() fragments
    $words{$_} = 1;
  }
  close $fh;
} else {
  die "Cannot open dictionary file $dict\n";
}

# Check every word in the dictionary using the recursive function
# check_word. Note that I don't sort the words here since that might
# take a long time. Sorting can be done on the output.

foreach my $w (keys %words) {
  my $sub = check_word($w);

  if ( $sub ne '' ) {
    print "$w ($sub)\n";
  }
}

# check_word extracts ever longer subsequences of the word to be
# checked and sees if they are themselves words (by checking in
# %words). If a word is found then the remainder of the word is sent
# to a recursive call to check_word.
# For example, suppose we do check_word( fatalistic ), the code will
# check the following:
# check_word: fatalistic; found so far:
#   f?
#   fa?
#   fat?
#     check_word: alistic; found so far: fat
#       a?
#         check_word: listic; found so far: fat a
#           l?
#           li?
#           lis?
#           list?
#             check_word: ic; found so far: fat a list
#               i?
#           listi?
#       al?
#       ali?
#       alis?
#       alist?
#       alisti?
#   fata?
#   fatal?
#     check_word: istic; found so far: fatal
#       i?
#       is?
#         check_word: tic; found so far: fatal is
# This function returns an empty string if the word does not consist
# of other words, or a string containing the word broken down into
# space separated words
# e.g. check_word('fatalistic') returns ' fatal is tic'
# check_word('potato') returns ''

sub check_word
{
  my ( $w,            # The word to check
       $depth ) = @_; # Contains the words found so far, or
                      # undefined when first called

  if ( !defined( $depth ) ) {
    $depth = '';
  } else {
    if ( defined( $words{$w} ) ) {
      return "$depth $w";
    }
  }

  for my $i (1..length($w)-1) {
    my $fragment = substr($w,0,$i);
    if ( defined( $words{$fragment} ) ) {
      my $sub = check_word(substr($w,$i), "$depth $fragment");
      if ( $sub ne '' ) {
        return $sub;
      }
    }
  }

  return '';
}
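
To try the idea without a dictionary file, here is a minimal sketch of the same recursive search run against a tiny in-memory word list. The `%toy` word list and the `split_into_words` name are made up for this illustration and are not part of the program above; the sketch also returns the pieces as a list rather than a space-separated string.

```perl
use strict;
use warnings;

# Hypothetical toy dictionary, standing in for the 57,000-word file.
my %toy = map { $_ => 1 }
  qw(fatalistic fatal fat a list is tic potato pot to island land zebra);

# Same search as check_word, but the dictionary is passed in as a hash
# reference and the result is a list of pieces (empty if no split exists).
sub split_into_words {
    my ( $dict, $w, $top ) = @_;
    $top = 1 unless defined $top;

    # Below the top level, a remainder that is itself a word ends the search.
    return ($w) if !$top && $dict->{$w};

    # Try ever longer prefixes; recurse on the remainder when one is a word.
    for my $i ( 1 .. length($w) - 1 ) {
        my $head = substr( $w, 0, $i );
        next unless $dict->{$head};
        my @rest = split_into_words( $dict, substr( $w, $i ), 0 );
        return ( $head, @rest ) if @rest;
    }

    return ();
}

for my $w (qw(fatalistic potato island zebra)) {
    my @parts = split_into_words( \%toy, $w );
    print "$w: ", ( @parts ? "@parts" : '(no split)' ), "\n";
}
```

With this toy list the loop prints "fatal is tic", "pot a to" and "is land" for the first three words, and "(no split)" for zebra. Whether a word such as potato splits depends entirely on which short words ("pot", "a", "to") the dictionary contains.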



Blogger Vinny Burgoo said...


Good afternoon.

7:34 PM  
Blogger Martyloo said...

Show us the code!! (Please?)

3:01 PM  
Blogger otakucode said...

potato = pot a to

Apparently your algorithm needs some work...

8:49 PM  
Blogger John Graham-Cumming said...


I'm confused: potato does appear in the list of words and I have verified that my code does produce 'pot a to' for potato.

9:12 AM  
