Editorial for IOI '95 P4 - Letter Game


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

The obvious problem in this exercise is the large "dictionary" file. It's too big to read into memory (unless you know how to use the machine's extra memory). Since you have to output all the words and combinations with the highest score, the entire file has to be read at least once. This extra requirement was in fact a small hint to help you find the highest scoring word when no exact match can be found. As reading a large file takes quite some time, the obvious strategy is to avoid reading the file more than once. The purpose of the exercise was not, as some solutions seemed to suggest, to find a way to cram all the words in memory, nor was it an exercise in making indexes or using hashing procedures. The fact that the word list could vary from run to run was meant as a hint that elaborate pre-processing of the list should be avoided as it would be used only once. I shudder to think what could have happened if the list we offered would not have been sorted. Not that it mattered for this problem, but probably people would have felt the need to sort it (exceeding the time limit). This time limit was also a hint to put you on the right track, i.e., it would be a waste of valuable time to process the word list more than once. However, some of you were so obsessed by it that you deemed it necessary to put a timer in the program, which, in some cases, led to disaster.

After seeing some of the solutions you came up with, I was quite amazed to see how some of you can turn a perfectly simple problem (remember, it was the first program on the second day, which tells you something about its complexity, or more accurately lack of complexity) into extremely complex programming. Still there remain several ways to find the highest scoring word(s) or pair(s) of words given a set of 3 to 7 letters, but there are certain inherent conditions you can use to avoid extra work.

  • With an exact match (i.e. the letters in the wordlist-word match the letters given), you will always score maximum points. Once you have found 1 exact match, you do not have to look at words with lesser characters unless the number of letters given is 7 (then you have to keep track of 3 and 4 letter words) or 6 (then you have to keep track of 3 letter words).

  • It's a waste of time and space to store all 3 and 4 letter words. Only when necessary (see 1) you should store those 3 and 4 letter words that match with the letters given. These words can be stored in arrays. The number of unique combinations of 3 and 4 letters given a set of 7 letters is 35 (for 3) and 35 (for 4). The words are not unique so reserving 3 \times 2 \times 35 and 4 \times 3 \times 2 \times 35 will cover all possibilities amply (in fact this is far too much as they are actual words, and most letter permutations do not yield grammatical words). After reading all the words in the wordlist, you can easily check for combinations. You do not have to count their scores separately as they are exact matches and therefore score maximum points.

  • The fun starts when no exact match can be found. When 6 or 7 letters were given, finding a combination of 3+3 or 4+3 will again give maximum score, but you can't know that a combination will be found, so you have to keep track of words of 5 letters as well.

  • When no exact match can be found and no combinations can be found, a word with fewer letters can score more points than a word with more letters.

  • When no exact match can be found, words with an equal number of letters should be checked for the highest score.

Possible solution:

The way you check is crucial. An alphabetic check will mean extra time as each word has to be sorted and it only works with exact matches. Generating all possible combinations from the set of letters and matching against these takes less time and will ensure that all possible words are found, but it means that lacking an exact match, each word read from the word list has to be compared with at least 35 and at most 95 combinations. A one-on-one check should therefore be preferred. Eliminating the letters of the word read against those in the set of letters seems to work best. Storing the highest scoring word(s) so far, while processing the wordlist, will solve 3, 4 and 5 from the list above. That way, you will always have a solution (at least when a word can be found with the letters given).

How to speed up?

  • read the word list only once
  • use string operations when comparing
  • use recursion when eliminating
  • use dynamic data structures to store possible candidates and to store 3 and 4 letter words. That way, you do not have to initialise arrays and you can sort on value while storing and you do not have to figure out, beforehand, how much memory you have to reserve for the arrays.
  • do not check words unnecessarily (see above).

Connie Veugen
Scientific Committee IOI'95


Comments

There are no comments at the moment.