Editorial for ICPC NEERC 2018 J - JS Minification


Remember to use this editorial only when stuck, and not to copy-paste code from it. Please be respectful to the problem author and editorialist.
Submitting an official solution before solving the problem yourself is a bannable offence.

The first part of the problem is to write code that parses the input. Process the input source line by line. On each line, you should repeatedly skip space and parse tokens. The start of a potential word or a number is determined by its first character according to the problem statement, parsing the longest possible word or number while the corresponding characters are encountered. However, there could a case when there is a longer reserved token that had started at the same position. In the limits of this problem, all n possible reserved tokens can be exhaustively checked at the current parsing position. The longest matching reserved token or the longest word/number token shall be chosen. Thus, the whole input is transformed into a sequence of tokens.

The second part of the problem is word renaming. Maintain a map from source word to output word and use renaming from this map when a source word is encountered for a second time. To handle a new source word you need a procedure to generate a target word list. This can be done by keeping the value of the last used target word and using a procedure to generate the next target word from the previous one, which is basically like incrementing an integer in base 26. You will need to check each candidate target word against a list of reserved tokens, and repeat looking at the next target word until it is not in the list of reserved tokens.

The third part of the problem is writing the resulting sequence of tokens (after rename) to the output while inserting the minimum number of spaces. This can be done greedily, because a space is a universal separator and cannot be part of any token. Write tokens to the output and check if a space must be inserted before a token to prevent erroneous parsing of the resulting string. To streamline this check, you should keep a list of all the tokens that were output after the last space (or after the beginning of the source).

Note, that if a previous token conforms to a number token rule (regardless of whether it matches some reserved token or not), then adding a token that starts with a digit affects the parsing, because the previous token could have been parsed to a longer number token. Thus, a space must be inserted in this case.

Similarly note, that if a previous token conforms to a word token rule (even if it matches a reserved token), then adding a token that starts with a letter, digit, underscore or dollar sign would similarly affect the parsing. Thus, a space must be inserted in this case, too.

Now, the remaining case of parsing confusion comes from reserved tokens. This can happen at the beginning of any of the previous tokens. To detect this case, scan the list of the previous tokens (since the last output space) starting from the most recently output token backwards and check if parsing from that position could result in recognition of a reserved token because of the new token you are about to output without a space. There is a limit of 20 characters on the length of a reserved token, so this backward iteration can be terminated after going more than 20 characters backwards in the output. This avoids \mathcal O(k^2) time, where k is the number of tokens in the input source.

There are different working approaches to this check. In particular, you can avoid separate checks for words and numbers and just do a straightforward attempt to parse a token from each test position, assuming that you output a new token without a space. If that check shows that a different token can get parsed, then a space must be output before the token you are about to write. Don't forget to clear a list of recently output tokens as you write a space to the output.


Comments

There are no comments at the moment.