String Finding

View as PDF

Submit solution

All submissions

Best submissions

Points: 5

Time limit: 1.0s

Memory limit: 64M

Author:

Kirito

Problem type

You have a string (indexed from $0$ $0$ ) with no more than $10^6$ $10^{6}$ lowercase characters. Find the first occurrence of a string $T$ $T$ $(1 \le |T| \le |S| \le 10^6)$ $(1 \leq | T | \leq | S | \leq 10^{6})$ , or print -1 if $T$ $T$ is not a substring of $S$ $S$ .

Input Specification

The first line will have the string $S$ $S$ .
The second line will have the string $T$ $T$ .

Output Specification

Print the index of the first occurrence of the string $T$ $T$ in $S$ $S$ , or -1 if it is not a substring of $S$ $S$ .

Sample Input

Copy

higuyshowsitgoinghiguys
higuys

Sample Output

Copy

Comments

0

pattti commented on Aug. 6, 2023, 1:38 a.m.

yo how u pass tle using java

2

Kirito commented on Aug. 6, 2023, 5:38 a.m.

The point value is admittedly slightly misleading thanks to the problem being very trivial in certain languages (e.g. C/++, Python 3).

You can either try making me sad by checking if Java has some built-in linear time string matching function, or implement KMP, Rabin-Karp, Z-algorithm, or any number of other linear-time string matching algorithms.

2

aeternalis1 commented on July 7, 2017, 1:57 p.m. ← edited →

Is it possible to create a solution for test case # 15 using python 3? I can't seem to run it under the time limit.

5

Kirito commented on July 7, 2017, 3:28 p.m. ← edited →

Python's builtin find uses Boyer-Moore, which has a worst-case runtime of $\mathcal O(|S| \times |T|)$ $O (| S | \times | T |)$ . Rabin-Karp and KMP are two algorithms which have a worst-case runtime of $\mathcal O(|S| + |T|)$ $O (| S | + | T |)$ .

0

TechnobladeNeverDies commented on Sept. 18, 2022, 1:39 p.m.

This is incorrect; Boyer-Moore runs in linear time when the pattern does not occur in the text, which is not the case for Python.

Kirito commented on Sept. 20, 2022, 3:22 p.m. ← edit 2 →

To be pedantic: it's described as "a simplication of Boyer-Moore, incorporating ideas from Horspool and Sunday". The $\mathcal O(|S||T|)$ $O (| S | | T |)$ worst-case bound still holds. Whether you want to consider this "using Boyer-Moore" or not is up to you.

If you want to get more specific, the CPython implementation is described in their code as

Copy

/* fast search/count implementation, based on a mix between boyer-
   moore and horspool, with a few more bells and whistles on the top.
   for some more background, see:
   https://web.archive.org/web/20201107074620/http://effbot.org/zone/stringlib.htm */

/* note: fastsearch may access s[n], which isn't a problem when using
   Python's ordinary string types, but may cause problems if you're
   using this code in other contexts.  also, the count mode returns -1
   if there cannot possibly be a match in the target string, and 0 if
   it has actually checked for matches, but didn't find any.  callers
   beware! */

/* If the strings are long enough, use Crochemore and Perrin's Two-Way
   algorithm, which has worst-case O(n) runtime and best-case O(n/k).
   Also compute a table of shifts to achieve O(n/k) in more cases,
   and often (data dependent) deduce larger shifts than pure C&P can
   deduce. See stringlib_find_two_way_notes.txt in this folder for a
   detailed explanation. */

Source: https://github.com/python/cpython/blob/main/Objects/stringlib/fastsearch.h

A challenge for anyone who's sufficiently bored would be figuring out if it is possible to force CPython to switch to the Two-Way algorithm on the given test data.