By Tarak

I need some help using grep with the apply functions to find, for every row of a column, all the substrings it shares with the other rows.
Say I have a data frame docQ as follows:

docQ <- data.frame(QID = c(1, 2, 3),
                   QTitle = c("This is a question with objectA objectB objectC",
                              "This one has stuff like objectAand objectB",
                              "Text contains objectC only "))

Now I want to compare every QID with all the other QIDs and count how many substrings they have in common. One way would probably be to tokenize each title into words and match every word as a substring against the titles of the other QIDs.

The end result should be a list of N data frames, one per QID (N being the number of QIDs), something like:

      StringMatches WithQID
1             2       2
2             1       3
      StringMatches WithQID
1             2       1
2             1       3
      StringMatches WithQID
1             1       1
2             1       2
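
The tokenize-and-match idea described above could be sketched in base R roughly as follows. This is only one possible reading of the problem, not a definitive solution: count_matches and its min_chars argument are hypothetical names introduced here, the cutoff of 5 characters is an arbitrary assumption to keep short common words such as "This" or "is" from matching everywhere, and matching is done on literal (fixed) substrings. The counts depend on these choices and are directional (tokens of one title are looked up in the other), so they will not necessarily reproduce every number in the table above.

```r
docQ <- data.frame(
  QID = c(1, 2, 3),
  QTitle = c("This is a question with objectA objectB objectC",
             "This one has stuff like objectAand objectB",
             "Text contains objectC only "),
  stringsAsFactors = FALSE  # keep QTitle as character on older R versions
)

# Hypothetical helper: count how many distinct tokens of `title` occur as
# literal substrings in each element of `others`.
# `min_chars` is an assumed filter that drops short words before matching.
count_matches <- function(title, others, min_chars = 5) {
  tokens <- unique(strsplit(trimws(title), "\\s+")[[1]])
  tokens <- tokens[nchar(tokens) >= min_chars]
  vapply(others, function(other) {
    hits <- vapply(tokens,
                   function(tok) grepl(tok, other, fixed = TRUE),
                   logical(1))
    sum(hits)
  }, integer(1), USE.NAMES = FALSE)
}

# One data frame per QID, each comparing that QID with all the others.
result <- lapply(seq_len(nrow(docQ)), function(i) {
  data.frame(
    StringMatches = count_matches(docQ$QTitle[i], docQ$QTitle[-i]),
    WithQID = docQ$QID[-i]
  )
})
names(result) <- docQ$QID
```

With these assumptions, result[["1"]] gives 2 matches with QID 2 (objectA, which is found inside objectAand, and objectB) and 1 match with QID 3 (objectC). Lowering min_chars lets shorter words count as well, which raises the numbers.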

