I need help using grep with the apply family of functions to count substring matches between every pair of rows in a column.
Say I have a data frame docQ as follows:
docQ <- data.frame(QID = c(1, 2, 3),
                   QTitle = c("This is a question with objectA objectB objectC",
                              "This one has stuff like objectAand objectB",
                              "Text contains objectC only "))
Now I want to compare each QID's title against the title of every other QID and count how many substrings they share. Probably the way to do it is to tokenize each title into words and match every word as a substring against the other titles.
The end result should be a list of N data frames, one per QID, something like:
docQID`1`
  StringMatches WithQID
1             2       2
2             1       3

docQID`2`
  StringMatches WithQID
1             2       1
2             1       3

docQID`3`
  StringMatches WithQID
1             1       1
2             1       2
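
One possible way to build such a list is sketched below. It is only a rough illustration of the tokenize-and-match idea, not a definitive solution. It assumes words are separated by whitespace, that matching is literal and case-sensitive (`fixed = TRUE`), and that each distinct word is counted once. The helper `count_matches` and the variable `result` are hypothetical names introduced for this example.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps QTitle as
# character on R versions before 4.0.
docQ <- data.frame(
  QID = c(1, 2, 3),
  QTitle = c("This is a question with objectA objectB objectC",
             "This one has stuff like objectAand objectB",
             "Text contains objectC only "),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count how many distinct whitespace-separated words of
# `from` occur as literal substrings of `to`.
count_matches <- function(from, to) {
  words <- unique(strsplit(trimws(from), "\\s+")[[1]])
  sum(vapply(words, grepl, logical(1), x = to, fixed = TRUE))
}

# One data frame per QID, with one row for every other QID.
result <- lapply(seq_len(nrow(docQ)), function(i) {
  others <- setdiff(seq_len(nrow(docQ)), i)
  data.frame(
    StringMatches = vapply(
      others,
      function(j) count_matches(docQ$QTitle[i], docQ$QTitle[j]),
      integer(1)
    ),
    WithQID = docQ$QID[others]
  )
})
names(result) <- docQ$QID
```

Because the match is a plain substring test, short tokens such as "is" and "a" also hit inside longer words, so the counts this produces are higher than the illustrative numbers above. Filtering out short or common words before matching would be one way to tighten that, if it fits the use case.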
Source: Stack Overflow