Pre-processing of a textual corpus for a lexicographic analysis with Iramuteq