This protocol presents a lexicographic analysis of a textual corpus made up of documents. In our case, we applied this method to a corpus of 41 OECD reports initially in PDF fo...