standardize differences that do not affect the text content, such as the above-mentioned accent representation
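
As a minimal sketch of this kind of standardization, the Python snippet below assumes that the accent difference in question is the one between a precomposed accented letter (a single code point) and a base letter followed by a combining accent (two code points). The helper name `canonicalize` is hypothetical, and the choice of Unicode normalization form NFC is an assumption made for illustration; the text does not name a specific form.

```python
import unicodedata


def canonicalize(text: str) -> str:
    """Return text in Unicode NFC form (hypothetical helper; NFC is an assumed choice)."""
    return unicodedata.normalize("NFC", text)


# Two representations of the same visible word.
precomposed = "caf\u00e9"   # 'e with acute' as one code point (U+00E9)
decomposed = "cafe\u0301"   # plain 'e' followed by a combining acute accent (U+0301)

# The raw strings differ code point by code point, although they render identically.
print(precomposed == decomposed)

# After normalization, both strings have the same code point sequence.
print(canonicalize(precomposed) == canonicalize(decomposed))
```

The first comparison is false and the second is true: normalization removes a difference in encoding without changing what the reader sees.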