Researchers at the University of Cambridge have published details of a dangerous vulnerability (CVE-2021-42574) that affects almost all modern source code compilers. Their Trojan Source paper describes an insidious attack that lets an attacker hide malicious code in the source code of a program.
The attack relies on the way compilers handle the Unicode control characters that determine whether text is laid out left to right or right to left. The weakness is rooted in the Unicode Bidirectional (Bidi) algorithm, which allows words written from right to left and words written from left to right to be used together in the same text. Thanks to this algorithm, Arabic and English words can be combined in one line, and each is still displayed in its natural reading direction.
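To make the idea of text direction concrete, the short sketch below prints the bidirectional class that Unicode assigns to a few characters, using Python's standard unicodedata module. The sample characters and the helper name bidi_class are chosen purely for illustration and are not taken from the Trojan Source paper.

```python
import unicodedata

# Illustrative sample characters (chosen for this sketch only).
samples = {
    "Latin letter a": "a",
    "Arabic letter beh": "\u0628",
    "Hebrew letter alef": "\u05d0",
    "right-to-left override": "\u202e",
}


def bidi_class(ch: str) -> str:
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)


for label, ch in samples.items():
    print(f"{label}: U+{ord(ch):04X} -> {bidi_class(ch)}")
```

Letters carry a direction of their own ("L" for left to right, "R" or "AL" for right to left), which is what the Bidi algorithm uses to lay out mixed text. The last sample is not a letter at all but one of the control characters that force a direction.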
In some cases the Unicode Bidi algorithm alone cannot determine how mixed-direction text should be displayed, and in such cases special control characters are used to override it. However, if one line combines text with different directions, these control characters can be used to change the order in which the text is displayed without changing the order in which the compiler reads it, so that, for example, text that looks like part of a comment is actually executed as code.
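Because the control characters described above are invisible in most editors, one way to notice them is to scan source text for their code points. The following is a minimal sketch of such a scan, not a complete defense. The function name find_bidi_controls and the sample snippet are hypothetical, and the table lists only the nine embedding, override and isolate characters (U+202A to U+202E and U+2066 to U+2069).

```python
# Bidi embedding, override and isolate control characters.
BIDI_CONTROLS = {
    "\u202a": "LRE",  # left-to-right embedding
    "\u202b": "RLE",  # right-to-left embedding
    "\u202c": "PDF",  # pop directional formatting
    "\u202d": "LRO",  # left-to-right override
    "\u202e": "RLO",  # right-to-left override
    "\u2066": "LRI",  # left-to-right isolate
    "\u2067": "RLI",  # right-to-left isolate
    "\u2068": "FSI",  # first strong isolate
    "\u2069": "PDI",  # pop directional isolate
}


def find_bidi_controls(text):
    """Return (line, column, name) for every bidi control character in text."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col, BIDI_CONTROLS[ch]))
    return findings


# Hypothetical snippet: the controls are written as escapes so they stay visible here.
sample = "x = 1\nif level != 'user\u202e \u2066# check\u2069 \u2066':\n    pass\n"
findings = find_bidi_controls(sample)
for line_no, col, name in findings:
    print(f"line {line_no}, column {col}: {name}")
```

A scan like this reports every occurrence, including legitimate ones in strings or comments written in right-to-left languages, so its results still need a human decision.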
Using this method, an attacker can add a malicious instruction to normal source code and make it invisible to a reviewer, for example by making it appear to be part of an adjacent comment. The characters the compiler actually processes then differ from the ones shown on screen and can amount to arbitrary code. The source code looks correct when read, but the compiled program behaves differently.
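The gap between what is displayed and what is executed can be shown with a small, harmless experiment, loosely modeled on the early-return pattern discussed in the Trojan Source paper. In this sketch the function subtract_funds, the bank dictionary and the account name are all hypothetical. A right-to-left isolate (U+2067) is placed at the end of a docstring, so a Bidi-aware editor may display the return statement that follows as if it were the last word of the docstring, while the interpreter still executes it.

```python
RLI = "\u2067"  # right-to-left isolate, written as an escape so it stays visible

# Logical order, as the interpreter reads it: the docstring ends at the closing
# quotes, and the line then continues with a real "return" statement.
source = (
    "def subtract_funds(account, amount):\n"
    "    ''' Subtract funds from bank account then " + RLI + "''' ;return\n"
    "    bank[account] -= amount\n"
)

# Show the second line with the hidden character escaped.
print(ascii(source.splitlines()[1]))

# Run the code in an isolated namespace with a hypothetical account.
namespace = {"bank": {"alice": 100}}
exec(source, namespace)
namespace["subtract_funds"]("alice", 30)

balance_after = namespace["bank"]["alice"]
print("balance after call:", balance_after)
```

The function returns before the subtraction line is reached, so the balance stays unchanged even though a reader of the rendered source may see nothing but a docstring followed by the subtraction.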
Source: trojansource, zdnet