Abstract Lung squamous cell carcinoma (LUSC) causes approximately 400 000 deaths each year worldwide. The occurrence of LUSC is attributed to exposure to cigarette smoke, which induces numerous genomic abnormalities. However, few studies have investigated the genomic variations that occur only in normal tissues that have been exposed to tobacco smoke in the same way as the tumor tissues. In this study, we sequenced the whole genomes of three normal lung tissue samples and their paired adjacent squamous cell carcinomas. We then called genomic variations specific to the normal lung tissues by filtering the genomic sequences of the normal lung tissues against those of the paired tumors, the reference human genome, the dbSNP138 common germline variants, and the variations derived from sequencing artifacts. To expand these observations, we analyzed the whole-exome sequences of 478 counterpart normal controls (CNCs) and their paired LUSCs from The Cancer Genome Atlas (TCGA) dataset. Up to 16 genomic variations were called in the three normal lung tissues, and these variations were confirmed by Sanger capillary sequencing. A mean of 0.5661 exonic variations/Mb and 7.7887 altered genes per sample were identified in the CNC genome sequences of TCGA. In these CNCs, C:G→T:A transitions, which are a genomic signature of the tobacco carcinogen N-methyl-N′-nitro-N-nitrosoguanidine, were the predominant nucleotide changes. Approximately 25 genes in CNCs had a variation rate that exceeded 2%, including ARSD (18.62%), MUC4 (8.79%), and RBMX (7.11%). CNC
variations in CTAGE5 and USP17L7 were associated with poor prognosis in patients with LUSC. Our results uncover previously unreported genomic variations in CNCs, rather than in LUSCs, that may be involved in the development of LUSC.
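
The filtering step described above can be pictured as a set difference. The following is only a minimal sketch under stated assumptions, not the pipeline used in this study: the `Variant` record, the `normal_specific` function, and the toy inputs are hypothetical names introduced for illustration, and real variant calling would also involve alignment, quality filtering, and depth thresholds that are not modeled here.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (illustration only)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def normal_specific(
    normal_calls: Iterable[Variant],
    tumor_calls: Iterable[Variant],
    common_germline: Iterable[Variant],
    artifacts: Iterable[Variant],
) -> List[Variant]:
    """Keep calls seen in the normal tissue but absent from every filter set.

    Assumption: a call whose alternate allele equals the reference allele
    is treated as matching the reference genome and is dropped.
    """
    excluded = set(tumor_calls) | set(common_germline) | set(artifacts)
    kept = {
        v for v in normal_calls
        if v.ref != v.alt and v not in excluded
    }
    # Sort for a deterministic order (by chromosome name, then position).
    return sorted(kept)


# Toy data, invented purely to exercise the function.
normal = [
    Variant("chr1", 100, "C", "T"),
    Variant("chr2", 200, "G", "A"),
    Variant("chr3", 300, "A", "G"),
    Variant("chr4", 400, "T", "C"),
]
tumor = [Variant("chr2", 200, "G", "A")]        # shared with the paired tumor
germline = [Variant("chr3", 300, "A", "G")]     # stands in for common dbSNP entries
artifact = [Variant("chr4", 400, "T", "C")]     # stands in for sequencing artifacts

result = normal_specific(normal, tumor, germline, artifact)
```

In this toy example only the `chr1` call survives, because each of the other three calls appears in one of the filter sets.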