The origin of Iranian speakers is a big puzzle as in ancient times there were two quite different groups of such speakers: nomadic steppe people such as Scythians and settled farmers such as Persians and Medes.

I am guessing that the story of Iranian origins will only be solved in correlation to their Indo-Aryan brethren and their more distant Indo-European relations.

Clearly, G1 cannot be Proto-Indo-European as it has a rather limited distribution in Eurasia, but it could very well have been a marker of a subset of Indo-Europeans. If it was present in ancestral Iranians, then this would geographically constrain the places where ancestral Iranians were formed.

PLoS ONE 10(4): e0122968. doi:10.1371/journal.pone.0122968

Deep Phylogenetic Analysis of Haplogroup G1 Provides Estimates of SNP and STR Mutation Rates on the Human Y-Chromosome and Reveals Migrations of Iranic Speakers

Oleg Balanovsky et al.

Y-chromosomal haplogroup G1 is a minor component of the overall gene pool of South-West and Central Asia but reaches up to 80% frequency in some populations scattered within this area. We have genotyped the G1-defining marker M285 in 27 Eurasian populations (n= 5,346), analyzed 367 M285-positive samples using 17 Y-STRs, and sequenced ~11 Mb of the Y-chromosome in 20 of these samples to an average coverage of 67X. This allowed detailed phylogenetic reconstruction. We identified five branches, all with high geographical specificity: G1-L1323 in Kazakhs, the closely related G1-GG1 in Mongols, G1-GG265 in Armenians and its distant brother clade G1-GG162 in Bashkirs, and G1-GG362 in West Indians. The haplotype diversity, which decreased from West Iran to Central Asia, allows us to hypothesize that this rare haplogroup could have been carried by the expansion of Iranic speakers northwards to the Eurasian steppe and via founder effects became a predominant genetic component of some populations, including the Argyn tribe of the Kazakhs. The remarkable agreement between genetic and genealogical trees of Argyns allowed us to calibrate the molecular clock using a historical date (1405 AD) of the most recent common genealogical ancestor. The mutation rate for Y-chromosomal sequence data obtained was 0.78×10-9 per bp per year, falling within the range of published rates. The mutation rate for Y-chromosomal STRs was 0.0022 per locus per generation, very close to the so-called genealogical rate. The “clan-based” approach to estimating the mutation rate provides a third, middle way between direct farther-to-son comparisons and using archeologically known migrations, whose dates are subject to revision and of uncertain relationship to genetic events.

Link