Concerns regarding the randomness shuffling algorithm used to shuffle projects

Hi there @Lloyd_Hightower,
I was curious about how the shuffling algorithm works, so I tried to find the implementation in the JS files, and I found this:

x3 = function(a) {
return a.jb.sort(() => .5 - Math.random()).slice(0, 3)
}
This function does the following:

  • It takes an array a.jb, which I assume contains all the projects?
  • It then uses the sort() method with a comparison function that returns a random value between -0.5 and 0.5: () => .5 - Math.random() (see the quick check after this list)
  • After shuffling, it uses slice(0, 3) to select the first 3 items from the shuffled array.
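
As a quick sanity check (this little snippet is mine, not from the site), you can count how often the first element of a small array stays in first place after "shuffling" it this way; a fair shuffle of 5 elements would leave it there about 20% of the time:

const trials = 100000;
let firstStaysFirst = 0;

for (let t = 0; t < trials; t++) {
  const arr = [1, 2, 3, 4, 5];
  // Same trick as the site's code: a comparator that ignores its arguments
  // and just returns a random value in (-0.5, 0.5].
  arr.sort(() => 0.5 - Math.random());
  if (arr[0] === 1) firstStaysFirst++;
}

// A fair shuffle would print roughly 0.20; the random-comparator "shuffle"
// usually prints something noticeably different (the exact value depends on
// the engine's sort implementation).
console.log((firstStaysFirst / trials).toFixed(3));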

And so I was curious to check statistically whether it is “fair” or “unfair”. I ran a distribution analysis: 10,000 iterations of the exact same algorithm over the 3046 projects, with 3 projects shown per shuffle. These were my findings:
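
For anyone who wants to reproduce this, a simulation along these lines is enough to collect the per-project selection counts (project IDs are just numbers here, and the exact figures will depend on the JS engine's sort implementation):

const TOTAL_PROJECTS = 3046;
const PICKS_PER_RUN = 3;
const RUNS = 10000;

const projects = Array.from({ length: TOTAL_PROJECTS }, (_, i) => i);
const selections = new Array(TOTAL_PROJECTS).fill(0);

for (let run = 0; run < RUNS; run++) {
  // Same approach as the code found on the site: "shuffle" with a random
  // comparator, then take the first three projects.
  const picked = projects
    .slice()
    .sort(() => 0.5 - Math.random())
    .slice(0, PICKS_PER_RUN);
  for (const p of picked) selections[p]++;
}

// Selection rate per project = times selected / number of runs.
const rates = selections.map(c => c / RUNS);
const mean = rates.reduce((a, b) => a + b, 0) / rates.length;
console.log({ mean, min: Math.min(...rates), max: Math.max(...rates) });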

Total Projects: 3046
Projects Selected per Run: 3

Basic Statistics:
  Expected Selections per Project: 9.85
  Mean Selection Rate: 0.000985
  Standard Deviation: 0.000574
  Minimum Selection Rate: 0.000000
  Median Selection Rate: 0.000900
  Maximum Selection Rate: 0.004700

Fairness Metrics:
  Coefficient of Variation: 0.583104
  Gini Coefficient: 0.299487
  Kolmogorov-Smirnov Statistic: 0.097298

The Coefficient of Variation (CV) of 0.583104 indicates high variability in project selection rates, showing the process isn’t uniform. The Gini Coefficient of 0.299487 reveals moderate inequality in selection frequency, meaning some projects are chosen more often than others. The Kolmogorov-Smirnov (KS) statistic of 0.097298 confirms the selection distribution deviates significantly from uniformity (p < 0.05), providing strong evidence of non-uniform selection. The standard deviation (0.000574) is also high relative to the mean, which again points to high variability. The range of selection rates, from 0 to 0.004700, shows some projects are heavily favored while others were never selected at all. This is a known pitfall: a comparator that returns random values breaks the consistency contract of sort(), so the outcome depends on the engine’s sorting algorithm and tends to leave items biased toward their original positions rather than uniformly shuffled. Collectively, these metrics show that it’s definitely not a uniform shuffling algorithm.
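
For reference, the CV and Gini figures above can be computed from the per-project selection counts with a helper like this (my own code, not from the site; the KS statistic needs the empirical CDF, so I’m leaving it out here):

// counts[i] = how many times project i was selected across all runs.
function fairnessMetrics(counts) {
  const n = counts.length;
  const mean = counts.reduce((a, b) => a + b, 0) / n;
  const variance = counts.reduce((a, c) => a + (c - mean) ** 2, 0) / n;
  const cv = Math.sqrt(variance) / mean; // coefficient of variation

  // Gini coefficient from the counts sorted ascending (0 = perfectly equal).
  const sorted = counts.slice().sort((a, b) => a - b);
  const total = sorted.reduce((a, b) => a + b, 0);
  let weighted = 0;
  for (let i = 0; i < n; i++) weighted += (i + 1) * sorted[i];
  const gini = (2 * weighted) / (n * total) - (n + 1) / n;

  return { cv, gini };
}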

This probably isn’t a very big deal, but if this algorithm is actually what’s being used to shuffle the projects, it isn’t doing a great job.

Thanks for reading!

edit: I designed a uniform, statistically sound algorithm for the shuffling. I’m just gonna leave it here, who knows, it might help:

class UniformProjectSelector {
  constructor(totalProjects, selectCount) {
    this.totalProjects = totalProjects;
    this.selectCount = selectCount;
    this.projectPool = this.initializeProjectPool();
    // Shuffle the first cycle as well; otherwise projects would be served
    // in their original order (1, 2, 3, ...) until the pool is exhausted once.
    this.shuffleArray(this.projectPool);
    this.cycleCount = 0;
  }

  // Pool of project IDs 1..totalProjects that have not been shown yet this cycle.
  initializeProjectPool() {
    return Array.from({ length: this.totalProjects }, (_, i) => i + 1);
  }

  // In-place Fisher-Yates shuffle: every permutation is equally likely.
  shuffleArray(array) {
    for (let i = array.length - 1; i > 0; i--) {
      const j = Math.floor(Math.random() * (i + 1));
      [array[i], array[j]] = [array[j], array[i]];
    }
  }

  selectProjects() {
    if (this.projectPool.length < this.selectCount) {
      // Not enough projects left in this cycle: finish it with the leftovers
      // and top up from a freshly shuffled pool.
      this.cycleCount++;
      const remainingProjects = this.projectPool.slice();
      this.projectPool = this.initializeProjectPool();
      this.shuffleArray(this.projectPool);
      const needed = this.selectCount - remainingProjects.length;
      const topUp = this.projectPool.slice(0, needed);
      // Consume the topped-up projects so they aren't returned again on the
      // very next call.
      this.projectPool = this.projectPool.slice(needed);
      return [...remainingProjects, ...topUp];
    }

    const selectedProjects = this.projectPool.slice(0, this.selectCount);
    this.projectPool = this.projectPool.slice(this.selectCount);
    return selectedProjects;
  }

  getStats() {
    return {
      totalProjects: this.totalProjects,
      selectCount: this.selectCount,
      remainingInPool: this.projectPool.length,
      completeCycles: this.cycleCount
    };
  }
}
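
And this is roughly how I’d exercise it to collect stats like the ones below (the 10,000 draws here are just an assumption, chosen to roughly match the expected-selections figure):

const selector = new UniformProjectSelector(3046, 3);
const counts = new Array(3046 + 1).fill(0); // index 0 unused, project IDs are 1-based

for (let run = 0; run < 10000; run++) {
  for (const id of selector.selectProjects()) counts[id]++;
}

const observed = counts.slice(1);
console.log('Final stats:', selector.getStats());
console.log('Actual min selections:', Math.min(...observed));
console.log('Actual max selections:', Math.max(...observed));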

And here are the statistics on the new algorithm:

Final stats: {
  totalProjects: 3046,
  selectCount: 3,
  remainingInPool: 478,
  completeCycles: 9
}

Uniformity Test Results:
  Chi-squared statistic: 42.697999999999155
  Degrees of freedom: 3045
  Expected selections per project: 9.848982271831911
  Actual min selections: 9
  Actual max selections: 11

  1. The chi-squared statistic (42.698) is much lower than the degrees of freedom (3045), which means the observed counts are even closer to the expected value than independent random sampling would produce. That is expected here, because the cycling pool guarantees near-equal exposure by construction (see the sketch after this list for how the statistic is computed).
  2. The difference between the minimum (9) and maximum (11) selections is only 2, which is remarkably small given the number of projects and selections, so no project is noticeably over- or under-represented.
  3. The completion of 9 full cycles guarantees that every project has been selected at least 9 times, with some selected 10 or 11 times because the current cycle is still incomplete.
  4. The expected selections per project (9.849) closely match the actual range (9-11), which confirms the algorithm’s fairness.
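
For anyone who wants to verify the chi-squared figure, it’s the standard goodness-of-fit statistic over the per-project selection counts, something along these lines:

// counts[i] = times project i was selected; expected = total selections / number of projects.
function chiSquared(counts) {
  const total = counts.reduce((a, b) => a + b, 0);
  const expected = total / counts.length;
  // Sum of (observed - expected)^2 / expected over all projects;
  // degrees of freedom = counts.length - 1.
  return counts.reduce((sum, obs) => sum + (obs - expected) ** 2 / expected, 0);
}
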
4 Likes

Viewing it from a mobile device ONLY shows 1 at a time. Viewing it from a tablet device ONLY shows 2 at a time.

In any case, it would be helpful for you to suggest the randomization algorithm change for Lloyd to consider as well.

2 Likes

Yes, I’m still not sure what’s going on in the backend; this algorithm is being used on the client side. I assumed most people would shuffle on a laptop, which is why I considered 3 projects at once, but even with 2 or 1 it won’t change the results much. I can indeed work on a considerably fairer randomness algorithm, it’s not too hard, but I didn’t think this was a critical problem, and I also needed more confirmation that this is the actual algorithm being used for the shuffling.

1 Like

Although you are probably right about the math (I’m just trusting you here), here are my 2 cents on this topic:

  • There are 3046 projects to be seen and judged, with a considerable cash prize at stake
  • The people’s choice award is a symbolic prize with no retail value
  • Probably the majority of the votes will come from people sharing the specific link of the project with family/friends/followers

What I’m saying is: based on the effort needed to do all the math, run the tests, and think about the shuffle problem, they probably just needed it to be “fair enough” (and “enough” in this case just means giving people a pseudo-random chance to see all the projects).

You will always compromise something when you have to balance a timeline, budget, effort and pressure from the community.

PS: I just saw your edit, and I always consider it a real MVP move to actually suggest a solution rather than just state a problem. Nice work there :slight_smile:

1 Like

Yeah, I understand that. I already said it’s really not a big issue at all, but I was curious to check how it works, and besides, it doesn’t do any harm to report it. It didn’t take me long to do all the math, so it’s all good.

And thanks for the compliment. I originally checked the code to look for any vulnerabilities that would allow someone to vote multiple times for the same project, so I could report that and make sure it’s secure, and also to see whether the voting is tied to the Google account, IP address, etc.

It was mostly curiosity.

1 Like