Unofficial competition results

If you are bored this weekend, here’s an interesting small project :wink:

Hear me out: what if you iterate over each project in the JSON file and scrape its webpage (please add a sensible delay between calls so you don’t stress Google’s servers)? From there you take the title and the description, and using the YouTube API you also get the video’s duration (I think @ehsa_293 already did this).
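A rough sketch of that scraping loop, in case it helps. It assumes each project page exposes its title in `<title>` and its description in a `<meta name="description">` tag, which is a guess about the page structure and not something confirmed here. The names `MetaParser`, `fetch_html` and `scrape_projects` are made up for illustration, and `fetch` and `sleep` are injectable so the delay logic can be tried without hitting any server.

```python
import time
from html.parser import HTMLParser
from urllib.request import urlopen


class MetaParser(HTMLParser):
    """Collects <title> text and <meta name="description"> content (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_html(url):
    """Default fetcher: plain HTTP GET with a timeout."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape_projects(urls, fetch=fetch_html, delay_seconds=2.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between calls to stay polite."""
    rows = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # no pause before the first request
        parser = MetaParser()
        parser.feed(fetch(url))
        rows.append({
            "url": url,
            "title": parser.title.strip(),
            "description": parser.description,
        })
    return rows
```

With a few thousand entries, a 2 second delay already adds up to a couple of hours, so it is worth saving the rows as you go.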

From that data, discard videos shorter than 30 seconds (those longer than 3 minutes are not technically disqualified, though, BUT I would also discard those over 5 minutes).
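The duration filter could look something like this. If the durations come from the YouTube Data API, they arrive as ISO 8601 strings such as `PT1M30S`, so the sketch converts those to seconds first. The helper names and the two threshold constants are just illustrative; the 30 second and 5 minute cutoffs are the ones from this post.

```python
import re

# Matches durations like PT45S, PT1M30S, PT10H, P1DT2H (days/hours/minutes/seconds only).
_DURATION_RE = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$"
)

MIN_SECONDS = 30        # shorter videos are discarded
MAX_SECONDS = 5 * 60    # longer videos are discarded


def iso8601_to_seconds(value):
    """Convert an ISO 8601 duration string to a whole number of seconds."""
    match = _DURATION_RE.match(value)
    if not match:
        raise ValueError(f"unsupported duration: {value!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def keep_video(duration_seconds):
    """True if the video is within the 30 s to 5 min window (inclusive)."""
    return MIN_SECONDS <= duration_seconds <= MAX_SECONDS


def filter_projects(projects):
    """Keep only projects whose 'duration' field (ISO 8601) passes the filter."""
    return [p for p in projects if keep_video(iso8601_to_seconds(p["duration"]))]
```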

Now here’s an interesting extra step you could also do: from each video, extract the audio and pass it through a speech-to-text AI and get that data too. At the end you should have a CSV file with:

title | shortDescription | description | category | videoTranscription
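Writing that file is straightforward with Python's standard `csv` module. This is only a sketch: the column names are the ones listed above, while `write_projects_csv` and the shape of the input rows (plain dicts) are assumptions. Missing fields are written as empty strings and any extra keys are ignored.

```python
import csv

COLUMNS = ["title", "shortDescription", "description", "category", "videoTranscription"]


def write_projects_csv(rows, out):
    """Write project dicts to an open text file object as CSV with a header row."""
    writer = csv.DictWriter(
        out,
        fieldnames=COLUMNS,
        extrasaction="ignore",  # drop keys that are not in COLUMNS (e.g. "url")
        restval="",             # fill columns a row does not have
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Usage: open the file with newline="" so the csv module controls line endings.
# with open("projects.csv", "w", newline="", encoding="utf-8") as f:
#     write_projects_csv(rows, f)
```

The `csv` module quotes commas and line breaks inside descriptions and transcriptions for you, which matters a lot for free text like this.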

Now comes the fun part: take the competition’s rules (the whole thing) and put them into a PDF/doc file, then feed that and the CSV to the 3 main LLMs (Gemini, ChatGPT and Claude) with a clever prompt asking them to act as judges and choose the winners based on the rules file.

I think it will be super interesting for us to discover cool projects among the huge number in the list, and also to compare the results between the LLMs and against the actual official results!
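One simple way to do that comparison is to measure how much each pair of winner lists overlaps. The sketch below uses Jaccard similarity (shared picks divided by all distinct picks), which is my choice of metric and not something fixed by the idea above. The judge names and project IDs in the usage comment are placeholders.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two collections of picks: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty lists agree trivially
    return len(a & b) / len(a | b)


def compare_judges(picks):
    """Return the Jaccard overlap for every pair of judges, keyed by name pair."""
    return {
        (x, y): jaccard(picks[x], picks[y])
        for x, y in combinations(sorted(picks), 2)
    }


# Usage with placeholder data:
# compare_judges({
#     "gemini": ["p1", "p2", "p3"],
#     "claude": ["p2", "p3", "p4"],
#     "official": ["p9"],
# })
```

A score of 1.0 means two judges picked exactly the same projects and 0.0 means they share none.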

I wouldn’t recommend this, as it’d introduce a sort of bias. A validator is definitely useful, but going towards direct judging, even if generated, will introduce unwanted bias, which is against the randomization Google is going for.

The cost of AI to do this would be substantial, especially since there is a video that is 10+ hours long :rofl: :skull:

Yeah, that’s why I added it as an extra step. It would be around 150 USD according to my estimates using Google’s speech-to-text API, BUT there are very good local open-source speech-to-text projects you could use to make this essentially free.

This has nothing to do with the popularity contest; it is just a small investigation project to see what comes up.

Actually, YouTube has an API for caption data, so you don’t even need to use a speech-to-text API.

I was going to do that a few days ago. You also have to consider a rating system and manage the ratings and eliminations properly (mostly some math), but then I realized I’m already 500 dollars in debt to the Gemini API lmao. I was going to make it choose three potential winners for each category and then compare them to the judges’ choices. Also, I didn’t use any YouTube API to get the durations. I showed the code for how I did it: I found the specific class name responsible for the length of the video in the HTML file and did the rest with Selenium.
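The "three potential winners per category" step could be sketched like this, assuming each entry already has a numeric score from whatever rating system you settle on. The field names (`category`, `title`, `score`) and the function name are hypothetical.

```python
from collections import defaultdict


def top_n_per_category(scored, n=3):
    """Group entries by category and return the n highest-scoring titles in each."""
    by_category = defaultdict(list)
    for item in scored:
        by_category[item["category"]].append(item)
    return {
        category: [
            item["title"]
            for item in sorted(items, key=lambda i: i["score"], reverse=True)[:n]
        ]
        for category, items in by_category.items()
    }
```

The resulting lists per category can then be compared directly against the judges' choices.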

What’s another 100 bucks in debt? :rofl:

Next would be homelessness lmao.

I believe there is a set of CLI tools that will parse the transcript from a YouTube video. Might be Fabric, but I’m on vacation right now and don’t have the mind for it.

Yeah, pretty easy to do, it would just cost a lot.

Just noticed that I can do it for free without using any APIs, only Tampermonkey. It would just take about 5 hrs to finish (5 secs per entry).

But are you using an LLM as a mock judge?

Yeah, he’d probably use an open-source LLM like Llama.

I could, but I don’t really feel like getting all transcripts, and even if I did I wouldn’t judge anything, I would just gather data and give it out.

It’s useless, since most videos (as Google itself recommended) shouldn’t have a lot of speech in them: you should show your product rather than describe it.

True. I remember the video explaining how to submit a video mentioned “less talking, more demoing”.

Yeah exactly, that’s what I did for my video

Well I have a prototype of this built. Only ran like 5 but it’s pretty cool.

I made it

https://bowtrum-1337.web.app/login
