Seeking Feedback on Omni

Hello everyone! Before I begin, I want to make it clear that this thread is in no way a promotion. I’m writing this to seek your feedback, opinions, and questions to help my app grow. I strongly disagree with the promotional content that often appears on this forum, as the forum is intended to be a place for discussion about apps, not for gathering votes or self-promotion.

With that said, let’s dive into the topic. I developed an AI called Omni, which is integrated into the local operating system and even has root access, though users can customize the level of access according to their preferences. I prefer the unrestricted access myself, as it allows Omni to perform virtually any task. However, for those who prefer more control, restrictions can be easily set.

Omni is capable of handling complex tasks, essentially anything, and I’m not exaggerating. The initial version I demoed on YouTube used a single execution algorithm. Since then, I’ve been working on an updated algorithm that I call “multi-step context-aware execution.” This approach allows Omni to complete intricate tasks step by step. For example, it can execute a command to gather data, analyze it, and then perform further actions based on the information it has processed, essentially mimicking how humans adapt and react to new information.
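To make the "multi-step context-aware execution" idea concrete, here is a minimal Python sketch of such a loop. It is not Omni's actual code (which the post does not show): `run_multi_step`, the planner callback, and the step dictionary are hypothetical names chosen for illustration, and the planner and executor are simple stand-ins for an LLM and a command runner.

```python
from typing import Callable

# Hypothetical shape of one planned step, e.g. {"command": "ls", "done": False}.
Step = dict


def run_multi_step(
    goal: str,
    plan_next: Callable[[str, list[str]], Step],
    execute: Callable[[str], str],
    max_steps: int = 10,
) -> list[str]:
    """Run commands one at a time, feeding each result back into planning."""
    context: list[str] = []  # everything observed so far
    for _ in range(max_steps):  # hard cap so a confused planner cannot loop forever
        step = plan_next(goal, context)
        if step["done"]:
            break
        output = execute(step["command"])
        context.append(f"{step['command']} -> {output}")
    return context


# Stand-ins for the LLM planner and the command executor (assumptions).
def fake_planner(goal: str, context: list[str]) -> Step:
    script = ["gather_data", "analyze_data", "act_on_result"]
    if len(context) >= len(script):
        return {"command": "", "done": True}
    return {"command": script[len(context)], "done": False}


def fake_execute(command: str) -> str:
    return f"ok({command})"


history = run_multi_step("build a report", fake_planner, fake_execute)
print(history)
```

The key point is that the planner sees the accumulated `context` before choosing each step, so later actions can depend on what earlier commands returned.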

One example I can share is when I asked Omni to search online for dinosaur images, create a folder with a specific name on my desktop, and then build a website using those images along with some related information using HTML and JavaScript. Not only did it complete the task seamlessly, but it also understood which image corresponded to which dinosaur, incorporating the correct images and descriptions into the website. I found this level of detail and sophistication quite impressive.

I regularly share more examples of Omni’s capabilities on its Instagram page, which I think you’ll find both entertaining and insightful.

Now, I’d love to hear your thoughts on the app, as well as any questions that come to mind when you think about this type of technology. Omni has many other features that I’m not covering here, but I’m curious to know what additional features you’d like to see that could make it even more extraordinary.

I would appreciate any feedback, thanks a lot!

4 Likes

When I’m back from vacay I’ll check it out more in depth

1 Like

I actually have some things to say:

  1. You killed it with this app. I think Omni and Jayu (from another dev) are gonna be neck and neck in the usefulness category.

  2. I have a very different app, but I also noticed that we have the same problem: latency. One thing that I think will (vastly) improve the user experience in Omni is reducing the latency (or at least the perceived latency). On this topic I suggest (if possible):

  • Don’t make URL/API calls when you don’t need them. To make a folder you could (maybe) try some regex or even a small local LLM to avoid the round trip to the server.
  • Instead of waiting for the full speech and/or text response, stream it and take action on partial information (if possible).
  • Along the same lines, if possible, stream the speech-to-text as well. Don’t wait until the user stops talking to process all the information.
  • Distract the user with something while your app is doing background tasks (a loading animation or even a status like “Opening Firefox… opening Instagram…”).
  • Answer something instantly after the user command, like “Alright, I’m already working on that”. This will make a task that takes 3 seconds feel like it took 2 (because the awkwardness is in the silence, with nothing being done or said).
  • Consider using parallel processing and concurrency for tasks that are not intrinsically related, for example: “search online for dinosaur images (parallel process 1), create a folder with a specific name on my desktop (parallel process 2), and then build a website using those images along with some related information using HTML and JavaScript (wait for 1 and 2 and then do this task)”.
  3. Not feedback but questions (out of sheer curiosity): I saw that you plan to release for Mac. Do you already have pricing, or at least a ballpark figure? Do you plan to release for Windows?
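The instant-acknowledgement and parallel-processing suggestions in the bullets above can be sketched together in Python with `asyncio`. This is only an illustration under assumptions: the three task functions are hypothetical stand-ins whose sleeps simulate real work, and the folder path is made up.

```python
import asyncio


# Hypothetical stand-ins for real work; the sleeps simulate latency.
async def search_images(topic: str) -> list[str]:
    await asyncio.sleep(0.05)
    return [f"{topic}_{i}.jpg" for i in range(3)]


async def create_folder(name: str) -> str:
    await asyncio.sleep(0.05)
    return f"~/Desktop/{name}"


async def build_website(folder: str, images: list[str]) -> str:
    await asyncio.sleep(0.05)
    return f"{folder}/index.html with {len(images)} images"


async def handle_command(say) -> str:
    # Acknowledge immediately so the user is not left in silence.
    say("Alright, I'm already working on that")
    # Steps 1 and 2 do not depend on each other, so run them concurrently.
    images, folder = await asyncio.gather(
        search_images("dinosaur"), create_folder("dinos")
    )
    # Step 3 needs both results, so it starts only after they finish.
    return await build_website(folder, images)


spoken: list[str] = []
result = asyncio.run(handle_command(spoken.append))
print(spoken[0])
print(result)
```

With this layout the two independent steps overlap in time, while the website step still waits for both, which matches the "wait for 1 and 2" ordering in the example.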
5 Likes

Your feedback and suggestions are highly valued and exactly what I needed to hear.

As for API calls, the AI typically makes just one call for straightforward tasks. However, if a task is more complex, it will create a plan and execute multiple calls as needed, but there’s already a limit in place for this. The idea of streaming speech-to-text is fantastic, and I hadn’t considered that, along with the other UI improvements you suggested. In terms of parallel processing, the code does utilize it to some extent, along with asynchronous operations. However, it can’t run every step of a task at once when the steps are interdependent, as some steps depend on the completion of earlier ones before moving forward.
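One way to picture the constraint described here (some steps can overlap, others must wait) is to group a plan into batches by dependency. The Python sketch below uses the standard library's `graphlib.TopologicalSorter`. It is an assumption about how such a plan could be modeled, not a description of Omni's backend, and the step names are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into batches; steps within one batch can run in parallel."""
    sorter = TopologicalSorter(dependencies)  # maps each step to its prerequisites
    sorter.prepare()  # raises graphlib.CycleError if steps depend on each other in a loop
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose prerequisites are done
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock later steps
    return batches


# Hypothetical plan for the dinosaur website example.
plan = {
    "search_images": set(),
    "create_folder": set(),
    "build_website": {"search_images", "create_folder"},
}
batches = parallel_batches(plan)
print(batches)
```

Each inner list is a set of steps that can safely run side by side, and the outer order is the order in which the batches have to complete.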

Many new features will be added to Omni, and I’m still actively developing it. For instance, you’ll be able to track processes running in the background and monitor their output, source code, CPU usage, memory usage, and more, all within the app, functioning like a tracking system. This will be especially useful for tasks that require long execution times, sometimes lasting hours. Code optimization will also be a feature. You can provide a specific piece of code, and it will iterate over it repeatedly, focusing primarily on mathematical algorithms to reduce time and memory complexity. It will test each version of the code to ensure that the modified version performs the same tasks as the original but more quickly and efficiently.
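The "test each version" step could look roughly like the Python sketch below: a candidate rewrite is accepted only if it returns the same results as the original on a set of test inputs and also runs faster. All names here (`same_outputs`, `best_time`, `accept_candidate`, and the example functions) are hypothetical, and passing a finite set of inputs is evidence of equivalence, not proof.

```python
import timeit
from typing import Callable, Sequence


def same_outputs(original: Callable, candidate: Callable, inputs: Sequence[tuple]) -> bool:
    """True if both functions agree on every test input."""
    return all(original(*args) == candidate(*args) for args in inputs)


def best_time(fn: Callable, inputs: Sequence[tuple], repeats: int = 3) -> float:
    """Best wall-clock time (seconds) to run fn over all inputs once."""
    def run_all() -> None:
        for args in inputs:
            fn(*args)
    return min(timeit.repeat(run_all, number=1, repeat=repeats))


def accept_candidate(original: Callable, candidate: Callable, inputs: Sequence[tuple]) -> bool:
    """Accept only candidates that are both behavior-preserving and faster."""
    if not same_outputs(original, candidate, inputs):
        return False
    return best_time(candidate, inputs) < best_time(original, inputs)


# Example functions (assumptions): an O(n) loop, an O(1) formula, and a buggy rewrite.
def slow_sum(n: int) -> int:
    total = 0
    for i in range(n + 1):
        total += i
    return total


def fast_sum(n: int) -> int:
    return n * (n + 1) // 2


def wrong_sum(n: int) -> int:
    return n * n // 2


cases = [(0,), (1,), (10,), (100_000,)]
print(accept_candidate(slow_sum, fast_sum, cases))
print(accept_candidate(slow_sum, wrong_sum, cases))
```

Checking correctness before timing means a fast but wrong rewrite such as `wrong_sum` is rejected without ever being benchmarked.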

In response to your question, I’m planning to release it to the public in March. However, since I’m currently coding it on my own, there might be some unexpected delays. I’ve applied to Y Combinator to seek investors, and I should hear back from them in a few weeks, which could significantly impact the timeline. The March release will only be for macOS. I understand that using Flutter would provide more adaptability across platforms, but when I work on a project, I prefer writing it in the native language for each operating system. This means it will need to be completely redesigned for Windows and Linux later on. That said, I still have a lot to do to make it ready for macOS. You don’t see it in the videos, but I have been coding a lot in the backend to ensure security and privacy with Omni.

The app will likely follow a subscription model, with an affordable price range of around $5-10 per month. It will offer various options, including LLMs like Claude and GPT, as well as pre-trained open-source models like Llama 3.1 in different versions (if they have enough space and computational power haha). This way, users won’t have to pay for the app and then pay additional costs for API services, which can be frustrating.

edit: Jayu is definitely a cool project, and the way it’s presented is def better than mine, but the way our backends work is totally different. From what I understand, most of Jayu’s features are pre-coded and fixed. With Omni, though, if you look at the backend, there’s nothing tied to specific commands; it’s all powered by various algorithms, including ones that self-correct. None of Omni’s features are pre-coded; everything is generated by the LLM itself, which I focused on to keep it as flexible as possible. One big thing people often miss is that Omni’s real game-changer is its root access to the laptop, allowing it to do things that non-root access just can’t. And trust me, getting an app to run root commands in the background on macOS isn’t easy lmao. Also, I spent so much time on Omni’s frontend. As you can see, it’s very beautiful with many animations; even Omni’s logo is coded using some trigonometry lmao.

Thanks again for your feedback!

1 Like

I had imagined that you had already noted some of my suggestions, or at least had them in mind (devs always have backlogs with improvements for the future).

I don’t see a problem in writing in the native language.
You will probably deliver better code quality and a better overall experience.
The only problem is the effort to make new apps for Windows and Linux (and, more importantly, to maintain them after release).

One suggestion here is to make a core app that is platform-agnostic (or at least exactly the same code, so you only have to compile it again for every platform) and fork only the user interface.
This will also probably avoid anomalous core behaviors in each version.
Or just “Flutterify”. Will probably work also.
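As a rough illustration of the "shared core, forked UI" idea, here is a small Python sketch. In a real native app the core would more likely be a compiled library shared across platforms. The class and method names (`UserInterface`, `AssistantCore`, `ConsoleUI`) are hypothetical, and the command handling is a placeholder.

```python
from typing import Protocol


class UserInterface(Protocol):
    """The only thing the core knows about any platform's front end."""

    def show_status(self, message: str) -> None: ...


class AssistantCore:
    """Platform-agnostic logic: identical on every OS."""

    def __init__(self, ui: UserInterface) -> None:
        self.ui = ui

    def run(self, command: str) -> str:
        self.ui.show_status(f"Working on: {command}")
        result = command.strip().lower()  # placeholder for the real work
        self.ui.show_status("Done")
        return result


class ConsoleUI:
    """One possible front end; a macOS or Windows UI would implement the same method."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def show_status(self, message: str) -> None:
        self.log.append(message)
        print(message)


ui = ConsoleUI()
core = AssistantCore(ui)
output = core.run("  Open Firefox ")
```

Because the core only talks to the `show_status` interface, a new platform needs a new UI class but no changes to the core behavior.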

As for the business model/pricing, it looks very competitive.
And VCs love monthly recurring revenue.

PS: I just mentioned Jayu because, as an outside bystander, it looked similar. I don’t know how it works deep down, so I can’t have an opinion, but I also prefer the more generic (and not hard-coded) approach. And I think that root is especially important on macOS, since nowadays you can’t change a folder’s color without the system asking for your permission. Up for privacy and security / Up for annoyances.

PS2: On a completely different topic, if you’re curious, check out this guy’s work: https://www.youtube.com/@SebastianLague. It’s nice to see how good he is at math/coding and how he delivers in the videos. I always learn something new there and always recommend it to people.

Nevertheless,
Good luck with the Y Combinator application!

3 Likes

This advice applies to other projects as well! Use streaming and other techniques to improve latency and responsiveness · Issue #51 · CsabaConsulting/InspectorGadgetApp · GitHub

2 Likes

So I looked at all the videos on your gram. Very cool product; it’s like a better version of pywinassistant. Curious why you used “activate” as your keyword instead of “Omni”. And yeah, as people mentioned earlier, distracting the user with some cool animation or wording would be nice. As for the GUI, it looks like ChatGPT in terms of branding; maybe make it a small widget that shows when it’s active/working/complete instead of showing the entire chat. If someone cares about the chat to see what went wrong, they’ll go and look at it. I assume your goal is to make it more seamless, so that’s my suggestion for that.

1 Like

Thanks! The user can set their preferred activation or deactivation words for the speech recognizer in the settings, so you can choose whatever suits you. It took around 1,000 lines of Swift code to ensure the speech recognition is smooth and seamless, allowing for conversation without needing to manually interact with anything.
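For readers wondering how configurable activation and deactivation words might gate a stream of recognized speech, here is a minimal Python sketch. It is not the Swift implementation described above, which is not shown in this thread. `WakeWordGate` and its fields are hypothetical names, and the default deactivation word is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class WakeWordGate:
    """Collects transcribed words only between the activation and deactivation words."""

    activate_word: str = "activate"
    deactivate_word: str = "deactivate"  # assumed default, not taken from the post
    listening: bool = False
    captured: list[str] = field(default_factory=list)

    def feed(self, word: str) -> None:
        """Process one word at a time, as a streaming recognizer would emit them."""
        token = word.strip().lower()
        if token == self.activate_word.lower():
            self.listening = True
        elif token == self.deactivate_word.lower():
            self.listening = False
        elif self.listening:
            self.captured.append(token)


# The user picks their own words in settings, e.g. "omni" and "stop".
gate = WakeWordGate(activate_word="omni", deactivate_word="stop")
for spoken_word in "please ignore this Omni open firefox stop and this too".split():
    gate.feed(spoken_word)
print(gate.captured)
```

Feeding words one at a time also fits the earlier streaming suggestion, since the command can start being processed before the user has finished speaking.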

1 Like

It seems like Anthropic was also working on a similar AI that uses the computer:
https://www.youtube.com/watch?v=vH2f7cjXjKI

Yeah, this is why, although the idea of having an AI control the OS is very cool, from a business perspective it does not make a lot of sense to me. Microsoft and Apple will 100% develop OS-level integration for AI, so unless you make your version of it sooooo good that they buy you for the tech or just to kill the competition, you have no future.

1 Like

BTW, before somebody writes a two-page essay about why I’m wrong: I’m not looking for a discussion/confrontation, I’m just sharing my view of the market (I know that I could be wrong).

Yeah, but it was quite basic, I have no worries about it.

It was inevitable. Whether users feel like giving 3rd parties permission to control their computer remains a different privacy worry - especially for someone like me.

What’s truly impressive is that @ehsa_293 built Omni all by himself and before any big AI model companies did. Love the idea and the direction AI is taking us in.

3 Likes

What is more impressive is that the judges will not listen to your story and will evaluate on their own criteria. I saw that you keep pushing that Omni is the best app submitted for this competition. I saw 2-3 apps that were similar to it. That means you are using some third-party library for that. You also have 18K views on one of your videos, which looks fake. Please stop telling the judges that you are the best.

2 Likes

Mate, why don’t you focus on your entry and leave everyone else alone? You’re not the judge, so who’s fake and who’s not isn’t your business. We don’t even know Google’s actual criteria or how they’ll see it from their perspective.

4 Likes

I submitted my entry and now I am just waiting. I do not need to stay focused and tell everybody that I am the best. Secondly, this speech recognition technology is not new. In old versions of Windows, there was technology like this that you could use to control the computer. Maybe some of you remember those characters.

1 Like

If you’re genuinely concerned about how judges perceive things, consider what they’ll think after reading your insulting comment. No one here was being rude until you entered the conversation. First, no one has claimed that my project is the best, so don’t put words in people’s mouths. Secondly, @luluthepooh has shown interest in numerous other projects and has been one of the most helpful contributors to this forum. Your baseless accusation against him is a low move. Thirdly, I didn’t use any third-party tools for voice recognition; I manually coded it myself, thousands of lines of code, in fact. I believe in quality, and that’s why I coded it all in Swift, the native language of macOS, which makes it way more complicated to do. So, before you start imagining things like people claiming their projects are superior, please be respectful and civil in your interactions. My post was about asking for feedback, not an invitation to randomly insult the project for no reason.

3 Likes

I did not insult anybody. Everybody put in their efforts; you are not alone here. You are trying to influence the judges. Let them decide. I do not even know if you are from the same country/group or just randomly supporting each other.

1 Like

First of all, you’re wrong if you think judges will be influenced by conversations on this forum. At this point, you’re suggesting that judges can be corrupted by online discussions? Ludicrous.

Secondly, don’t be so insecure about yourself or your submission. Just because nobody talked about your submission on this forum doesn’t mean it doesn’t have a chance to win.

Third, just because someone says something good about someone else’s submission doesn’t mean they are trying to get that person to win. Think about it logically, what does one competitor gain from supporting another competitor? Maybe they are just being nice?

Lastly, we are all individual competitors. We only gain something if our own submissions win, not anybody else’s. Anyone can have their own favorites and talk about it. Anyone can have good or bad opinions or generic discussion about someone else’s project.

We are talking about Omni now again because Anthropic (Claude) released something similar to what Omni does i.e. AI controlling your computer. Your response was off-topic, maybe try reading more about it before replying?

6 Likes

Btw I think you’re absolutely right. When ChatGPT first came out, I coded a whole app where if you took a photo of something, it would use Google’s and Microsoft’s image analysis APIs, then pass it off to the GPT APIs so you could chat with it. I had fun building it, but I knew it wasn’t giving “wow” factor.

Needless to say, within a couple of months, ChatGPT released image capabilities with way better analysis, and then so did other apps.

I think any startup that is trying to do “everything” apps with AI models won’t have a chance against the LLM model creators. If it feels like it’s something that they will create (or add to their apps), they most definitely will.

Kinda reminds me of the story of how Steve Jobs tried to acquire Dropbox for little money, warning that otherwise Apple would build an Apple-only competitor: iCloud.

4 Likes