Seeking Feedback on Omni

Hello everyone! Before I begin, I want to make it clear that this thread is in no way a promotion. I’m writing to seek your feedback, opinions, and questions to help my app grow. I strongly dislike the promotional content that often appears on this forum; it’s meant to be a place for discussing apps, not for gathering votes or self-promotion.

With that said, let’s dive into the topic. I developed an AI called Omni, which is integrated into the local operating system and even has root access, though users can customize the level of access according to their preferences. I prefer the unrestricted access myself, as it allows Omni to perform virtually any task. However, for those who prefer more control, restrictions can be easily set.

Omni is capable of handling complex tasks, essentially anything, and I’m not exaggerating. The initial version I demoed on YouTube used a single-execution algorithm. Since then, I’ve been working on an updated algorithm that I call “multi-step context-aware execution.” This approach allows Omni to complete intricate tasks step by step. For example, it can execute a command to gather data, analyze it, and then perform further actions based on the information it has processed, essentially mimicking how humans adapt and react to new information.
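Roughly, the loop has this shape; to be clear, this is just an illustrative sketch, not Omni’s actual code, and llmPlanNextStep is a made-up placeholder for the real planning call:

```swift
import Foundation

struct Step { let command: String }

// Placeholder: the real version asks the LLM for the next action given
// everything observed so far; returning nil means the task is finished.
func llmPlanNextStep(goal: String, history: [(Step, String)]) async throws -> Step? {
    return nil
}

func runShell(_ command: String) throws -> String {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: "/bin/zsh")
    process.arguments = ["-c", command]
    let pipe = Pipe()
    process.standardOutput = pipe
    try process.run()
    process.waitUntilExit()
    return String(data: pipe.fileHandleForReading.readDataToEndOfFile(), encoding: .utf8) ?? ""
}

func execute(goal: String) async throws {
    var history: [(Step, String)] = []
    // Plan one step, run it, feed the output back in, and re-plan.
    while let step = try await llmPlanNextStep(goal: goal, history: history) {
        let output = try runShell(step.command)
        history.append((step, output))
    }
}
```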

One example I can share is when I asked Omni to search online for dinosaur images, create a folder with a specific name on my desktop, and then build a website using those images along with some related information using HTML and JavaScript. Not only did it complete the task seamlessly, but it also understood which image corresponded to which dinosaur, incorporating the correct images and descriptions into the website. I found this level of detail and sophistication quite impressive.

I regularly share more examples of Omni’s capabilities on its Instagram page, which I think you’ll find both entertaining and insightful.

Now, I’d love to hear your thoughts on the app, as well as any questions that come to mind when you think about this type of technology. Omni has many other features that I’m not covering here, but I’m curious to know what additional features you’d like to see that could make it even more extraordinary.

I would appreciate any feedback, thanks a lot!

3 Likes

When I’m back from vacay I’ll check it out more in depth

1 Like

I actually have some things to say:

  1. You killed it with this app. I think Omni and Jayu (from another dev) are gonna be neck and neck in the usefulness category.

  2. I have a very different app, but I also noticed that we share the same problem: latency. One thing that I think will (vastly) improve the user experience in Omni is reducing the latency (or at least the perceived latency). On that topic I suggest (if possible):

  • Don’t make URL/API calls when you don’t need them. To create a folder you could (maybe) try some regex or even a small local LLM to avoid the round trip to the server.
  • Instead of waiting for the full speech and/or text, stream the response and take action on partial information (if possible).
  • With the same idea, if possible, stream the speech-to-text as well. Don’t wait until the user stops talking to process all the information.
  • Distract the user with something while your app is doing background tasks (some loading animation, or even some status like “Opening Firefox… opening Instagram…”).
  • Answer something instantly after the user’s command, like “Alright, I’m already working on that”. This will make a task that took 3 seconds feel like it took 2 (because the awkwardness is in the silence, doing nothing and saying nothing).
  • Consider using parallel processing for tasks that are not intrinsically related, for example: “search online for dinosaur images (parallel process 1), create a folder with a specific name on my desktop (parallel process 2), and then build a website using those images along with some related information using HTML and JavaScript (wait for 1 and 2 and then do this task)”. See the sketch after this list.
  3. Not feedback, just questions (out of sheer curiosity): I saw that you plan to release for Mac. Do you already have pricing, or at least a ballpark for it? Do you plan to release for Windows?
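To illustrate that last bullet, Swift’s async let would do it; the step functions here are just made-up stubs for the dinosaur example:

```swift
import Foundation

// Hypothetical placeholder steps, stubbed just for illustration.
func downloadDinosaurImages() async throws -> [URL] { return [] }
func createDesktopFolder(named name: String) async throws -> URL {
    let url = FileManager.default.homeDirectoryForCurrentUser
        .appendingPathComponent("Desktop/\(name)")
    try FileManager.default.createDirectory(at: url, withIntermediateDirectories: true)
    return url
}
func generateWebsite(from images: [URL], in folder: URL) async throws { /* build HTML/JS */ }

func buildDinosaurSite() async throws {
    // Steps 1 and 2 are independent, so they run concurrently...
    async let images = downloadDinosaurImages()                 // parallel process 1
    async let folder = createDesktopFolder(named: "Dinosaurs")  // parallel process 2
    // ...and only the final step waits on both before running.
    try await generateWebsite(from: images, in: folder)
}
```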
4 Likes

Your feedback and suggestions are highly valued and exactly what I needed to hear.

As for API calls, the AI typically makes just one call for straightforward tasks. However, if a task is more complex, it will create a plan and execute multiple calls as needed, but there’s already a limit in place for this. The idea of streaming speech-to-text is fantastic, and I hadn’t considered it, along with the other UI improvements you suggested. In terms of parallel processing, the code does use it to some extent, along with asynchronous operations. However, it can’t handle tasks that require multiple interdependent steps all at once, as some steps depend on the completion of earlier ones before moving forward.

Many new features will be added to Omni, and I’m still actively developing it. For instance, you’ll be able to track processes running in the background and monitor their output, source code, CPU usage, memory usage, and more, all within the app, functioning like a tracking system. This will be especially useful for tasks with long execution times, sometimes lasting hours. Code optimization will also be a feature: you provide a specific piece of code, and Omni iterates over it repeatedly, focusing primarily on mathematical algorithms, to reduce time and memory complexity. It tests each version of the code to ensure that the modified version performs the same tasks as the original, but faster and more efficiently.
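The optimization loop is roughly this shape (heavily simplified, with placeholder helpers standing in for the real LLM call, test runner, and timer):

```swift
import Foundation

// Placeholders: the real versions would call the LLM, run the test suite,
// and time the code; shown here only to illustrate the loop.
func llmOptimize(_ code: String) async throws -> String { return code }
func passesTests(_ code: String) -> Bool { return true }
func benchmark(_ code: String) -> TimeInterval { return 0 }

func optimize(code: String, maxIterations: Int = 5) async throws -> String {
    var best = code
    var bestTime = benchmark(best)
    for _ in 0..<maxIterations {
        let candidate = try await llmOptimize(best)
        // Accept a rewrite only if it still behaves identically and is faster.
        guard passesTests(candidate), benchmark(candidate) < bestTime else { continue }
        best = candidate
        bestTime = benchmark(best)
    }
    return best
}
```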

In response to your question, I’m planning to release it to the public in March. However, since I’m currently coding it on my own, there might be some unexpected delays. I’ve applied to Y Combinator to seek investors, and I should hear back from them in a few weeks, which could significantly impact the timeline. The March release will only be for macOS. I understand that using Flutter would provide more adaptability across platforms, but when I work on a project, I prefer writing it in the native language of each operating system. This means it will need to be completely redesigned for Windows and Linux later on. That said, I still have a lot to do to make it ready for macOS; you don’t see it in the videos, but I’ve been doing a lot of backend work to make sure Omni handles security and privacy properly.

The app will likely follow a subscription model, at an affordable price of around $5-10 per month. It will offer various options, including LLMs like Claude and GPT, as well as pre-trained open-source models like Llama 3.1 in different versions (if the user’s machine has enough space and computational power, haha). This way, users won’t have to pay for the app and then pay additional costs for API services, which can be frustrating.

edit: Jayu is definitely a cool project, and its presentation is def better than mine, but the way our backends work is totally different. From what I understand, most of Jayu’s features are pre-coded and fixed. With Omni, though, if you look at the backend, there’s nothing tied to specific commands; it’s all powered by various algorithms, including ones that self-correct. None of Omni’s features are pre-coded; everything is generated by the LLM itself, which I focused on to keep it as flexible as possible. One big thing people often miss is that Omni’s real game-changer is its root access to the laptop, allowing it to do things that non-root access just can’t. And trust me, getting an app to run root commands in the background on macOS isn’t easy lmao. Also, I spent a lot of time on Omni’s frontend; as you can see, it’s very beautiful, with many animations, and even the Omni logo is drawn with some trigonometry lmao.
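Just to give a flavor of what I mean by a trig-drawn logo, here’s a toy SwiftUI shape (not the actual logo code):

```swift
import Foundation
import SwiftUI

// Toy example of a parametrically drawn logo: a "rose"-like curve
// traced point by point with sin/cos.
struct TrigLogo: Shape {
    func path(in rect: CGRect) -> Path {
        var path = Path()
        let center = CGPoint(x: rect.midX, y: rect.midY)
        let radius = min(rect.width, rect.height) / 2.5
        for i in 0...720 {
            let theta = Double(i) / 720.0 * 2.0 * .pi
            // Modulate the radius with a cosine to get a petal pattern.
            let r = radius * (0.8 + 0.2 * cos(4.0 * theta))
            let point = CGPoint(x: center.x + r * cos(theta),
                                y: center.y + r * sin(theta))
            if i == 0 { path.move(to: point) } else { path.addLine(to: point) }
        }
        path.closeSubpath()
        return path
    }
}
```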

Thanks again for your feedback!

I had imagined that some of my suggestions you already had noted or at least had in mind (devs always have backlogs with improvements for the future).

I don’t see a problem in writing in the native language.
You will probably deliver better code quality and a better overall experience.
The only problem is the effort of building new apps for Windows and Linux (and, more importantly, maintaining them after release).

One suggestion here is to make a core app that is platform-agnostic (or at least exactly the same, so you only have to recompile it for every platform) and fork only in the user interface.
This will also probably avoid anomalous core behaviors in each version.
Or just “Flutterify”. That will probably work too.
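Something in this spirit (a rough Swift sketch with made-up names; the core stays identical everywhere and only the adapter is rewritten per platform):

```swift
import Foundation

// All planning/LLM logic lives in a platform-neutral core; each OS
// implements only the thin adapter layer.
protocol SystemAdapter {
    func runShellCommand(_ command: String) throws -> String
    func createFolder(at path: String) throws
}

struct OmniCore {
    let system: SystemAdapter
    // Identical on every platform; recompile, don't rewrite.
    func execute(command: String) throws -> String {
        return try system.runShellCommand(command)
    }
}

// macOS implementation; a Windows/Linux port swaps only this part.
struct MacSystemAdapter: SystemAdapter {
    func runShellCommand(_ command: String) throws -> String {
        let process = Process()
        process.executableURL = URL(fileURLWithPath: "/bin/zsh")
        process.arguments = ["-c", command]
        let pipe = Pipe()
        process.standardOutput = pipe
        try process.run()
        process.waitUntilExit()
        return String(data: pipe.fileHandleForReading.readDataToEndOfFile(), encoding: .utf8) ?? ""
    }
    func createFolder(at path: String) throws {
        try FileManager.default.createDirectory(atPath: path, withIntermediateDirectories: true)
    }
}
```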

As for the business model/pricing, that looks very competitive.
And VCs love monthly recurring revenue.

PS: I just mentioned Jayu because, as an outside bystander, it looked similar. I don’t know how it works deep down enough to have an opinion, but I also prefer the more generic (and not hard-coded) approach. And I think root is especially important on OSX, since nowadays you can’t even change a folder’s color without the system asking for your permission. Up for privacy and security / up for annoyances.

PS2: On a completely different topic, if you’re curious, check out this guy’s work: https://www.youtube.com/@SebastianLague. It’s nice to see how good he is at math/coding and how well he delivers in his videos. I always learn something new there and always recommend him to people.

Nevertheless,
Good luck with the Y Combinator application!

3 Likes

This advice applies to other projects as well! Use streaming and other techniques to improve latency and responsiveness · Issue #51 · CsabaConsulting/InspectorGadgetApp · GitHub

2 Likes

So I looked at all the videos on your ’gram. Very cool product; it’s like a better version of PyWinAssistant. Curious why you used “activate” as your keyword instead of “Omni”. And yeah, as people mentioned earlier, distracting the user with some cool animation or wording would be nice. As for the GUI, it looks like ChatGPT in terms of branding; maybe make it a small widget that shows when it’s active/working/complete instead of showing the entire chat (rough sketch below). If someone cares enough about the chat to see what went wrong, they’ll go and look at it. I assume your goal is to make it more seamless, so that’s my suggestion for that.
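Rough sketch of what I mean by the widget (hypothetical state names; your design would obviously differ):

```swift
import SwiftUI

// A compact status pill that reflects the assistant's state,
// with the full chat kept one click away.
enum OmniState { case idle, listening, working, done }

struct StatusPill: View {
    let state: OmniState
    var body: some View {
        Label(label, systemImage: icon)
            .padding(.horizontal, 12)
            .padding(.vertical, 6)
            .background(.ultraThinMaterial, in: Capsule())
    }
    private var label: String {
        switch state {
        case .idle: return "Idle"
        case .listening: return "Listening…"
        case .working: return "Working…"
        case .done: return "Done"
        }
    }
    private var icon: String {
        switch state {
        case .idle: return "moon.zzz"
        case .listening: return "waveform"
        case .working: return "gearshape.2"
        case .done: return "checkmark.circle"
        }
    }
}
```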

1 Like

Thanks! The user can set their preferred activation or deactivation words for the speech recognizer in the settings, so you can choose whatever suits you. It took around 1,000 lines of Swift code to ensure the speech recognition is smooth and seamless, allowing for conversation without needing to manually interact with anything.
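For the curious, the core of the wake-word handling looks roughly like this; it’s a much-simplified sketch on top of Apple’s Speech framework (authorization prompts and error handling omitted), not the actual code:

```swift
import Speech
import AVFoundation

// Simplified sketch of a configurable wake word over streaming recognition.
final class WakeWordListener {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let audioEngine = AVAudioEngine()
    private var request: SFSpeechAudioBufferRecognitionRequest?
    private var task: SFSpeechRecognitionTask?
    var activationWord = "activate"   // user-configurable in settings

    func start(onWake: @escaping () -> Void) throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.shouldReportPartialResults = true   // stream hypotheses as they form
        self.request = request

        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { buffer, _ in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        // Check every partial transcript, so we don't wait for the user to stop talking.
        task = recognizer?.recognitionTask(with: request) { [weak self] result, _ in
            guard let self = self, let result = result else { return }
            let text = result.bestTranscription.formattedString.lowercased()
            if text.contains(self.activationWord.lowercased()) {
                onWake()
            }
        }
    }
}
```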