The learning process of a model closely resembles certain paths of human childhood development, and it carries analogous bugs: just as people with different levels of knowledge in a field will reach completely different conclusions from the same fact, so will models trained on different knowledge.
Overfitting can occur during learning, discovery, or exploration. Consider the answers on many forums: the few with low upvote counts are often the truly correct ones, yet it is the highly upvoted answers that end up in AI training data, while the correct low-upvote answers are discarded by filtering algorithms.
Many people believe that rare or deep industry knowledge is dispensable, and that even if the model makes some mistakes in these untrained areas, it won’t affect the quality of the vast majority of its answers. This is a completely wrong idea.
Just as university entrance exams test the subject knowledge that forms the foundation for completing more difficult degrees, deep industry knowledge is the foundation of a model’s correct understanding of a field. If a model lacks sufficiently deep physics knowledge, an internal physical model, and a genuine understanding of the principles behind physical formulas, it cannot correctly perform spatial perception or fuse spatial information from multiple types of sensors.
This path is long, and any error in the knowledge along it leads to failure. Consider, for example, feeding the raw data stream from a CMOS image sensor directly into the model while simultaneously giving the model API control of a robotic arm. In theory, the model could acquire complete robotic-arm operation, and the ability to correct its operations from feedback, through visual feedback together with the pressure and IMU sensors on the arm.
However, while this is theoretically possible, reality is far from it. Even minor errors in the knowledge the model was trained on, or lossy compression during model quantization, can cause the entire system to collapse. For instance, if the model does not know the difference between a rolling shutter and a global shutter on a CMOS sensor, it will reconstruct fragmented images from the raw CMOS data stream and fail at the next step of image understanding. Similarly, if the model does not know whether the IMU’s sample rate is set to 50 Hz or 200 Hz, its calculations of the robotic arm’s trajectory and speed from the IMU data stream will be completely wrong. These are very common examples of model failures caused by erroneous or incomplete training knowledge.
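The IMU sample-rate pitfall is easy to demonstrate numerically. The sketch below (hypothetical names, assuming simple first-order integration) shows how assuming 50 Hz for a stream that actually arrives at 200 Hz inflates every integrated velocity by exactly 4x:

```python
import numpy as np

# Hypothetical illustration: samples actually arrive at 200 Hz, but the
# integrator assumes 50 Hz, so every integrated quantity is scaled by 4x.
TRUE_RATE_HZ = 200.0
ASSUMED_RATE_HZ = 50.0

def integrate_velocity(accel_samples, rate_hz):
    """First-order integration of acceleration: v = sum(a) * dt."""
    dt = 1.0 / rate_hz
    return float(np.sum(accel_samples) * dt)

# One second of constant 1 m/s^2 acceleration, sampled at the true rate.
accel = np.ones(int(TRUE_RATE_HZ))  # 200 samples

v_correct = integrate_velocity(accel, TRUE_RATE_HZ)     # 1.0 m/s
v_wrong = integrate_velocity(accel, ASSUMED_RATE_HZ)    # 4.0 m/s, 4x too large
```

The same scale error compounds again when velocity is integrated to position, which is why a single wrong configuration fact is enough to make every downstream trajectory estimate useless.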
Of course, some lower-performing models require users to design a pre-decoder and ISP pipeline to decode the CMOS signal, process it into standard H.264 format before transmitting it to the model, and process the IMU quaternion signal into a trajectory before transmitting it to the model. In engineering, this adds a large number of unnecessary computational steps and introduces additional errors.
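As a sketch of what such a pre-processing stage does on the IMU path, the snippet below (hypothetical names, standard quaternion rotation identity) rotates a body-frame vector by an orientation quaternion, the core operation in turning a quaternion stream into a world-frame trajectory:

```python
import numpy as np

def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z):
    v' = v + 2u x (u x v + w v), where u = (x, y, z)."""
    w, x, y, z = q
    u = np.array([x, y, z])
    return v + 2.0 * np.cross(u, np.cross(u, v) + w * v)

# A 90-degree yaw (rotation about +z) maps the body +x axis onto world +y.
q_yaw90 = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
v_body = np.array([1.0, 0.0, 0.0])
v_world = quat_rotate(q_yaw90, v_body)  # ~ [0, 1, 0]
```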
In the real physical world, when you pick up a dumbbell and want to know its weight, if the skin on your hand has no pressure-sensing nerves, what are your options?
Shake the dumbbell.
When you pick up a bottle of cola and want to know how much is left, shake the cola.
This introduces another crucial component for training models: “rare knowledge.” While this knowledge may seem like common sense to many, it’s difficult for most models and presents challenges in practical engineering.
Last year, I actually worked on a project involving a dumbbell with IMU-based trajectory tracking and weight adjustment. A 6-axis IMU was embedded in the dumbbell to perform all sensing, including measuring the dumbbell’s weight. The measurement borrowed the idea of shaking a can of cola, but its engineering implementation was fairly involved. First, to obtain the dumbbell user’s approximate latitude and the dumbbell’s orientation, we used a rather unusual technique. The measurement program ran when the dumbbell had been idle for a long time (e.g., late at night) and the gyroscope output showed only continuous static drift. The program preheated the gyroscope; once the temperature had stabilized for one minute, a temperature-compensation lookup table was loaded and measurement began. The principle is that a stationary dumbbell rotates with the Earth’s surface, so measuring the direction of this rotation’s angular velocity readily reveals true north, and integrating time-sampled readings yields the approximate latitude. The main challenges were sourcing a gyroscope with excellent Allan variance characteristics, applying multiple filtering stages such as an extended Kalman filter, and accurately measuring the characteristic parameters of each purchased component batch in a laboratory setting.
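The geometry of the north-finding step can be sketched as follows, assuming an ideal, bias-free gyro reading already expressed in a local East-North-Up frame (the hard part in practice, per the paragraph above, is getting such a reading out of a noisy MEMS part at all; all names here are hypothetical):

```python
import numpy as np

EARTH_RATE = 7.292115e-5  # Earth's rotation rate, rad/s

def gyrocompass(omega_enu, earth_rate=EARTH_RATE):
    """True-north heading and latitude from a stationary, bias-free gyro
    reading expressed in a local East-North-Up (ENU) frame."""
    we, wn, wu = omega_enu
    heading = np.arctan2(we, wn)  # deviation of the rotation axis from north
    latitude = np.arcsin(np.clip(wu / earth_rate, -1.0, 1.0))
    return heading, latitude

# Synthetic stationary reading at 45 deg N with sensor axes aligned to ENU:
lat_true = np.deg2rad(45.0)
omega = np.array([0.0,
                  EARTH_RATE * np.cos(lat_true),   # horizontal component points north
                  EARTH_RATE * np.sin(lat_true)])  # vertical component encodes latitude
heading, latitude = gyrocompass(omega)  # heading ~ 0, latitude ~ 45 deg
```

The horizontal component of Earth rotation points at true north, and the ratio of the vertical component to the full Earth rate gives the latitude; that is the entire observable, which is why signal quality, not math, is the bottleneck.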
Once this step has produced the user’s approximate position on Earth and the dumbbell’s orientation, the user adjusts the number of weight plates, picks up the dumbbell, and performs a few shaking movements; the algorithm can then isolate the gravitational component and so obtain the dumbbell’s current weight. This weight, along with the calories burned during the workout, can then be displayed on the user’s phone.
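One building block of this step, separating the gravity reaction from the shaking motion given an orientation estimate, can be sketched as below (hypothetical names; the project’s actual mass-inference step is not reproduced here):

```python
import numpy as np

G = 9.80665  # standard gravity, m/s^2

def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    u = np.array([x, y, z])
    return v + 2.0 * np.cross(u, np.cross(u, v) + w * v)

def shaking_accel(accel_body, q_body_to_world):
    """Isolate the dynamic (shaking) acceleration from an accelerometer
    reading by removing the gravity reaction in the world frame."""
    accel_world = quat_rotate(q_body_to_world, accel_body)
    static_reading = np.array([0.0, 0.0, G])  # at rest the sensor reads +g up
    return accel_world - static_reading

# A dumbbell at rest (identity orientation) shows zero shaking acceleration.
at_rest = shaking_accel(np.array([0.0, 0.0, G]),
                        np.array([1.0, 0.0, 0.0, 0.0]))  # ~ [0, 0, 0]
```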
“Why not use a 9-axis IMU? Because it is impractical to make the weight plates and the dumbbell itself entirely from non-magnetic materials, and doing so would significantly increase costs. We would also have to deal with magnetic interference whenever the user leaves the dumbbell near other iron fitness equipment.”
“Why is finding true north from Earth’s rotation relatively rare knowledge? Because the first stage of most gyroscope signal-filtering pipelines applies a threshold, which directly discards very weak signals like the Earth-rotation rate, especially on noisy gyroscopes such as MEMS parts. Traditional thresholding therefore renders any subsequent north-finding algorithm ineffective.”
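This failure mode is easy to reproduce: the Earth-rotation rate is about 7.29e-5 rad/s (roughly 0.004 °/s), well below a typical MEMS zero-rate deadband. A minimal sketch, assuming a 0.05 °/s threshold (both figures are illustrative, not from a specific datasheet):

```python
import math

EARTH_RATE_DPS = math.degrees(7.292115e-5)  # ~0.0042 deg/s

def deadband(rate, threshold):
    """Typical first-stage 'zero-rate' filter on MEMS gyro output:
    anything below the threshold is treated as noise and zeroed."""
    return 0.0 if abs(rate) < threshold else rate

# An assumed zero-rate deadband of 0.05 deg/s erases the Earth-rate signal
# outright, so no amount of downstream averaging can recover it.
filtered = deadband(EARTH_RATE_DPS, 0.05)  # -> 0.0
```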
Often, filtering effective knowledge and data is like restoring a clean image from a noisy one. There are many methods, such as Kalman filtering, bandpass filtering, and cross-frame transforms, but each noise-reduction method brings its own disadvantages. Bandpass filtering, for example, can eliminate the most advanced and cutting-edge results outright, because those results carry low confidence, cannot yet be cross-validated, and receive fewer “likes” in searches for the related research keywords. This is much like adding a threshold filter to a knowledge-search algorithm: it discards weak signals and, with them, genuine algorithmic paths such as north-finding.
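For contrast with hard thresholding, a Kalman-style filter suppresses noise without a cutoff that destroys weak signals. A minimal scalar sketch (hypothetical names; the noise variances are assumed) that recovers a constant value from noisy samples:

```python
import random

def kalman_1d(measurements, q=1e-5, r=0.01):
    """Minimal scalar Kalman filter estimating a constant value.
    q: assumed process-noise variance, r: assumed measurement-noise variance."""
    x, p = 0.0, 1.0  # initial estimate and its variance
    for z in measurements:
        p += q             # predict: uncertainty grows slightly
        k = p / (p + r)    # Kalman gain
        x += k * (z - x)   # update toward the measurement
        p *= 1.0 - k
    return x

random.seed(0)
true_value = 0.37
noisy = [true_value + random.gauss(0.0, 0.1) for _ in range(500)]
estimate = kalman_1d(noisy)  # converges near 0.37
```

Unlike a deadband, nothing here is zeroed out: a signal far weaker than the noise floor still accumulates in the estimate, which is exactly the property north-finding needs.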
I suspect some will suggest human intervention.
However, human intervention is extremely difficult because most people, even after a lifetime, may only approach cutting-edge knowledge and understanding in a specific subfield. Furthermore, even top-tier journals with high ratings often make fundamental errors in judging technical pathways. A significant portion of important scientific discoveries are based on overturning previous theoretical systems.



