How Can I get better results when training a House Price Prediction Model?

totames · June 12, 2023, 4:17am

My problem is that I cannot get good results when training a model; nothing I try works. I have tried KNN, Random Forest Regressor, Gradient Boosting Regressor, Linear Regression, and a neural network built from Dense layers.

I collected 34,400 rows of house price data. It contains these columns:

price, area, absolute_area, room, floor_count, lat, lng, area_abs_area_difference, area_room_ratio, building_age_new, building_age_very_young, building_age_young, building_age_mid, building_age_old, building_age_very_old

This is my dataset

It goes on for about 34,400 rows.

What I have tried:

I need to create some models using this data. First, I split the dataset into training and validation sets:

from sklearn.model_selection import train_test_split

train_df, val_df = train_test_split(dataset, test_size=0.15, random_state=42)

x_train = train_df.drop('price', axis=1)
y_train = train_df['price']

x_val = val_df.drop('price', axis=1)
y_val = val_df['price']

I used StandardScaler() to scale the data:

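import pickle

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training features only.
scaler = StandardScaler().fit(x_train)

# Save the fitted scaler so the same transform can be reused later.
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)

def preprocessor(X):
    # transform() returns a new array, so no explicit copy is needed.
    return scaler.transform(X)

X_train_preprocessed, X_val_preprocessed = preprocessor(x_train), preprocessor(x_val)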

Now, coming to my models:

Linear Regression:

lm = LinearRegression().fit(X_train_selected, y_train)

y_train_pred = model.predict(X_train_selected)
y_val_pred = model.predict(X_val_selected)

train_mse = mse(y_train, y_train_pred, squared=False)
val_mse = mse(y_val, y_val_pred, squared=False)

print("Training MSE:", train_mse)
print("Validation MSE:", val_mse)

The Output:

Training MSE: 32026915.5375083
Validation MSE: 25336006.528607745

KNN:

knn = KNeighborsRegressor(n_neighbors=35).fit(X_train_preprocessed, y_train)

r2_train = knn.score(X_train_preprocessed, y_train)
r2_val = knn.score(X_val_preprocessed, y_val)

r2_train, r2_val

The Output:

(0.11238750123213292, 0.2333151002444518)

I used Random Forest Regressor and Gradient Boosting Regressor too; they gave the same results.

For the last model, I used Dense layers. My project is meant to offer multiple models to pick from, so a Dense network has to be one of them.

I created a neural network like this:

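from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.layers import Dense, Dropout, InputLayer
from tensorflow.keras.metrics import RootMeanSquaredError
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

medium_nn = Sequential()
medium_nn.add(InputLayer((14,)))
medium_nn.add(Dense(32, 'relu')) # What is ReLU?
medium_nn.add(Dropout(0.1))
medium_nn.add(Dense(16, 'relu'))
medium_nn.add(Dense(1, 'linear'))

opt = Adam(learning_rate=1)
cp = ModelCheckpoint('models/medium_nn', save_best_only=True)
medium_nn.compile(optimizer=opt, loss='mse', metrics=[RootMeanSquaredError()])
medium_nn.fit(x=X_train_preprocessed, y=y_train, validation_data=(X_val_preprocessed, y_val), callbacks=[cp], epochs=100, verbose=0)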

y_train_pred_medium_nn = medium_nn.predict(X_train_preprocessed)
y_val_pred_medium_nn = medium_nn.predict(X_val_preprocessed)

medium_nn_r2_train = r2_score(y_train, y_train_pred_medium_nn)
medium_nn_r2_val = r2_score(y_val, y_val_pred_medium_nn)

print("R2 Score - Training Set:", medium_nn_r2_train)
print("R2 Score - Validation Set:", medium_nn_r2_val)

This is the output:

R2 Score - Training Set: 0.28221913486721617
R2 Score - Validation Set: 0.4202027388380072

This is the best I can do. What do you suggest I do? I collected the data myself from a website using Selenium, but I do not know what is keeping my models from learning. Am I lacking data, or am I doing something wrong? Sorry for the long post; I am pretty new to the topic and do not know what to do.

chunduriv · June 12, 2023, 12:10pm

@totames,

Welcome to the TensorFlow Forum!

You can refer to this notebook for house price prediction; it may help you.

If you still have issues, here are some suggestions to improve the model performance:

Identify the features that matter most for predicting house prices using correlation analysis, feature importance from tree-based models, or dimensionality reduction methods.
Handle outliers appropriately by removing them, transforming them, or using robust regression techniques.
Use k-fold cross-validation.
Optimize model hyperparameters with techniques such as grid search or random search.
Explore ensemble methods that combine multiple models.
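
As a rough illustration of the cross-validation, hyperparameter search, and feature-importance suggestions, here is a minimal scikit-learn sketch. The data is synthetic (`make_regression` stands in for the real 14 feature columns), and the parameter values are illustrative assumptions, not tuned recommendations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, RandomizedSearchCV, cross_val_score

# Synthetic stand-in for the house data (hypothetical); swap in the real features and prices.
X, y = make_regression(n_samples=300, n_features=14, n_informative=6,
                       noise=10.0, random_state=42)

# 5-fold cross-validation gives a mean and a spread instead of one split's score.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
baseline = GradientBoostingRegressor(random_state=42)
baseline_scores = cross_val_score(baseline, X, y, cv=cv, scoring="r2")
print("Baseline R^2: %.3f +/- %.3f" % (baseline_scores.mean(), baseline_scores.std()))

# Random search over a small, illustrative set of hyperparameters.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.03, 0.1, 0.3],
    "subsample": [0.7, 1.0],
}
search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,
    cv=cv,
    scoring="r2",
    random_state=42,
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best cross-validated R^2: %.3f" % search.best_score_)

# Feature importances from the refit best model, most important first.
importances = search.best_estimator_.feature_importances_
ranking = np.argsort(importances)[::-1]
print("Top feature indices by importance:", ranking[:5])
```

On the real dataset, `X` and `y` would be the feature columns and `price`; the importance ranking can then hint at which engineered columns carry little signal.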

Please try as suggested above.
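
For the outlier suggestion in particular, two common options are to train on a log-transformed target or to use a robust loss. The sketch below shows both with scikit-learn, assuming the prices are right-skewed (not verified for this dataset); the data is synthetic and the variable names are hypothetical.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic right-skewed "prices" (hypothetical stand-in for the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
log_price = 12.0 + X @ np.array([0.5, 0.3, -0.2, 0.1]) + rng.normal(scale=0.2, size=500)
y = np.exp(log_price)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.15, random_state=42)

# Option 1: fit on log1p(price); predictions are mapped back with expm1 automatically.
log_model = TransformedTargetRegressor(
    regressor=GradientBoostingRegressor(random_state=42),
    func=np.log1p,
    inverse_func=np.expm1,
)
log_model.fit(X_train, y_train)

# Option 2: keep the raw target but use the Huber loss, which is less sensitive to outliers.
huber_model = GradientBoostingRegressor(loss="huber", random_state=42)
huber_model.fit(X_train, y_train)

print("Log-target R^2:", log_model.score(X_val, y_val))
print("Huber-loss R^2:", huber_model.score(X_val, y_val))
```

Both scores are computed on the original price scale, so they can be compared directly with an untransformed baseline.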

Thank you!

Sergey_Efimov · July 3, 2023, 4:02am

Your code contains this:
lm = LinearRegression().fit(X_train_selected, y_train)

y_train_pred = model.predict(X_train_selected)
y_val_pred = model.predict(X_val_selected)

But nowhere in your code are `model` or `X_train_selected` defined; the fitted estimator is assigned to `lm`.
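
A minimal sketch of how that snippet might look with consistent names, assuming the `_selected` arrays were meant to be the scaled `_preprocessed` ones (a guess) and that `mse` is scikit-learn's `mean_squared_error`. The data here is synthetic. Note also that `squared=False` returns the root of the mean squared error, so the numbers printed as "MSE" in the question are really RMSE values in price units; the sketch takes the square root explicitly and labels it as such.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data (hypothetical); replace with the real scaled features and prices.
rng = np.random.default_rng(42)
X_train_preprocessed = rng.normal(size=(200, 3))
X_val_preprocessed = rng.normal(size=(50, 3))
true_coef = np.array([3.0, -2.0, 0.5])
y_train = X_train_preprocessed @ true_coef + rng.normal(scale=0.1, size=200)
y_val = X_val_preprocessed @ true_coef + rng.normal(scale=0.1, size=50)

# Fit and predict with the same object: the estimator is `lm`, not `model`.
lm = LinearRegression().fit(X_train_preprocessed, y_train)
y_train_pred = lm.predict(X_train_preprocessed)
y_val_pred = lm.predict(X_val_preprocessed)

# The square root of the MSE is the RMSE, in the same units as the target.
train_rmse = np.sqrt(mean_squared_error(y_train, y_train_pred))
val_rmse = np.sqrt(mean_squared_error(y_val, y_val_pred))

print("Training RMSE:", train_rmse)
print("Validation RMSE:", val_rmse)
```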

Also, your layers have very few neurons.
Try increasing the layer widths so that the dense layers have at least 2048 connections to neighboring layers.

You could also write an automatic tester that configures the network for you.
Lastly, I am new to TensorFlow but not to neural networks, so do check your layers.
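
One possible shape for such an automatic tester is a loop that trains several layer layouts and keeps the one with the best validation score. The sketch below uses scikit-learn's `MLPRegressor` as a lightweight stand-in for the Keras model, on synthetic data; the candidate layouts are illustrative assumptions. The `(64, 32)` layout has 64 × 32 = 2048 weights between its two hidden layers, which is one reading of the 2048 figure above.

```python
import warnings

from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 14-feature house data (hypothetical).
X, y = make_regression(n_samples=400, n_features=14, noise=5.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.15, random_state=42)

# Scale features using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_val_s = scaler.transform(X_train), scaler.transform(X_val)

# Candidate hidden-layer layouts to compare (illustrative values).
candidates = [(16,), (32, 16), (64, 32), (128, 64)]

results = {}
for hidden in candidates:
    net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=300, random_state=0)
    with warnings.catch_warnings():
        # Short training runs may stop before convergence; silence that warning here.
        warnings.simplefilter("ignore", ConvergenceWarning)
        net.fit(X_train_s, y_train)
    results[hidden] = net.score(X_val_s, y_val)  # validation R^2
    print(hidden, "validation R^2: %.3f" % results[hidden])

best = max(results, key=results.get)
print("Best layout:", best)
```

The same loop structure carries over to Keras by replacing the `MLPRegressor` line with a function that builds and fits a `Sequential` model for the given widths.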
