General Information

ISSN: 1793-8236 (Online)
Abbreviated Title: Int. J. Eng. Technol.
Frequency: Quarterly
DOI: 10.7763/IJET
APC: 500 USD
Managing Editor: Ms. Shira Lu
Abstracting/Indexing: CNKI, Google Scholar, Crossref etc.
E-mail: ijet_Editor@126.com

IJET 2012 Vol. 4(6) (Dec. 2012): 750-754 ISSN: 1793-8236
DOI: 10.7763/IJET.2012.V4.477

Removing Fully and Partially Duplicated Records through K-Means Clustering

Bilal Khan, Azhar Rauf, Huma Javed, Shah Khusro, and Huma Javed

Abstract—Record duplication is one of the prominent problems in data warehouses. It arises when various databases are integrated. This research focuses on identifying both fully and partially duplicated records. In this paper we propose a de-duplicator algorithm based on a numeric conversion of the entire data. For efficiency, the data mining technique of k-means clustering is applied to the numeric values, which reduces the number of comparisons among records. To identify and remove the duplicated records, a divide-and-conquer technique is used to match records within each cluster, which further improves the efficiency of the algorithm.

Index Terms—Data cleansing, De-Duplicator, partial duplication, K-Means clustering.
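The abstract describes a three-stage pipeline (numeric conversion of records, k-means clustering on the numeric values, divide-and-conquer matching inside each cluster) but does not specify the conversion formula, the choice of k, the matching criterion, or how the divide-and-conquer step is organized. The Python sketch below is a minimal, hypothetical illustration of that pipeline under assumptions made here, not the authors' implementation. Records are assumed to be tuples of fields. The numeric value is assumed to be the sum of character codes of the normalized record (`record_to_number`). Clustering is a plain one-dimensional k-means (`kmeans_1d`) with a deterministic start. Two records are assumed to be duplicates if they are identical or if `difflib.SequenceMatcher` rates their normalized text at 0.85 or higher (`is_duplicate`; the threshold is arbitrary). The divide-and-conquer step (`dedup_divide_and_conquer`) is one possible arrangement: split a cluster in half, deduplicate each half recursively, then merge the survivors. All function names and parameters are illustrative.

```python
from difflib import SequenceMatcher


def normalize(record):
    """Join a record's fields into one lowercase, whitespace-trimmed string."""
    return " ".join(str(field).strip().lower() for field in record)


def record_to_number(record):
    """Hypothetical numeric conversion (assumption): sum of the character codes
    of the normalized record. Exact copies and letter transpositions map to the same value."""
    return sum(ord(ch) for ch in normalize(record))


def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on scalar values; returns one cluster label per value."""
    distinct = sorted(set(values))
    k = min(k, len(distinct))
    # Deterministic start: k centroids spread evenly over the sorted distinct values.
    centroids = [distinct[i * (len(distinct) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def is_duplicate(a, b, threshold=0.85):
    """Full duplicates are identical; partial duplicates are judged by an
    assumed similarity threshold on the normalized text."""
    if a == b:
        return True
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def dedup_divide_and_conquer(records):
    """Deduplicate each half recursively, then merge, keeping the earlier representative."""
    if len(records) <= 1:
        return list(records)
    mid = len(records) // 2
    left = dedup_divide_and_conquer(records[:mid])
    right = dedup_divide_and_conquer(records[mid:])
    merged = list(left)
    for rec in right:
        # Survivors of `right` were already checked against one another,
        # so each one only needs comparing with the survivors of `left`.
        if not any(is_duplicate(rec, kept) for kept in left):
            merged.append(rec)
    return merged


def remove_duplicates(records, k=2):
    """Cluster records by their numeric value, then deduplicate within each cluster only."""
    numbers = [record_to_number(r) for r in records]
    labels = kmeans_1d(numbers, k)
    result = []
    for c in sorted(set(labels)):
        cluster = [r for r, label in zip(records, labels) if label == c]
        result.extend(dedup_divide_and_conquer(cluster))
    return result


if __name__ == "__main__":
    records = [
        ("John Smith", "Peshawar", "091-111"),
        ("John Smith", "Peshawar", "091-111"),  # full duplicate
        ("Jhon Smith", "Peshawar", "091-111"),  # partial duplicate (transposed letters)
        ("Ayesha Khan", "Lahore", "042-222"),
    ]
    print(remove_duplicates(records, k=2))
```

With the sample records, the exact copy and the transposed-letter variant of "John Smith" fall into the same cluster and collapse into the first record, while "Ayesha Khan" is kept. The trade-off of clustering first is that records are compared only inside their own cluster, so in this sketch a partial duplicate whose numeric value lands in a different cluster would go undetected.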

The authors are with the Department of Computer Science, University of Peshawar, Pakistan (e-mail: smbilal_84@yahoo.com).

Cite: Bilal Khan, Azhar Rauf, Huma Javed, Shah Khusro, and Huma Javed, "Removing Fully and Partially Duplicated Records through K-Means Clustering," International Journal of Engineering and Technology, vol. 4, no. 6, pp. 750-754, 2012.