D-Dupe: A Novel Tool for Interactive Data Deduplication and Integration

Project Description

Visualizing and analyzing social networks is a challenging problem that has been receiving growing attention. An important first step, before analysis can begin, is ensuring that the data is accurate. A common data quality problem is that the data may inadvertently contain several distinct references to the same underlying entity; the process of reconciling these references is called entity resolution. D-Dupe is an interactive tool that combines data mining algorithms for entity resolution with a task-specific network visualization. Users cope with the complexity of cleaning large networks by focusing on a small subnetwork containing a potential duplicate pair. The subnetwork highlights relationships in the social network, making shared relationships easy to identify visually. D-Dupe users resolve ambiguities either by merging nodes or by marking them distinct. The entity resolution process is iterative: as pairs of nodes are resolved, additional duplicates may be revealed, so resolution decisions are often chained together. We give examples of how users can flexibly apply sequences of actions to produce a high-quality entity resolution result.
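The iterative resolve-and-repeat loop described above can be illustrated with a minimal Python sketch. It is not D-Dupe's implementation: the pair score (a blend of `difflib` name similarity and Jaccard overlap of coauthors), the `alpha` weight, the `threshold`, and every function name here are hypothetical. The `decide` callback stands in for the user, returning True to merge a pair or False to mark it distinct. The toy data shows chaining: "J. Smith" and "John Smith" share no coauthors until "A. Jones" and "Alice Jones" are merged.

```python
from difflib import SequenceMatcher
from itertools import combinations


def build_graph(edges):
    """Undirected coauthorship graph as {name: set of neighbor names}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def jaccard(a, b):
    """|A & B| / |A | B|, or 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(graph, u, v, alpha=0.5):
    """Hypothetical pair score: name similarity blended with shared relationships."""
    name_sim = SequenceMatcher(None, u.lower(), v.lower()).ratio()
    # Compare neighborhoods without counting a direct u-v edge.
    rel_sim = jaccard(graph[u] - {v}, graph[v] - {u})
    return alpha * name_sim + (1 - alpha) * rel_sim


def merge(graph, keep, drop):
    """Fold node `drop` into `keep`, redirecting every edge of `drop` to `keep`."""
    for n in graph.pop(drop):
        graph[n].discard(drop)
        if n != keep:
            graph[n].add(keep)
            graph[keep].add(n)


def resolve(graph, decide, threshold=0.6):
    """Repeatedly offer the best-scoring unresolved pair until none reaches threshold.

    `decide(u, v)` plays the user's role: True merges the pair, False marks it distinct.
    Scores are recomputed after every decision, so one merge can surface new duplicates.
    """
    distinct, merges = set(), []
    while True:
        candidates = [
            (score(graph, u, v), u, v)
            for u, v in combinations(sorted(graph), 2)
            if frozenset((u, v)) not in distinct
        ]
        candidates = [c for c in candidates if c[0] >= threshold]
        if not candidates:
            return merges
        _, u, v = max(candidates)
        if decide(u, v):
            merge(graph, u, v)
            merges.append((u, v))
        else:
            distinct.add(frozenset((u, v)))


# Toy data (hypothetical): two authors each appear under two name variants.
edges = [
    ("A. Jones", "J. Smith"), ("A. Jones", "B. Lee"), ("A. Jones", "C. Wu"),
    ("Alice Jones", "John Smith"), ("Alice Jones", "B. Lee"), ("Alice Jones", "C. Wu"),
]
# Ground truth stands in for the analyst's judgment.
truth = {"A. Jones": "jones", "Alice Jones": "jones", "J. Smith": "smith",
         "John Smith": "smith", "B. Lee": "lee", "C. Wu": "wu"}
graph = build_graph(edges)
merges = resolve(graph, lambda u, v: truth[u] == truth[v])
print(merges)  # [('A. Jones', 'Alice Jones'), ('J. Smith', 'John Smith')]
```

With these settings the "J. Smith"/"John Smith" pair scores about 0.39 at first, below the threshold, and about 0.89 once the two Jones nodes have been merged and both Smith variants point to the same coauthor. Pairs that pass the threshold but are not true duplicates (for example "B. Lee" and "C. Wu", which share the same coauthors) are simply marked distinct by the callback.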


Graduate Students

Mustafa Bilgic, PhD Student, Computer Science
Louis Licamele, PhD Student, Computer Science

Faculty

Lise Getoor, Assistant Professor, Computer Science / UMIACS
Hyunmo Kang, Faculty Research Associate, UMIACS
Ben Shneiderman, Professor, Computer Science / UMIACS

Publications

      Interactive Entity Resolution in Relational Data: A Visual Analytic Tool and Its Evaluation
        Hyunmo Kang, Lise Getoor, Ben Shneiderman, Mustafa Bilgic, Louis Licamele
        IEEE Transactions on Visualization and Computer Graphics, Volume 14, Number 5, pp. 999-1014, 2008 (TVCG '08).

      C-Group: A Visual Analytic Tool for Pairwise Analysis of Dynamic Group Membership
        Hyunmo Kang, Lise Getoor, Lisa Singh
        Proceedings of IEEE Symposium on Visual Analytics Science and Technology 2007 (VAST '07).

      GeoDDupe: A Novel Interface for Interactive Entity Resolution in Geospatial Data
        Hyunmo Kang, Vivek Sehgal, Lise Getoor
        Proceedings of Information Visualisation, pp. 489-496, 2007 (IV '07).

      D-Dupe: An Interactive Tool for Entity Resolution in Social Networks
        Mustafa Bilgic, Louis Licamele, Lise Getoor, Ben Shneiderman
        Proceedings of IEEE Symposium on Visual Analytics Science and Technology 2006 (VAST '06).

Related Projects

Graph Visualization

Video Demo

D-Dupe v2.0 demo (800x600 resolution)
D-Dupe v2.0 demo (1024x768 resolution)


The D-Dupe 2.0 beta executable is available for download (the installation file is about 1 MB). D-Dupe 2.0 runs on Windows XP; it does not run on Mac. InfoVis sample data can also be downloaded separately (the sample library file is about 100 KB).

Technical Description (D-Dupe 2.0 Beta)

Please read the licensing terms carefully and register first; registering will take you to the download area.

Download D-Dupe 2.0 Beta