D-Dupe: A Novel Tool for Interactive
Visualizing and analyzing social networks is a challenging problem that has been receiving growing attention. An important first step, before analysis can begin, is ensuring that the data is accurate. A common data quality problem is that the data may inadvertently contain several distinct references to the same underlying entity; the process of reconciling these references is called entity resolution. D-Dupe is an interactive tool that combines data mining algorithms for entity resolution with a task-specific network visualization. Users cope with the complexity of cleaning large networks by focusing on a small subnetwork containing a potential duplicate pair. The subnetwork highlights relationships in the social network, making the common relationships easy to identify visually. D-Dupe users resolve ambiguities either by merging nodes or by marking them distinct. The entity resolution process is iterative: as pairs of nodes are resolved, additional duplicates may be revealed; therefore, resolution decisions are often chained together. We give examples of how users can flexibly apply sequences of actions to produce a high-quality entity resolution result.
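The chained nature of resolution described above can be illustrated with a minimal Python sketch. It is not D-Dupe's implementation (D-Dupe is written in C#), and the node names, the `shared_neighbors` helper, and the `merge` helper are all hypothetical, introduced only for illustration. The sketch assumes a network stored as a dictionary mapping each node to the set of its neighbors, and shows how merging one duplicate pair can expose a common relationship for another pair.

```python
def shared_neighbors(graph, a, b):
    """Return the neighbors that nodes a and b have in common."""
    return (graph[a] & graph[b]) - {a, b}


def merge(graph, keep, drop):
    """Merge node `drop` into node `keep`, rewiring all of its edges."""
    for n in graph.pop(drop):
        graph[n].discard(drop)
        if n != keep:
            graph[n].add(keep)
            graph[keep].add(n)
    graph[keep].discard(drop)


# Hypothetical co-authorship network with two suspected duplicate pairs.
graph = {
    "J. Smith": {"A. Lee", "R. Patel"},
    "John Smith": {"Alice Lee", "R. Patel"},
    "A. Lee": {"J. Smith"},
    "Alice Lee": {"John Smith"},
    "R. Patel": {"J. Smith", "John Smith"},
}

# Before any merge, the two Lee nodes share no neighbor, while the
# two Smith nodes share the co-author "R. Patel".
before = shared_neighbors(graph, "A. Lee", "Alice Lee")   # set()
smiths = shared_neighbors(graph, "J. Smith", "John Smith")  # {"R. Patel"}

# Resolving the Smith pair rewires "A. Lee" to "John Smith" ...
merge(graph, keep="John Smith", drop="J. Smith")

# ... so the Lee pair now has a common neighbor, which is new
# relational evidence that they may also be duplicates.
after = shared_neighbors(graph, "A. Lee", "Alice Lee")    # {"John Smith"}
```

In this toy example the first merge is what makes the second candidate pair stand out, which is the sense in which resolution decisions are chained together.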
Mustafa Bilgic, PhD Student, Computer Science
Louis Licamele, PhD Student, Computer Science
Lise Getoor, Assistant Professor, Computer Science / UMIACS
Hyunmo Kang, Faculty Research Associate, UMIACS
Ben Shneiderman, Professor, Computer Science / UMIACS
Interactive Entity Resolution in Relational Data: A Visual Analytic Tool and Its Evaluation
Hyunmo Kang, Lise Getoor, Ben Shneiderman, Mustafa Bilgic, Louis Licamele
IEEE Transactions on Visualization and Computer Graphics, Volume 14, Number 5, pp. 999-1014, 2008 (TVCG '08).
C-Group: A Visual Analytic Tool for Pairwise Analysis of Dynamic Group Membership
Hyunmo Kang, Lise Getoor, Lisa Singh
Proceedings of IEEE Symposium on Visual Analytics Science and Technology 2007 (VAST '07).
GeoDDupe: A Novel Interface for Interactive Entity Resolution in Geospatial Data
Hyunmo Kang, Vivek Sehgal, Lise Getoor
Proceedings of Information Visualisation, pp. 489-496, 2007 (IV '07).
D-Dupe: An Interactive Tool for Entity Resolution in Social Networks
Mustafa Bilgic, Louis Licamele, Lise Getoor, Ben Shneiderman
Proceedings of IEEE Symposium on Visual Analytics Science and Technology 2006 (VAST '06).
The D-Dupe 2.0 beta executable is available for download. D-Dupe 2.0 runs on Windows XP; it does not run on Mac. The installation file is about 1 MB. InfoVis sample data can also be downloaded separately; the sample library file is about 100 KB.
Technical Description (D-Dupe 2.0 Beta)
- Language and SDK spec : C# / MS Visual Studio .Net 2005
- Supported OS : MS Windows 2000/XP (not Mac)
- Executable file size : 500 KB
- Total size of installed components : 2.25 MB
Please read the licensing terms carefully and register first. This will lead you to the download area.
Download D-Dupe 2.0 Beta