Primary care data is an important piece of the evolving healthcare ecosystem. In addition to supporting the provision of patient care, primary care data can be used for a number of important secondary purposes. Understanding the tradeoffs between the timeliness, accuracy, completeness and usefulness of primary care data is essential to designing systems that generate high-quality data. As a case study, data quality measures and metrics were developed with a focus group of managers from a primary care organization. After the data quality measurements were extracted and calculated, each measure was modeled with binomial logistic (logit) regression to characterize tradeoffs and interactions among the data quality dimensions. Measures of accuracy, completeness and timeliness were calculated for 196,967 patient encounters, and report generation was measured as a proxy for the usefulness dimension. The analysis showed a positive relationship between accuracy and completeness and a negative relationship between timeliness and usefulness. Importantly, the use of the data was associated with an increase in both completeness and accuracy. The measures and metrics developed with the focus group had limitations; however, the group agreed that they were reasonable proxies for the data quality dimensions under study. The results provide meaningful insight into user tradeoffs and can inform the design of systems in primary care.
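The abstract describes the modeling step only at a high level. As a rough, hedged sketch (the column names, synthetic data and effect sizes below are illustrative assumptions, not the study's data), a binomial logistic regression of one quality measure on the others could be set up along these lines:

```python
# Hypothetical sketch: relate a per-encounter completeness flag (0/1) to accuracy,
# timeliness and data-use indicators with a binomial logistic regression.
# All column names and the synthetic data are assumptions for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 1000  # small stand-in for the 196,967 encounters in the study
encounters = pd.DataFrame({
    "accurate": rng.integers(0, 2, n),  # 1 = encounter passed the accuracy check
    "timely":   rng.integers(0, 2, n),  # 1 = recorded within the timeliness window
    "used":     rng.integers(0, 2, n),  # 1 = encounter data appeared in a report
})
# Simulate a positive accuracy effect and a data-use effect on completeness.
linear_predictor = -0.5 + 1.0 * encounters["accurate"] + 0.6 * encounters["used"]
encounters["complete"] = rng.binomial(1, 1 / (1 + np.exp(-linear_predictor)))

# Logistic (binomial logit) model of completeness on the other quality measures.
model = smf.logit("complete ~ accurate + timely + used", data=encounters).fit(disp=False)
print(model.summary())
```

The fitted coefficients would then indicate the direction and strength of each tradeoff, analogous to the positive and negative relationships reported in the study.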
One of the main challenges that data cleaning systems face is to automatically identify and repair data errors in a dependable manner. Although data dependencies (a.k.a. integrity constraints) have been widely studied as a way to capture errors in data, automated and dependable repairing of those errors has remained a notoriously hard problem. In this work, we introduce an automated approach for dependably repairing data errors, based on a novel class of fixing rules. A fixing rule contains an evidence pattern, a set of negative patterns, and a fact value. The heart of a fixing rule is deterministic: given a tuple, the evidence pattern and the negative patterns are combined to precisely capture which attribute is wrong, and the fact value indicates how to correct that error. We study several fundamental problems associated with fixing rules and establish their complexity. We develop efficient algorithms to check whether a set of fixing rules is consistent and discuss approaches for resolving inconsistent fixing rules. We also devise efficient algorithms for repairing data errors using fixing rules. Moreover, we discuss how to generate large numbers of fixing rules from examples or from available knowledge bases. We experimentally demonstrate, on both real-life and synthetic data, that our techniques outperform other automated algorithms in the accuracy of repairing data errors.
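To make the structure of a fixing rule concrete, the following is a minimal sketch based only on the description above; the attribute names and example values are illustrative assumptions, not drawn from the paper's datasets:

```python
# Hypothetical sketch of a fixing rule as described in the abstract:
# an evidence pattern, a set of negative patterns on one attribute, and a
# fact value used to correct that attribute when an error is detected.
from dataclasses import dataclass

@dataclass
class FixingRule:
    evidence: dict   # attribute -> required value (evidence pattern)
    target: str      # attribute the rule may repair
    negatives: set   # known-wrong values for the target attribute
    fact: str        # correct value to write when an error is detected

    def apply(self, tuple_: dict) -> dict:
        """Return a repaired copy of tuple_ if the rule fires, else tuple_ unchanged."""
        evidence_holds = all(tuple_.get(a) == v for a, v in self.evidence.items())
        if evidence_holds and tuple_.get(self.target) in self.negatives:
            repaired = dict(tuple_)
            repaired[self.target] = self.fact
            return repaired
        return tuple_

# Illustrative rule: if the country is China but the capital is a known-wrong
# value, deterministically repair the capital to Beijing.
rule = FixingRule(
    evidence={"country": "China"},
    target="capital",
    negatives={"Shanghai", "Hong Kong"},
    fact="Beijing",
)
print(rule.apply({"country": "China", "capital": "Shanghai"}))
# -> {'country': 'China', 'capital': 'Beijing'}
```

Because both the evidence pattern and a negative pattern must match before the fact value is applied, the repair is deterministic rather than heuristic, which is the property the abstract emphasizes.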
In applications where machine intelligence falls short (e.g. alignment of taxonomies on the Semantic Web, image annotation, label sorting), so-called social computation approaches that utilise crowds of interconnected human workers offer a viable solution. Such computations can be modelled as collections of structured activities (i.e. workflows) that blend human and machine tasks. From a data quality perspective, social computations cannot be treated as traditional computational systems, and existing quality models will need to be adapted or redesigned to accommodate their unique characteristics. We argue that only by enhancing the transparency of social computation systems will we be able to realise these adapted quality assessment processes.
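As a loose illustration of the workflow framing (every name, field and task below is a hypothetical assumption, not part of the cited work), a blended human/machine workflow that records a provenance trace for later quality assessment might be sketched as:

```python
# Hypothetical sketch: a social computation workflow as an ordered list of tasks,
# each flagged as human or machine work, with a trace that a quality model could
# inspect for transparency. All names and fields are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Task:
    name: str
    performer: str                  # "human" (crowd worker) or "machine"
    run: Callable[[object], object]

def run_workflow(tasks: List[Task], item):
    """Run tasks in order, recording which kind of actor produced each result."""
    trace = []
    for task in tasks:
        item = task.run(item)
        trace.append({"task": task.name, "performer": task.performer, "output": item})
    return item, trace

# e.g. a machine pre-labels an image, then a crowd worker verifies the label.
workflow = [
    Task("auto_label", "machine", lambda img: {"image": img, "label": "cat?"}),
    Task("crowd_verify", "human", lambda rec: {**rec, "label": "cat", "verified": True}),
]
result, trace = run_workflow(workflow, "image_001.jpg")
print(result)
print(trace)
```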






















