Trusted Data Are Essential
Data Need To Be Believable
Data need to be right. The analysis of data that have little relationship to reality has little value ... worse, it may lead to bad decision making. There is a need to ensure that dataflows have integrity, so that valid information cannot be replaced with fictional data. There is also the need to confirm that the data already in the system are correct, through a system of validation.
GIGO: Garbage In ... Garbage Out
While it is good practice to have fully normalized data in a relational system for the most efficient data processing ... it is sometimes desirable to have redundancy in the data and dataflows so that data may be verified in an independent manner. Data should not only be right, but be seen to be right!
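The redundancy idea above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: it assumes a hypothetical setup in which the same figure arrives via two independent dataflows, and accepts it only when the two copies agree within a tolerance.

```python
def cross_check(primary: float, independent: float, tolerance: float = 0.01) -> bool:
    """Accept a value only when an independently sourced copy agrees within
    a relative tolerance. The two inputs are assumed to come from separate,
    redundant dataflows."""
    if independent == 0:
        return primary == independent
    return abs(primary - independent) / abs(independent) <= tolerance

# A 0.5% discrepancy passes; a 50% discrepancy is flagged for investigation.
cross_check(100.0, 100.5)   # agrees within tolerance
cross_check(100.0, 150.0)   # disagreement ... treat with caution
```

The design choice here is deliberate: the check only works if the second feed really is independent, which is exactly the argument for keeping some redundancy rather than fully normalizing everything.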
Knowing the source is a critical determinant of the validity of data, and the identity of that source is a powerful starting point. From a personal point of view … if I see it, I trust it. If I know the person who saw it and reported it, I may still trust it.
If the data come from trusted sources, then it may be possible to trust the data. All data are therefore associated with a source … and the approach is to have sources assessed for their level of trust, and hence for the validity of their data.
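The approach described above ... every datum carrying its source, and every source carrying an assessed trust level ... can be sketched as a simple data structure. The class and threshold below are illustrative assumptions, not part of any existing system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourcedDatum:
    """A datum that is never separated from the identity of its source."""
    value: str
    source: str   # who reported it
    trust: float  # assessed trust in the source, 0.0 (none) .. 1.0 (full)

def accept(datum: SourcedDatum, threshold: float = 0.7) -> bool:
    """Admit a datum only when its source's assessed trust clears a threshold."""
    return datum.trust >= threshold
```

A usage note: `accept(SourcedDatum("GDP figure", "national statistics office", 0.9))` would pass, while the same figure from an anonymous, unassessed source would not.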
The idea of “provenance” applies in the area of data and analysis as much as it applies in the field of rare works of art. Data need to be authentic and a meaningful representation of the reality with which they purport to be associated. By identifying the source … and the chain of sources from which data emanate, the validity of data can be improved.
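One way to make a chain of sources tamper-evident ... offered here as a hedged sketch rather than the method the text prescribes ... is to hash each record together with the hash of its predecessor, so that altering any earlier link changes everything downstream. The record names are illustrative.

```python
import hashlib

def chain_hash(previous_hash: str, payload: bytes) -> str:
    """Each record's hash depends on its predecessor, so altering any
    earlier record in the provenance chain changes every hash after it."""
    return hashlib.sha256(previous_hash.encode("utf-8") + payload).hexdigest()

# An illustrative three-link chain: field report -> aggregator -> publisher
h1 = chain_hash("", b"field report")
h2 = chain_hash(h1, b"aggregated by agency X")  # "agency X" is hypothetical
h3 = chain_hash(h2, b"published figure")

# Forging the original report produces a different hash at the end of the chain.
f1 = chain_hash("", b"forged field report")
f3 = chain_hash(chain_hash(f1, b"aggregated by agency X"), b"published figure")
```

Comparing `h3` with `f3` shows the forgery: the final hashes differ even though only the first link was altered.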
The reliability of data is easy to compromise when there is little or no validation of the source and origin of the data. There must be assurance that the data are what they purport to be. It should not be possible for data to be “hijacked” in transit and replaced by fraudulent data.
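A standard way to detect in-transit replacement is a message authentication code: the sender signs each message with a shared key, and the receiver rejects anything whose tag does not verify. The sketch below assumes sender and receiver have exchanged a key out of band; the key shown is a placeholder.

```python
import hashlib
import hmac

SECRET_KEY = b"shared-key"  # assumption: exchanged out of band, never sent with the data

def sign(message: bytes) -> str:
    """Compute an HMAC-SHA256 tag for a message."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()

def verify(message: bytes, tag: str) -> bool:
    """Reject data whose tag does not match ... a hijacked or altered
    message cannot carry a valid tag without the key."""
    return hmac.compare_digest(sign(message), tag)
```

With this in place, `verify(b"quarterly report", sign(b"quarterly report"))` holds, while any altered message fails verification.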
Fraud … misinformation
Modern society has much fraud and misinformation … a large part of which is never identified and called into question. The scale of fraud and misinformation is difficult to estimate, but it is likely that more of the data in public circulation are wrong than right.
The “old fashioned” responsibility of the press to check the validity of what it published has been “costed” out of the procedures of modern press organizations … and in the “new media” space, the checking of validity has never been an important part of the culture. In the new media, speed is of the essence … right or wrong matters little compared with immediacy of communication, yet these data pollute the record essentially forever.
Internal check … internal control
Business accounting has addressed the issue of internal check and internal control as an integral part of data system design for decades … but nothing like it exists for the data used in the prevailing dialog about the progress and performance of society. The role of the media as the “fourth estate”, providing a public check and balance, is not working well; misinformation is a pervasive problem.
Third party validation
Data should be easily verified ... and data that cannot be verified should be treated with the utmost caution.
Sadly, third party validation is no longer universally effective, because accounting principles have been superseded by various laws, rules and regulations that allow forms of financial reporting that conflict with the underlying principles of accountancy but suit various stakeholders in the process.
The validation of data needs to be done with care. There are many techniques that may be used, and they should be varied from time to time to limit the “gaming” of the system by people and organizations that have an interest in the failure of a ubiquitous and effective social and socio-economic oversight data system.
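The idea of varying the techniques from time to time can be sketched as drawing a different subset of checks on each pass, so that no one who wants to game the system can know in advance which checks will run. The three checks below are hypothetical stand-ins for real validation rules.

```python
import random

# Hypothetical validation rules; real systems would have domain-specific checks.
def range_check(x) -> bool:
    return 0 <= x <= 1_000_000

def type_check(x) -> bool:
    return isinstance(x, int)

def sign_check(x) -> bool:
    return x >= 0

CHECKS = [range_check, type_check, sign_check]

def validate(x, rng=random) -> bool:
    """Run a randomly varying subset of the available checks, making the
    validation regime harder to game than any single fixed check."""
    chosen = rng.sample(CHECKS, k=2)
    return all(check(x) for check in chosen)
```

The point of the sketch is the rotation itself, not the particular checks: an adversary who learns to pass one check still risks being caught by another on the next pass.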