Dataflow Architecture is Important
Data centric everything
Design to collect data locally
The best way to collect data is to collect using a locally based system. This has many advantages, including the short linkage between the reality, the data that are collected, and the local decisions that may be made using the data. There can be a well motivated team engaged with this work, and local costs may be considerably less than would be possible using a team made up of external experts.
The use of satellite imagery and remote sensing to replace local data collection is counterproductive because much of the value of the data is derived from the use of the data in the local setting … but these technologies may be used effectively to supplement locally acquired data.
Academic research is not the right model
There is a place for academic research … and a role for rigorous scientific and statistical method … but most decision making should be based on fast, low cost dataflows that are right enough to get the decisions right practically all the time. This is not what academic researchers are able to do … and in the main, this is not what they are working to do!
Use data many times
The most cost effective data are data that are used in many different ways. There should ideally be one pool of data, and this one pool should be used in different ways for the specific analysis needed. Essentially analysis provides many different views of the data.
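The idea of one pool serving many views can be sketched in a few lines. This is only an illustration under stated assumptions: the record fields (community, month, activity, value), the sample values, and the `view` helper are hypothetical and are not part of TVM itself.

```python
from collections import defaultdict

# Hypothetical pool: every record is collected once and stored once.
POOL = [
    {"community": "A", "month": "2024-01", "activity": "clinic_visits", "value": 40},
    {"community": "A", "month": "2024-02", "activity": "clinic_visits", "value": 55},
    {"community": "A", "month": "2024-01", "activity": "school_attendance", "value": 120},
    {"community": "B", "month": "2024-01", "activity": "clinic_visits", "value": 30},
]


def view(pool, group_by, where=None):
    """Sum `value` grouped by one field, optionally filtered by exact matches."""
    totals = defaultdict(int)
    for rec in pool:
        if where is None or all(rec[k] == v for k, v in where.items()):
            totals[rec[group_by]] += rec["value"]
    return dict(totals)


# View 1: local operations, the month-by-month trend for one community.
local_trend = view(POOL, "month", where={"community": "A", "activity": "clinic_visits"})

# View 2: a comparison across communities for the same activity.
by_community = view(POOL, "community", where={"activity": "clinic_visits"})
```

Both views read the same pool; nothing is collected or stored a second time to produce the second analysis.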
Use locally … simple analysis, practical use
Local data may need some simple analysis to be useful for local decision making … but this should be quick and easy. If there is progress … good … if there is little or none, then what was wrong with the analysis, and what should be tried now?
But in this local analysis and local decision making there is a “risk” evaluation that may not be fully understood or articulated. Poor people do not have the resources to afford a mistake … they cannot “write it off” and move on the way a rich corporate group might do. The children do not go to school, or worse, they die.
In the context of TVM, local data are first used to help with local operational decisions. These are decisions that have a big impact on the performance of a community and are frequently lacking in data that are relevant and timely.
The most important use of data is the use of data to manage local operations and activities. This is where performance improvement has the most impact and where good data may achieve the most. With good local use of data, the cost of collecting data and the value of using data are within the same economic domain.
The following graphic is a simple representation of how data may be used to serve several different purposes effectively.
Local data collection ... local analysis ... local action is the cycle that improves performance most directly and most quickly.
Having the data also used at a “higher” level facilitates oversight and the sort of monitoring that can be used to identify the need for corrective action by the analysis of much larger sets of data. At a higher level there can be analysis that identifies “best practice” and issues that are impossible to identify with local analysis alone.
Local people collecting local information is a good way to achieve cost effective data collection. There is a need for adequate training and supervision, but that is true of any approach to data collection. The two advantages of local staff are: (1) modest remuneration requirements; and, (2) familiarity with the place and people.
Survey inaccuracy … amazingly wrong!
Some recent work supervised by Dr. Jonathan Morduch of NYU showed that interview data were hopelessly inaccurate from a first visit survey ... and only reached reasonable correctness after several weeks and multiple visits.
No one data collection approach is likely to be universally optimal. So much depends on the training and experience of the people in the community, and on the practical issues of access to information technology and communications infrastructure. A hybrid system involving both manual forms and electronic systems will usually be the way forward. The cost effectiveness of writing in ink in a book should not be totally discounted!
Use same data for oversight and accountability
The same data that are useful to help make decisions at the local community level are also the data that may be used to do oversight. The data architecture allows for roll-up and making summary reports … and with summary reports it is possible to do oversight easily and accurately. Where needed the same data may be used to facilitate accountability. The data architecture used for TVM enables oversight and accountability without contributing to more and more data overload.
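The roll-up described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each local record carries a country / district / community place key and one count; the names, values, and the `roll_up` function are hypothetical and do not describe the actual TVM implementation.

```python
from collections import Counter

# Hypothetical local records: (country, district, community, households_visited)
RECORDS = [
    ("X", "North", "Village 1", 12),
    ("X", "North", "Village 2", 8),
    ("X", "South", "Village 3", 20),
]


def roll_up(records, level):
    """Summarize counts at a level: 1 = country, 2 = district, 3 = community."""
    summary = Counter()
    for *place, count in records:
        # Truncating the place key is what turns detail into a summary.
        summary[tuple(place[:level])] += count
    return dict(summary)


community_report = roll_up(RECORDS, 3)  # the detail used for local decisions
district_report = roll_up(RECORDS, 2)   # the summary used for oversight
national_report = roll_up(RECORDS, 1)   # the top-level summary
```

The oversight reports are derived from the local records rather than collected separately, so each higher level sees fewer rows, not more.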
Then use data for academic study
Some academic study needs a large amount of data, and the TVM data architecture makes it possible for a very large database to be built that allows for very large data mining projects to be designed and set in motion.
Scientific research may result in a better understanding of the underlying science and critical issues that will never be seen in the smaller local datasets.
Example from the malaria health sub-sector
Detailed spatial information is needed to control malaria in a community … and these data in a consolidated form are suited to oversight and accountability at a higher level. The same data are also ideal for the large scale data mining needed for the early detection of pesticide and drug resistance.
Keeping data costs low
The multiple use of data is one element in making data cost effective and valuable. The basic data architecture used by TVM maximizes use of data. This has the secondary effect of making the data more reliable, because data that are used are always more reliable than data that merely sit and do nothing!
Another element is to do data collection in the community for the community by the community. This is usually lower cost than having data collection experts from outside the community.
Technology may be a way to reduce costs … but a problem with technology is that it often serves to make something technology intensive rather than labor intensive, and in the process converts low labor costs into high technology costs. Good cost analysis will show this problem … but when there is no costing, it is easy for this matter to be hidden from analysis!
Ubiquitous mobile technology infrastructure
Though the power and possibilities of information technology have improved a millionfold in the course of the last fifty years, this has not resulted in better data or decision making to benefit society as a whole. The use of data to achieve broad based socio-economic progress and high performance has been very limited.
Anyone and everyone can use TVM … contributing to dataflow using a mobile phone or Internet webpage forms. Individuals may be contributors to the dataflow … as well as organizations.
The dataflow that results makes it possible to have independent oversight of socio-economic activity and, in turn, of the organizations engaged in decision making about the allocation of resources and the choice of activities in the community and the global economy.
All the stakeholders in society are able to make use of the data and analysis so that decision makers have the data that will help them … and there can be oversight and accountability about the progress and performance by all the socio-economic actors.
Modern technology makes it possible for data to move around the world instantly ... but why? The goal should be to use data usefully more than merely to have data. Although long distance and global data transmission is very low cost ... compared to pre-electronic times ... it is not costless, and moving data without a useful purpose is unproductive.
Data that are useful for improving performance at the community level should be easily accessible for decision making at this level. These data do not need to travel far in order to be of material value locally. The same data, however, can be transmitted to a consolidated database for scientific analysis if that is required.
The Internet makes it possible for data to move from one part of the globe to another instantly. The only requirement is Internet access ... broadband Internet access. Increasingly broadband Internet access is widely available, though in many poorer countries the cost of access is relatively high. Data may be transmitted using FTP (File Transfer Protocol), as an attachment to an e-mail, or by direct upload to a web-based application.
Mobile cell-phone technology has become very widespread and can now handle some data transfer more conveniently than the Internet. Cell-phone coverage is now reaching most communities around the world, including quite poor and remote places.
Some special data design is needed for transmission efficiency, but transmission efficiency can be good where there is application of the relational model for database design.
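One way to see why the relational model helps with transmission is to compare a flat record, which repeats its descriptive text in every row, with a normalized form that sends the descriptions once and short keys thereafter. The sketch below is an assumption-laden illustration: the table names, field names, and sample values are hypothetical, and JSON is used only as a convenient way to measure payload size.

```python
import json

# Hypothetical flat records: the descriptive text is repeated in every row.
flat = [
    {"community": "Village 1, North District", "indicator": "households visited",
     "day": d, "value": v}
    for d, v in [(1, 12), (2, 9), (3, 14), (4, 11)]
]

# Relational form: descriptions live once in lookup tables, rows carry short keys.
communities = {1: "Village 1, North District"}
indicators = {1: "households visited"}
# Each observation is (community_id, indicator_id, day, value).
observations = [(1, 1, r["day"], r["value"]) for r in flat]


def rebuild(observations, communities, indicators):
    """Join the keys back to their descriptions, as a database join would."""
    return [
        {"community": communities[c], "indicator": indicators[i], "day": d, "value": v}
        for c, i, d, v in observations
    ]


flat_bytes = len(json.dumps(flat).encode("utf-8"))
# The lookup tables are sent once; routine transmissions carry observations only.
routine_bytes = len(json.dumps(observations).encode("utf-8"))
```

Nothing is lost in the smaller payload: `rebuild` recovers the flat records exactly, so the saving comes from design rather than from dropping detail.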
Data storage … and efficient access to use everywhere
Data are essential to transparency and accountability but data that are needed are rarely easily accessible. Good data storage facilitates access. The details of the storage architecture will change from time to time ... but the general theme is that data should be accessible easily for those who need the data to make good decisions. There are multiple levels:
1. Data in the hands of a data collection person
These data are needed so that the work of data collection can be as efficient as possible ... including some immediate feedback about changes that might be locally important.
2. Data at the community level
These data may be analyzed very quickly to provide the information needed at the local level to determine what are the issues and how they might best be addressed.
3. Data at the national oversight level
These data are a component of the data needed for good governance and oversight.
4. Data for national level research
These data are a part of a research process that has the potential to help with both learning and teaching in the country.
5. Data for global research
These data are a part of a research process that has the potential to advance learning on a global basis. Modern computational technology, such as that available at the US National Center for Supercomputing Applications (NCSA), makes it possible to process very large datasets and learn from these data.
Data are needed for the effective management of performance ... but it is not at all clear that the essential data are collected ... and to the extent that they exist, they are not easily accessible.
Because data are important for the administration of society, it is normal for there to be laws and regulations that give guidance about how data must be stored and be accessible to interested parties. In general these laws and regulations do not help very much with the issue of transparency and accountability as a part of day to day ordinary life. The issue of socio-economic performance and the impact on society is not part of the data landscape.
The corporate organization is increasingly aware that data storage is a cost in the best of times, and may be a catastrophic cost if the law and regulations are called into effect for access to these data.
Data storage has moved way beyond just paper ... everything can be digital ... everything can be organized so that there may be easy analysis and the data be valuable ... especially for society as a whole.
The cost effectiveness of technology is only going to be fully realized if the data architecture is sound and logical. This is the core of what TVM can do.