Diosh Lequiron
Agriculture · 14 min read

Yield Data Collection That Actually Works in the Field

Precision agriculture data models assume infrastructure that smallholder farmers don't have. The Field Data Feasibility Test identifies what is actually collectible and worth collecting in field reality.

Precision agriculture as it is typically described requires data infrastructure that does not exist in most smallholder farming contexts in the Philippines or across comparable agricultural environments in Southeast Asia. Soil moisture sensors, weather stations at field level, satellite-derived vegetation indices, IoT devices transmitting real-time data — these are the data inputs that precision agriculture models are built on. They are also not what smallholder farmers have access to, what they can afford to maintain, or what their farming practices are designed around.

This creates a gap between the data architecture that precision agriculture prescribes and the data that can actually be collected in the field contexts where that data would have the most value. The gap is not primarily a technology gap — many of the sensor and connectivity technologies required for precision agriculture have fallen in cost substantially. It is a context gap: the farming system design, the economic structure of smallholder operations, and the decision-making environment of farmers managing multiple small plots do not map to the data collection architecture that precision agriculture assumes.

Closing that gap requires rethinking what yield data collection should look like in smallholder contexts, starting from what is actually collectible rather than from what a complete precision agriculture system would ideally have. The Field Data Feasibility Test provides a framework for making those judgments.


The Gap Between Precision Agriculture Models and Smallholder Reality

Precision agriculture data models are designed around a specific farming context: large, mechanized, single-crop operations where the farmer or farm manager has continuous presence, technology access, and sufficient margin to absorb the cost of data infrastructure. American corn belt operations. Brazilian soy farms. Australian wheat operations. These are the contexts where precision agriculture has produced documented returns, and they are also the contexts where the data collection infrastructure required is economically feasible and operationally manageable.

Smallholder farming in the Philippines looks different on every relevant dimension. Average farm size among Philippine smallholder farmers is well under two hectares, often fragmented across multiple plots that are not contiguous. Farmers typically grow multiple crops across those plots, with production decisions made seasonally based on price expectations, weather, labor availability, and household cash needs rather than through a formal production planning process. Connectivity in farm areas ranges from reliable to effectively nonexistent. Devices available to farmers are primarily mobile phones — many of them basic smartphones — rather than specialized agricultural monitoring equipment.

Data collection in this context faces constraints that precision agriculture frameworks do not account for. A soil moisture sensor installation requires capital investment, technical maintenance capacity, and a decision context in which the information the sensor provides is actually actionable for the farmer. A weather station at field level requires maintenance, power, and connectivity. An IoT data collection system requires device stability, consistent connectivity, and a data management system that can receive, process, and present the data in a form the farmer can use.

These constraints do not mean that data collection in smallholder contexts is impossible or unproductive. They mean that the data collection system must be designed for the actual constraints of the context rather than adapted from a system designed for a different context. The starting point is an honest assessment of what is actually collectible.


What Data Is Actually Collectible With Available Tools

The data collection toolkit available in Philippine smallholder contexts is more limited than precision agriculture literature assumes, but more capable than commentators who dismiss digital agricultural tools as irrelevant to smallholders tend to suggest.

Farmer smartphones — the primary device category available — can collect the following categories of data reliably: photographic records (crop condition images, pest and disease symptom documentation, harvest documentation), time-stamped records of farmer-reported events (planting dates, input applications, irrigation events, harvest), GPS coordinates (field boundaries, plot locations), and voice-based records where literacy barriers make text entry impractical. These are not trivial data categories. Photographic records combined with machine learning analysis can support disease and pest identification. Time-stamped farmer-reported event records constitute a planting-to-harvest timeline that, when aggregated across multiple farms, produces useful production data. GPS coordinates enable spatial analysis when data quality is sufficient.

What farmer smartphones cannot reliably collect is the continuous sensor data that precision agriculture models assume: continuous soil conditions, continuous weather data, continuous growth stage monitoring. The collection burden for continuous data — requiring the farmer to interact with the system regularly or install infrastructure the farmer must maintain — exceeds what most smallholder farmers will sustain over multiple seasons.

Cooperative-level data collection extends the feasible data range. A cooperative office with a scale, a moisture meter, and a desktop or tablet computer can collect harvest weight and moisture data for all production delivered to the cooperative for aggregation or marketing. This data — collected at a single point, with simple equipment, as part of the transaction process rather than as a separate data collection activity — constitutes a highly reliable production record for cooperative member farms. It is not the continuous, field-level data that precision agriculture models assume, but it is actionable data collected at sustainable cost.
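As a rough illustration of what such a delivery point could record, the Python sketch below stores weight and moisture together and converts a delivery to a common moisture basis, so that wetter and drier deliveries can be compared. The `DeliveryRecord` type, its field names, and the 14% reference moisture are assumptions made for this example, not details of any system described here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DeliveryRecord:
    """One harvest delivery, captured at the cooperative scale (hypothetical schema)."""
    member_id: str
    crop: str
    delivered_on: date
    weight_kg: float      # weight as delivered, read from the scale
    moisture_pct: float   # moisture content as delivered, read from the moisture meter


def weight_at_reference_moisture(record: DeliveryRecord,
                                 reference_moisture_pct: float = 14.0) -> float:
    """Restate the delivered weight at a reference moisture content.

    The dry matter in the delivery is held constant; only the water share changes.
    The 14% default is an assumed reference value for illustration.
    """
    dry_matter_kg = record.weight_kg * (100.0 - record.moisture_pct) / 100.0
    return dry_matter_kg * 100.0 / (100.0 - reference_moisture_pct)


delivery = DeliveryRecord("M-001", "rice", date(2024, 10, 3), 1000.0, 20.0)
print(round(weight_at_reference_moisture(delivery), 1))  # about 930.2
```

Both measurements come from equipment already at the point of sale, so the comparison costs the farmer no additional reporting step.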


The Field Data Feasibility Test

The Field Data Feasibility Test is a framework for evaluating whether a specific data element is worth collecting in a specific smallholder field context. It applies four criteria, all of which must be satisfied before a data element, collected by a given method, is worth designing a collection system around.

Criterion 1: Collection Burden
Can the data be collected without imposing a burden that exceeds the capacity or willingness of the people collecting it? Collection burden has two components: the direct time and effort cost of collection, and the technical complexity of the collection method. Data that requires a farmer to stop work and make a detailed manual entry daily is high burden. Data that is collected as a byproduct of a transaction that would happen anyway — like recording the weight and moisture of harvested production at the point of sale — is low burden. Data that requires calibration, maintenance, or specialized equipment is high burden even if the individual data collection act is simple.

The practical threshold is this: if collection stops when external support stops (when the extension worker stops visiting, or the device subsidy runs out), the data is not sustainably collectible.

Criterion 2: Verification Path
Is there a reasonable verification path for the data, or is it purely self-reported without any independent check? Self-reported data without verification paths degrades over time as farmers learn what responses are rewarded and as data entry fatigue leads to approximation. Verification does not require expensive auditing — cooperative-level cross-checks (does this farmer's self-reported planting date match the input purchase record?), photographic evidence attached to records, or GPS coordinates that confirm field location are all lightweight verification mechanisms.

Data elements with no verification path should be deprioritized relative to data elements where lightweight verification is feasible. This is not a judgment about farmer honesty. It is a recognition that self-reported data systems without verification mechanisms produce data whose quality degrades in predictable ways, which in turn makes aggregate analysis unreliable.
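The cross-check mentioned above (does a self-reported planting date match the input purchase record?) could be sketched as follows in Python. The function name, the idea of checking seed purchases specifically, and the 30-day window are hypothetical choices for illustration. A real cooperative would set its own rule.

```python
from datetime import date, timedelta


def planting_date_is_corroborated(reported_planting: date,
                                  seed_purchases: list,
                                  max_gap_days: int = 30) -> bool:
    """Return True if any seed purchase falls on the reported planting date
    or within max_gap_days before it.

    seed_purchases is a list of purchase dates taken from the cooperative's
    input sales records. The window length is an assumed value.
    """
    earliest = reported_planting - timedelta(days=max_gap_days)
    return any(earliest <= bought <= reported_planting for bought in seed_purchases)


# A purchase 13 days before the reported planting date corroborates the report.
print(planting_date_is_corroborated(date(2024, 6, 10), [date(2024, 5, 28)]))
```

An uncorroborated record is not necessarily wrong, since a farmer may use saved seed or buy elsewhere. The check only marks which records deserve a second look.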

Criterion 3: Actionability
Can the farmer, cooperative, or other decision-maker act on the data collected in time for it to affect outcomes? This criterion addresses the problem of data collection systems that produce interesting retrospective analysis but do not inform in-season decisions. Yield data that is compiled after the harvest and reported to farmers months later tells the farmer something about what happened but does not help them manage what is currently happening.

Actionable data has a time-relevance window: it needs to be available before or during the decision to which it is relevant. Planting date data is actionable for coordinating harvest labor and timing sales. Harvest weight data is actionable for estimating seasonal income and planning input purchases for the next cycle. Post-season quality analysis that arrives six months after harvest is not actionable in the sense relevant to this criterion.

Criterion 4: Aggregate Value
Even if data has low actionability for an individual farmer in isolation, does it have value when aggregated across multiple farms? This criterion captures the category of data that is worth collecting even when individual decision-making value is limited, because the aggregate constitutes a resource with value for the community, cooperative, policy-maker, or platform.

Regional production volume, price realization across a large sample of transactions, disease incidence across farms in proximity — these data categories have aggregate value that justifies collection even when individual farmers might not directly benefit from the data they contribute. The governance implication is that data collection systems with high aggregate but low individual value require explicit governance arrangements: members need to understand what they are contributing, why it has community value, and how their data will be used.


Applying the Test: One Data Element, Four Verdicts

The four criteria are most useful when applied to a single proposed data element, because a data element rarely fails or passes on all four at once — and the pattern of where it fails tells you how to redesign the collection method. Consider a concrete proposal: collecting daily soil moisture readings from member farms.

On collection burden, daily soil moisture fails immediately if it depends on a sensor the farmer must install, calibrate, and maintain, or on a manual reading the farmer must remember to take every day. The collection process would not survive the end of any pilot that subsidized the sensors. On verification, a manually entered daily moisture number has no independent check — there is no transaction record, photograph, or GPS coordinate that confirms it, so the data degrades toward approximation within weeks. On actionability, the reading is genuinely actionable in principle: a farmer who knows soil moisture is low can decide to irrigate. On aggregate value, daily moisture across many farms in a watershed would have real value for drought monitoring.

So the element passes two criteria and fails two. The failure pattern is the design instruction. Because the actionability and aggregate value are real, the element is worth collecting — but not as a daily manual reading. The redesign is to collect a coarser proxy that passes the burden and verification criteria: a weekly farmer-reported observation ("dry / adequate / waterlogged") attached to a photograph, entered alongside an event the farmer is already recording. The proxy sacrifices precision the smallholder context could never sustain anyway, in exchange for data that is actually collected. This is the test working as intended — not as a gate that rejects elements, but as a diagnostic that tells you which collection method a valuable element actually needs.
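One way to keep this kind of assessment consistent across many data elements is to record each verdict explicitly and read the failure pattern from it. The Python sketch below is a minimal illustration under stated assumptions: the `Assessment` type, the `diagnose` function, and the three outcome labels are invented for this example, and the "reconsider the element" branch is an assumption about cases the worked example does not cover.

```python
from dataclasses import dataclass

# Names follow the four criteria of the Field Data Feasibility Test.
CRITERIA = ("collection_burden", "verification_path", "actionability", "aggregate_value")
# Criteria that describe how the data is collected rather than what it is worth.
METHOD_CRITERIA = {"collection_burden", "verification_path"}


@dataclass(frozen=True)
class Assessment:
    element: str
    collection_burden: bool   # True means the criterion is satisfied
    verification_path: bool
    actionability: bool
    aggregate_value: bool

    def failed(self) -> list:
        """Names of the criteria this element fails, in criterion order."""
        return [name for name in CRITERIA if not getattr(self, name)]


def diagnose(assessment: Assessment) -> str:
    failed = set(assessment.failed())
    if not failed:
        return "collect as proposed"
    if failed <= METHOD_CRITERIA:
        # The value is real; only the collection method is unsustainable.
        return "redesign the collection method"
    # Assumed fallback: the source does not prescribe a verdict for these cases.
    return "reconsider the element"


daily = Assessment("daily soil moisture reading", False, False, True, True)
# Assumes the proxy keeps the value of the original while fixing the method.
weekly = Assessment("weekly dry/adequate/waterlogged report with photo", True, True, True, True)

for a in (daily, weekly):
    print(a.element, "->", diagnose(a), a.failed())
```

The useful output is the list of failed criteria rather than the verdict string, because that list is what points to the redesign.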


Lessons From Bayanihan Harvest Data Collection Design

The data collection approach developed through the Bayanihan Harvest platform has evolved through multiple iterations based on direct experience with what works in Philippine cooperative farming contexts. Several lessons are directly relevant to the Field Data Feasibility Test criteria.

On collection burden: the most reliable data in the Bayanihan Harvest system is transaction data — the data generated at the point of harvest delivery, sale, or input purchase. This data is low burden because it is collected as part of an activity the farmer is already engaged in, rather than as a separate reporting activity. The least reliable data is self-reported field condition data entered between transactions, which degrades consistently regardless of the design of the collection interface.

On verification: photographs attached to records improve data quality measurably. When a pest or disease identification record includes a photograph, the record can be reviewed for plausibility and used for analysis with higher confidence than a text-only record. The incremental burden of attaching a photograph to an existing record is low enough that farmers sustain this behavior across multiple seasons when the interface design makes it easy.

On actionability: price data is the category where actionability is clearest and most immediate. A farmer deciding when to sell can use a current price benchmark directly. Yield data from prior seasons improves planting decisions in the current season. Disease outbreak information from neighboring farms informs current-week monitoring decisions. These are the data categories where investment in collection quality produces the most direct farmer benefit.

On aggregate value: cooperative harvest data — weight and moisture records from all deliveries, combined with field location data — produces a production estimate for the cooperative's member base that has multiple uses: cooperative-level market negotiation, government program eligibility, credit application support, and planning for cooperative services. Individual farmers benefit from the aggregate even when they might not recognize the value of their individual contribution to it.
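The aggregation step itself is simple, which is part of why it is sustainable. As a minimal Python sketch, assuming each delivery is available as a (member ID, weight in kilograms) pair, per-member totals and a cooperative total could be derived like this. The function name and the input shape are hypothetical.

```python
from collections import defaultdict


def summarize_deliveries(deliveries):
    """Sum delivered weight per member and for the cooperative as a whole.

    deliveries: iterable of (member_id, weight_kg) pairs, one per delivery.
    Returns (per_member_totals, cooperative_total).
    """
    per_member = defaultdict(float)
    for member_id, weight_kg in deliveries:
        per_member[member_id] += weight_kg
    return dict(per_member), sum(per_member.values())


season = [("M-001", 1200.0), ("M-002", 800.0), ("M-001", 300.0)]
per_member, total = summarize_deliveries(season)
print(per_member)  # {'M-001': 1500.0, 'M-002': 800.0}
print(total)       # 2300.0
```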


Where Field Data Collection Breaks Down

The Field Data Feasibility Test reduces the most common failures, but it does not eliminate them. Three failure modes recur even in well-designed collection systems, and naming them is part of designing honestly.

The first is the pilot-condition illusion. A data collection system performs well during a pilot because the pilot supplies things the steady state will not: an enthusiastic extension worker visiting weekly, subsidized devices, novelty that makes farmers willing to enter data they would otherwise skip. The system produces clean data and an impressive demonstration. Then the pilot support is withdrawn, and collection rates collapse — not because the design was wrong, but because the design was never tested against the absence of pilot support. The defense is to evaluate every data element against Criterion 1 as if the external support has already disappeared, not as if it will continue.

The second is incentive drift in self-reported data. Even with a verification path, farmers learn over time which reported values are rewarded — which planting dates qualify for a program, which yield figures attract attention, which condition reports trigger a visit. The data does not become dishonest so much as it bends toward the incentive. This is why aggregate analysis built on self-reported data needs periodic ground-truthing against transaction records, and why transaction data, which the farmer cannot easily shade because it is tied to money changing hands, remains the most trustworthy layer.
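Periodic ground-truthing can start as a simple comparison between what members report and what the transaction records show. The Python sketch below is one possible shape for that check. The function name, the dictionary inputs, and the 25% tolerance are assumptions for illustration, and a gap between reported harvest and delivered weight can have legitimate causes such as home consumption or sales made elsewhere.

```python
def flag_for_ground_truthing(self_reported_kg: dict,
                             delivered_kg: dict,
                             tolerance: float = 0.25) -> list:
    """Return member IDs whose self-reported harvest differs from delivered
    weight by more than the tolerance, or who have no delivery to compare.

    Both inputs map member_id -> kilograms for the same season.
    The tolerance is a relative difference and is an assumed value.
    """
    flagged = []
    for member_id, reported in self_reported_kg.items():
        delivered = delivered_kg.get(member_id)
        if delivered is None or delivered <= 0:
            flagged.append(member_id)  # no transaction record to check against
        elif abs(reported - delivered) / delivered > tolerance:
            flagged.append(member_id)
    return sorted(flagged)


reported = {"M-001": 1000.0, "M-002": 1500.0, "M-003": 900.0}
delivered = {"M-001": 980.0, "M-002": 1000.0}
print(flag_for_ground_truthing(reported, delivered))  # ['M-002', 'M-003']
```

A flag here is a prompt for a conversation or a field visit, not a finding that the report was wrong.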

The third is the actionability mirage. A system can collect data that is technically actionable but arrives through a channel the farmer does not check, in a format the farmer cannot read quickly, at a moment when the farmer is not making the relevant decision. The data is actionable on paper and inert in practice. Actionability is not a property of the data alone — it is a property of the data plus the delivery path plus the decision moment, and a collection system that gets the data right but the delivery wrong has collected a number nobody used.


Designing Data Collection Systems That Survive Contact With Reality

The practical implication of the Field Data Feasibility Test is a data collection system design that starts with what passes all four criteria and layers in additional data elements only where the benefits are clear and the collection burden is manageable.

For most Philippine agricultural cooperatives, this means a core data system built around transaction records (harvest deliveries, input purchases, sales), supplemented by farmer-reported planting and harvest event dates, supported by photographs for pest, disease, and quality documentation, and governed by cooperative-level aggregation and quality checking. This is not a precision agriculture system in the full technical sense. It is a data system that produces reliable, actionable, verifiable data at a collection burden that is sustainable without external subsidy.

The temptation in agricultural technology design is to design for the ideal data system — the full complement of sensors, continuous monitoring, real-time transmission — and then work backward to whatever is achievable within budget. The Field Data Feasibility Test inverts this: start with what is actually collectible given the constraints of the context, validate that each element passes all four criteria, and build the data system outward from that foundation.

If there is one thing a cooperative or platform team can do this week, it is to take the single most important data element they currently collect and run it through the four criteria honestly — burden as if the pilot is over, verification path named or absent, actionability tied to a specific decision moment, aggregate value stated explicitly. Most teams discover that their most-prized data element fails on burden or verification, and that a humbler element they overlooked passes all four. That one audit reorders the roadmap more usefully than any new feature.

Data systems designed from this starting point tend to survive the end of pilot conditions and produce data quality that is actually useful for analysis. Data systems designed from the ideal backward tend to produce impressive demonstrations that degrade when the pilot support infrastructure is withdrawn and the farmers are expected to sustain collection practices they were never consulted about designing.

The best agricultural data is data that farmers collect because it helps them, because it is easy to collect, and because the cooperative or platform provides something in return for the contribution. Designing for that exchange — rather than designing for the ideal data set and then wondering why farmers don't maintain the collection — is the foundation of yield data collection that actually works in the field.

Continue in this series

This piece is part of Agritech in Emerging Markets: A Field Guide for Practitioners, my systematic guide to agriculture and community technology.
