
How to Build a Field Service Mobile App in 2026 (Voice Notes + Photos → AI Reports)

Your field techs don't open laptops. They open phones. This is the field-service mobile app pattern operators worldwide are using to turn voice notes and photos into structured reports automatically: real native iOS + Android, no developers, no Salesforce.

There is a workflow that exists in almost every service business and that almost no software handles well on a phone: a person at a job site needs to record what happened, who they talked to, what they saw, and what comes next. That information needs to reach the office in structured, searchable form, and it has to be captured one-handed, in dirty conditions, often with no signal.

If you run a construction company, an HVAC business, a plumbing operation, a cleaning service, a property management firm, a clinic with mobile hygienists, a delivery service, an auto-service shop, or any business where someone spends their day off-site, you know this problem. The existing options are bad:

  • Pen and clipboard. Paper. Half the time never digitized.
  • Generic note-taking apps (Notion, Apple Notes). Unstructured, and they don't link to your client database.
  • Web-app no-code tools (Glide, Softr, Adalo). Look like apps. Aren't. The camera works inconsistently, audio recording is fragile on iOS, push notifications barely exist on the web, and background uploads are not a thing. Fine for office tooling, broken for field workers. See our Rork vs Glide vs Bubble comparison.
  • Vertical SaaS (ServiceTitan, Jobber, Housecall Pro, Simpro, FieldEdge). Built for specific verticals. Per-seat pricing punishes you. Their workflow, not yours.
  • A native iOS + Android build from an agency. $80,000–$200,000 and 6 months. See our cost breakdown.

This guide is the sixth option: a real native mobile app, on the App Store and Google Play (or distributed internally), built by the operator.

Why Native Matters Here (Not Just Marketing Copy)

Field-service is the use case where the difference between a web app and a real app stops being theoretical:

  • The camera. Native iOS and Android cameras give you GPS metadata, high-resolution capture, and access to advanced sensors (LiDAR on iPhone Pro for measuring). Browser cameras strip metadata and give you a downsampled JPEG.
  • Audio recording. Native AVAudioRecorder (iOS) and AudioRecord (Android) capture clean audio in noisy environments with no browser quirks. Web-app audio is notoriously unreliable on iOS Safari, where many recordings simply fail.
  • Background tasks. A tech finishes the visit, locks the phone, walks to the truck. The 30 photos and the voice note continue uploading in the background. Web apps cannot do this.
  • Offline-first. Construction sites, rural service routes, basement HVAC jobs: no signal. Native apps cache everything locally and sync when LTE returns. Web apps don't load.
  • Push notifications. Dispatcher assigns a new job → tech's phone vibrates in 2 seconds, even if the app is closed. iOS web push exists but is brutally restricted; on Android it works but unreliably.

Skip native, and your field team stops using the app within two weeks. Every operator we've watched try this learned it the hard way.
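
The offline-first behaviour described above can be sketched as a small local queue. This is a minimal illustration, not a production sync engine: `UploadQueue`, `PendingItem`, and the batch size are hypothetical names and values, and a real app would persist the queue to disk rather than keep it in memory.

```typescript
// Hypothetical offline queue: captures are stored locally first and are
// removed only after the server confirms the upload.
type PendingItem = { id: string; kind: "photo" | "audio"; localUri: string };

class UploadQueue {
  private items: PendingItem[] = [];

  // Called the moment a photo or voice note is captured, online or not.
  enqueue(item: PendingItem): void {
    this.items.push(item);
  }

  // Items to attempt right now. Nothing is attempted while offline.
  nextBatch(isOnline: boolean, max: number = 5): PendingItem[] {
    return isOnline ? this.items.slice(0, max) : [];
  }

  // Drops only the items whose upload was confirmed; failures stay queued.
  confirm(uploadedIds: string[]): void {
    this.items = this.items.filter((i) => uploadedIds.indexOf(i.id) === -1);
  }

  pendingCount(): number {
    return this.items.length;
  }
}
```

Because items leave the queue only on confirmation, a dropped connection mid-upload costs a retry rather than a lost photo.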

The Pattern That Works (Mobile-First)

1. One-Tap Arrival

Tech walks into the site. Opens the app. Taps "I'm here." That's it. The app records the GPS coordinate, a timestamp, and the assigned job (cached offline from the morning dispatch).

Anything more complex (a login screen, dropdown navigation) and field techs will skip the app. One tap, one anchor record, everything else attaches to that.
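
As a rough sketch, the anchor record created by that single tap might look like the following. The field names, the `synced` flag, and the `createArrival` helper are assumptions for illustration; the coordinates would come from the device's location API.

```typescript
// Hypothetical shape of the record created by the "I'm here" tap.
interface ArrivalRecord {
  jobId: string;
  techId: string;
  arrivedAt: string; // ISO 8601 timestamp
  latitude: number;
  longitude: number;
  synced: boolean; // false until the record reaches the backend
}

interface LatLng {
  latitude: number;
  longitude: number;
}

// Builds the anchor record that voice notes and photos attach to.
function createArrival(
  jobId: string,
  techId: string,
  coords: LatLng,
  now: Date = new Date()
): ArrivalRecord {
  const { latitude, longitude } = coords;
  if (
    !isFinite(latitude) ||
    !isFinite(longitude) ||
    Math.abs(latitude) > 90 ||
    Math.abs(longitude) > 180
  ) {
    throw new RangeError("coordinates out of range");
  }
  return {
    jobId,
    techId,
    arrivedAt: now.toISOString(),
    latitude,
    longitude,
    synced: false,
  };
}
```

Creating the record locally with `synced: false` lets the tap succeed without signal; the record is uploaded later along with everything attached to it.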

2. Voice Note in the Field

The tech walks the site and talks. Doesn't type. Doesn't tap fields. Just talks for 2–10 minutes.

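Native audio at 32 kbps (Opus, or AAC where the recorder does not offer Opus) gives crisp recordings at about 7 MB per 30-minute walkthrough, so a typical 2–10 minute note stays well under 3 MB. Use Expo's expo-av (Audio.Recording):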

  • Big red "Record" button. That's it.
  • Start, pause, resume, save. No "select category"; that comes later from the AI.
  • Recording continues if the screen locks (background audio task, must be declared in Info.plist for iOS).
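
The file-size figure follows from bitrate × duration. A small worked sketch, assuming a constant bitrate and ignoring container overhead (the helper name is illustrative):

```typescript
// Estimated size of a constant-bitrate recording, in bytes.
// kilobits per second * 1000 = bits per second; divide by 8 for bytes.
function estimateAudioBytes(bitrateKbps: number, durationSeconds: number): number {
  return (bitrateKbps * 1000 * durationSeconds) / 8;
}

// 32 kbps for 30 minutes: 7,200,000 bytes, about 7.2 MB.
const thirtyMinuteNote = estimateAudioBytes(32, 30 * 60);

// 32 kbps for 10 minutes: 2,400,000 bytes, about 2.4 MB.
const tenMinuteNote = estimateAudioBytes(32, 10 * 60);
```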

3. Photos Attached to Context

Same screen, two taps to add photos. Native camera via expo-camera:

  • Auto-tagged with the client, visit, and voice note the tech just recorded.
  • Resized to 1920px max width before upload (use expo-image-manipulator). A typical 4 MB photo drops to roughly 200 KB. Your storage bill thanks you.
  • Stored in Supabase Storage or Cloudflare R2.
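
A sketch of the resize calculation, assuming the goal is to cap width at 1920px without upscaling photos that are already smaller. `fitToMaxWidth` is a hypothetical helper; the resulting dimensions would then be handed to the image-manipulation library's resize step.

```typescript
interface Size {
  width: number;
  height: number;
}

// Scales a photo down so its width does not exceed maxWidth, keeping the
// aspect ratio. Photos that are already narrow enough come back unchanged.
function fitToMaxWidth(original: Size, maxWidth: number = 1920): Size {
  if (original.width <= maxWidth) {
    return { width: original.width, height: original.height };
  }
  const scale = maxWidth / original.width;
  return { width: maxWidth, height: Math.round(original.height * scale) };
}

// A 12-megapixel 4032 x 3024 capture becomes 1920 x 1440.
const resized = fitToMaxWidth({ width: 4032, height: 3024 });
```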

4. AI Turns Voice + Photos Into a Structured Report

The magic step. Flow:

  1. Voice note arrives at backend (Supabase Edge Function or Cloudflare Worker).
  2. Send audio to OpenAI Whisper (whisper-1). ~$0.006/minute. A 5-minute voice note = 3 cents.
  3. Send transcript + photo URLs to GPT-4o or Claude Sonnet with a system prompt: "You are writing a site report. Given a transcript and a list of attached photos, produce a structured report: summary, observations, risks, action items, time estimate. Be specific. Cite the photos by index."
  4. Structured JSON returns. Save to the visit record.
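
Before saving, it is worth validating the JSON from step 4, since model output can be malformed. A minimal sketch: the `SiteReport` field names are assumptions derived from the prompt above, not a fixed schema.

```typescript
// Hypothetical report shape mirroring the sections named in the prompt.
interface SiteReport {
  summary: string;
  observations: string[];
  risks: string[];
  actionItems: string[];
  timeEstimate: string;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

// Parses the model's reply and rejects anything that does not match the
// expected shape, so malformed output never reaches the visit record.
function parseSiteReport(raw: string): SiteReport {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("report is not a JSON object");
  }
  const { summary, observations, risks, actionItems, timeEstimate } =
    data as Record<string, unknown>;
  if (typeof summary !== "string" || typeof timeEstimate !== "string") {
    throw new Error("summary and timeEstimate must be strings");
  }
  if (!isStringArray(observations) || !isStringArray(risks) || !isStringArray(actionItems)) {
    throw new Error("observations, risks and actionItems must be string arrays");
  }
  return { summary, observations, risks, actionItems, timeEstimate };
}
```

A failed parse is a natural point to retry the model call or flag the visit for manual review.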

Total round-trip cost per visit: 5–15 cents. Labor saved per visit: 20–40 minutes of office work that often was not being done at all because nobody had time.
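
To see what the per-visit figure means per month, here is a rough calculation. The Whisper rate is the one quoted above; the workload (4 visits per tech per day, 22 workdays) is an assumption chosen for illustration, so substitute your own numbers.

```typescript
// Whisper rate quoted in this guide; check current pricing before budgeting.
const WHISPER_USD_PER_MINUTE = 0.006;

// Transcription cost for one voice note.
function transcriptionCostUsd(minutes: number): number {
  return minutes * WHISPER_USD_PER_MINUTE;
}

// Monthly AI spend, computed in whole cents to avoid floating-point drift.
function monthlyAiCostUsd(
  techs: number,
  visitsPerTechPerDay: number,
  workdaysPerMonth: number,
  centsPerVisit: number
): number {
  const visits = techs * visitsPerTechPerDay * workdaysPerMonth;
  return (visits * centsPerVisit) / 100;
}

// Assumed workload: 20 techs x 4 visits x 22 workdays = 1,760 visits.
const lowEstimateUsd = monthlyAiCostUsd(20, 4, 22, 5); // 88 USD at 5 cents per visit
const highEstimateUsd = monthlyAiCostUsd(20, 4, 22, 15); // 264 USD at 15 cents per visit
```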

5. Push Notification to the Office

The instant the report is ready, the dispatcher gets a native push notification. Tap → opens the visit record in the office's web app (same Rork project, web target).
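
A sketch of the notification payload, assuming Expo's push service is used. The message fields (`to`, `title`, `body`, `sound`, `data`) follow Expo's push message format; the helper name, the admin base URL, and the `/visits/` route are hypothetical.

```typescript
// Message shape accepted by Expo's push service
// (POST https://exp.host/--/api/v2/push/send).
interface ExpoPushMessage {
  to: string;
  title: string;
  body: string;
  sound: "default";
  data: { visitId: string; url: string };
}

// Builds the "report ready" message for a dispatcher's device.
function buildReportReadyPush(
  pushToken: string,
  visitId: string,
  clientName: string,
  adminBaseUrl: string
): ExpoPushMessage {
  // Expo tokens look like ExponentPushToken[...] or ExpoPushToken[...].
  if (!/^Expo(nent)?PushToken\[.+\]$/.test(pushToken)) {
    throw new Error("not an Expo push token");
  }
  return {
    to: pushToken,
    title: "Site report ready",
    body: `Report for ${clientName} is ready to review.`,
    sound: "default",
    // The app reads data.url on tap to open the visit record.
    data: { visitId, url: `${adminBaseUrl}/visits/${encodeURIComponent(visitId)}` },
  };
}
```

The backend would POST this object as JSON to the endpoint in the comment once the report has been saved.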

This is the loop. Native field app + AI + push + web admin. Each piece is a few prompts in Rork. All four together is the moat your competitors won't have.

Rork vs The Alternatives

| Platform | Real native iOS/Android | AI voice→report | Offline-first | Custom workflow | Best for |
|---|---|---|---|---|---|
| Rork | ✅ Real native (Expo) | ✅ Built in via OpenAI | ✅ Yes | ✅ Anything | Custom field apps |
| Glide | ❌ Web only | ⚠️ Add-on | ⚠️ Browser cache | ✅ Yes | Office dashboards |
| Bubble | ⚠️ Mobile beta | ⚠️ Plugins | ⚠️ Limited | ✅ Yes | Web apps |
| Adalo | ⚠️ Hybrid | ❌ Not built-in | ⚠️ Limited | ✅ Yes | Consumer apps |
| ServiceTitan | ✅ Native (theirs) | ⚠️ Recently added | ✅ Yes | ❌ Their workflow | HVAC / plumbing standard |
| Fieldproxy | ✅ Native | ✅ AI native | ✅ Yes | ⚠️ Templates | Fast deployment |
| Custom iOS+Android | ✅ Native | ✅ Anything | ✅ Yes | ✅ Anything | Companies with eng teams ($250k+/yr) |

The combination of real native + AI voice-to-report + completely customizable + SMB-affordable is what makes Rork uniquely positioned for field-service SMBs.

The Stack Operators Actually Run

  • App framework: Rork for cross-platform native (iOS + Android + web from one Expo project).
  • Backend: Supabase. Auth, Postgres, Storage, Row-Level Security for role-based access (owner sees everything, dispatcher sees jobs, tech sees only their own).
  • Transcription: OpenAI Whisper API. Best accuracy for noisy environments. Multilingual (99+ languages).
  • Report generation: GPT-4o or Claude Sonnet. Mid-tier reasoning is enough.
  • Photo upload: Supabase Storage or Cloudflare R2 (R2 cheaper at scale).
  • Push: Expo Push (wraps APNs + FCM into one API). Free for small volumes.
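
The role-based access rule from the backend bullet can be expressed as a simple predicate. This is only an illustration of the logic: in the stack above the rule would be enforced by Postgres Row-Level Security policies, and the types and function name here are hypothetical.

```typescript
type Role = "owner" | "dispatcher" | "tech";

interface AppUser {
  id: string;
  role: Role;
}

interface VisitRow {
  id: string;
  assignedTechId: string;
}

// Mirrors the access rule: owners and dispatchers can read every visit,
// technicians can read only the visits assigned to them.
function canReadVisit(user: AppUser, visit: VisitRow): boolean {
  if (user.role === "owner" || user.role === "dispatcher") {
    return true;
  }
  return visit.assignedTechId === user.id;
}
```

A client-side check like this is useful for hiding UI, but it is not a security boundary; the database policy has to make the same decision.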

Total monthly cloud cost for a 20-tech business: under $300. Plus $200/mo for Rork Max.

Distribution: Public vs. Enterprise

Two paths for getting the app on your team's phones:

  1. Public App Store + Google Play listings. Costs $99/year for the Apple Developer Program plus a one-time $25 Google Play registration fee. Useful if subcontractors or temporary staff need access.
  2. Apple Business Manager Custom Apps + managed Google Play private apps. No public listing. Your company is the only one with access. Faster review, cleaner liability. Available worldwide.

Both ship from the same Rork project.

Edge Cases That Trip Everyone

  • iOS background audio recording requires Info.plist declarations, specifically UIBackgroundModes: audio plus NSMicrophoneUsageDescription. Without these, recordings stop the second the screen locks.
  • iOS background uploads must use URLSession background sessions. Expo's FileSystem.uploadAsync handles this when called with sessionType: FileSystemSessionType.BACKGROUND.
  • Android battery optimization will kill your app. Users have to grant "background activity" permission. Bake the ask into your onboarding.
  • iOS PHPhotoLibrary permissions changed in iOS 14, which added a limited-access mode where the user shares only selected photos. Handle that state and request only the access you need. Apple rejects apps asking for more than they need.
  • GDPR/labor law. Most operators record location only on arrival: one timestamp + one coordinate. Legal almost everywhere; check with a local labor lawyer.
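
For the first item in the list above, the two Info.plist keys can be declared through the Expo app config. A minimal sketch: the app name, slug, and permission text are placeholders, and the local type stands in for the `ExpoConfig` type a real project would import from "expo/config".

```typescript
// Minimal local type for this sketch only.
interface AppConfigSketch {
  name: string;
  slug: string;
  ios: {
    infoPlist: {
      UIBackgroundModes: string[];
      NSMicrophoneUsageDescription: string;
    };
  };
}

// In app.config.ts this object would be the file's default export.
const appConfig: AppConfigSketch = {
  name: "Field Reports", // placeholder
  slug: "field-reports", // placeholder
  ios: {
    infoPlist: {
      // Keeps the recording alive after the screen locks.
      UIBackgroundModes: ["audio"],
      // Shown in the iOS microphone permission prompt.
      NSMicrophoneUsageDescription: "Records voice notes during site visits.",
    },
  },
};
```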

What Operators Are Building

A construction company in southern Spain runs this exact flow today. Techs walk sites with Android phones, record voice notes, snap photos, tap send. Office sees an AI-generated structured report 30 seconds later on their laptops. The owner, a non-technical lawyer running $20M, says the surprising part isn't time saved per visit. It's the reports that now exist that wouldn't have existed at all, compounding into a different business over a year.

The same flow is being run by HVAC operators in Texas, plumbing companies in Florida, cleaning services in London, dental clinics with mobile hygienists in Brazil, logistics dispatchers in Mexico City. Same pattern, different vertical.

Where to Start

  • Today: Open Rork. Describe the workflow above in your own words. Use plan mode to let the model interrogate you on offline sync and audio edge cases.
  • This week: Ship a v1 to TestFlight. Two of your techs install it on real phones.
  • Next month: Roll out to the whole team. Connect AI reports. Push them to the office automatically.

By the time a SaaS sales rep finishes their pitch deck with you, your app is already on your techs' home screens.

Frequently asked questions

What's the difference between a real field service app and a web app?
A real app (the kind you download from the App Store) has direct access to the camera with full metadata, reliable background audio recording, background uploads, and push notifications, and it works offline. A web app is a website that looks like an app; there, these things either don't work or are extremely unreliable, especially on iOS. For field crews working in low-signal environments, a real app is the only acceptable choice. See [Do you need a real app or a website?](/guides/app-vs-website-for-business).
How does the AI voice-note-to-report flow actually work?
Three steps. (1) Tech records a voice note in the app and snaps photos. (2) Audio uploads to your backend, which calls OpenAI Whisper for transcription (about $0.006/minute). (3) Transcript + photo URLs go to GPT-4o or Claude with a system prompt that knows your business. The model returns a structured report (summary, observations, risks, action items), which is saved to the visit record. Total cost: 5–15 cents per visit. Labor saved: 20–40 minutes.
Will OpenAI Whisper work with Spanish, French, German, Portuguese, etc.?
Yes. Whisper handles 99+ languages out of the box, including all major European languages, Latin American Spanish, Brazilian Portuguese, Arabic, Mandarin, and more. Crews in Spain can speak Spanish, French crews in French. The report can be generated in any language you want via the report-generation prompt.
What about GDPR / employee privacy law for GPS tracking?
Continuous employee location tracking is restricted across the EU, UK, parts of Canada, and (less strictly) the US. Best practice for almost every jurisdiction: log ONE GPS coordinate per site visit, on arrival, not continuous tracking. That's legal almost everywhere and gives you 90% of the operational value. Always confirm with a local labor lawyer before rollout.
How do I distribute the field app to my technicians without going to the public App Store?
Apple Business Manager Custom App distribution (free) plus managed Google Play. Your app is installed only on your team's phones via MDM, with no public listing and no public reviews. Apple review is faster. Works in every country where the App Store and Play Store operate.
Can I integrate the field app with Salesforce, ServiceTitan, or our existing CRM?
Yes, via API. Most major CRMs (Salesforce, HubSpot, ServiceTitan, Jobber) have REST APIs. A Supabase Edge Function can sync clients, jobs, and visits in both directions. Setup is typically half a day per direction. Many operators use the Rork app as the field tool and the existing CRM as the office system of record.
How long does it take to ship a v1?
Most operators get a working v1 (arrival tap + voice note + photos + basic AI report) in 7–14 days of focused work. Adding offline sync, push notifications, role-based access, and a full backend pipeline takes another 2–4 weeks. Total: 3–6 weeks for a production-ready field service app.
