Amazon S3 + Go Fig
Connect S3 buckets to Go Fig for file-based data integration and analysis.
Amazon S3 is the default landing zone for exports, partner feeds, and data-lake files at most modern finance teams. Go Fig treats S3 as a first-class node in the Financial Intelligence Graph, so Celeste and AI financial analysts can join Parquet and CSV files against GL, CRM, and operational systems without a separate warehouse build.

The connector handles Hive-style partition pruning, reads both single files and prefix trees, and supports Parquet, CSV, TSV, JSON, JSON Lines, Avro, and ORC. Sync cadence ranges from 15-minute polling to near-real-time via S3 Event Notifications routed through SQS, which matters for finance teams reacting to nightly ERP dumps or upstream pipeline outputs.

Go Fig is aware of storage classes (Standard, Infrequent Access, Glacier, Deep Archive) and flags Glacier-tier objects before restore, so you don't trigger a surprise restore bill. For regulated environments, the connector supports SSE-S3, SSE-KMS (including customer-managed keys), VPC endpoints, and bucket policies scoped to a specific IAM role assumed cross-account via AWS STS.
Key facts
- File formats: Parquet, CSV, TSV, JSON, JSONL, Avro, ORC
- Auth: IAM role via STS AssumeRole (cross-account)
- Networking: VPC endpoint (Gateway or Interface)
- Sync mode: Poll or S3 Event Notifications via SQS
- Encryption: SSE-S3, SSE-KMS (customer-managed)
What you can do with Amazon S3 data in Go Fig
Data-lake joins without a warehouse
Query Parquet exports from Stripe, Snowflake, or your data team's pipelines directly against QuickBooks and Salesforce, with no warehouse migration required.
Partner and vendor feed automation
Ingest daily CSV and Excel drops from 3PLs, payment processors, or factoring partners, reconcile against the GL, and flag exceptions automatically.
Historical archive analysis
Reach into multi-year archives in Infrequent Access or Glacier Instant Retrieval for trend and cohort analysis without rehydrating to Standard.
Data available from Amazon S3
Go Fig extracts and normalizes the following data from your Amazon S3 account:
- File contents in supported formats (Parquet, CSV, TSV, JSON, JSON Lines, Avro, ORC)
- Object metadata: key, size, ETag, last-modified timestamp, and storage class
- Schemas read from Parquet file metadata, inferred from CSV samples, or pinned manually
- Partition columns detected from Hive-style prefixes (e.g., year=2026/month=04/)
How to connect Amazon S3
Create a scoped IAM role with cross-account trust
In IAM, create a role named fig-s3-reader with a trust policy that allows Go Fig's published AWS account ID to sts:AssumeRole, and include the external ID Go Fig provides to prevent the confused-deputy problem. Attach a policy granting s3:GetObject and s3:ListBucket only on the specific bucket ARNs and prefixes Go Fig needs.
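A minimal sketch of the two policies this step describes, expressed as Python dicts ready to serialize for the IAM console or CLI. The account ID, external ID, bucket name, and prefix below are placeholders; substitute the values shown in your Go Fig connector setup screen.

```python
import json

# Hypothetical values: replace with the account ID and external ID
# shown in the Go Fig connector setup screen.
GO_FIG_ACCOUNT_ID = "111122223333"
EXTERNAL_ID = "fig-external-id-example"

# Trust policy for fig-s3-reader: only Go Fig's account may assume the
# role, and only when it presents the agreed external ID
# (confused-deputy guard).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{GO_FIG_ACCOUNT_ID}:root"},
        "Action": "sts:AssumeRole",
        "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
    }],
}

# Permissions policy: read-only, scoped to one bucket and one prefix.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::exports",
            "Condition": {"StringLike": {"s3:prefix": ["netsuite/gl/*"]}},
        },
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::exports/netsuite/gl/*",
        },
    ],
}

print(json.dumps(trust_policy, indent=2))
```

Scoping s3:ListBucket with an s3:prefix condition matters: without it, the role can enumerate every key in the bucket even if it can only read some of them.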
Handle KMS and VPC endpoints if applicable
If objects are encrypted with SSE-KMS using a customer-managed key, add kms:Decrypt on the key ARN to the role policy and add the role as a key user on the KMS key. If the bucket is locked to a VPC endpoint, Go Fig can either connect via an Interface Endpoint into the bucket policy's allowlist or run inside a customer-deployed runner.
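If SSE-KMS applies, the role's permissions policy gains one more statement. A sketch, with a hypothetical key ARN standing in for your customer-managed key:

```python
# Hypothetical key ARN; use the ARN of the customer-managed key that
# encrypts your objects.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:444455556666:key/abcd-1234"

# Statement to append to the role's permissions policy so the assumed
# role can decrypt SSE-KMS objects on read. Remember to also add the
# role as a key user in the KMS key policy itself.
kms_statement = {
    "Effect": "Allow",
    "Action": "kms:Decrypt",
    "Resource": KMS_KEY_ARN,
}
```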
Configure buckets, prefixes, and partitioning
In Go Fig, provide bucket names and prefix paths (e.g., s3://exports/netsuite/gl/). Hive-partitioned prefixes (year=2026/month=04/) are automatically detected and partition columns become queryable. For Parquet, schemas are read from file metadata; for CSV, Go Fig infers types from the first 1000 rows or accepts a pinned schema.
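The Hive-partition detection described above amounts to splitting each object key on `/` and treating any `name=value` segment as a partition column. A small sketch of that logic (the function name is illustrative, not Go Fig's internal API):

```python
import re
from typing import Dict, Tuple

# A path segment like "year=2026": a column name, "=", then a value.
_PARTITION_RE = re.compile(r"([^/=]+)=([^/]+)")

def parse_hive_partitions(key: str) -> Tuple[str, Dict[str, str]]:
    """Split an object key into its file name and Hive-style partition
    columns, e.g. 'netsuite/gl/year=2026/month=04/part-0.parquet'
    -> ('part-0.parquet', {'year': '2026', 'month': '04'})."""
    parts = key.split("/")
    partitions: Dict[str, str] = {}
    for segment in parts[:-1]:
        m = _PARTITION_RE.fullmatch(segment)
        if m:
            partitions[m.group(1)] = m.group(2)
    return parts[-1], partitions
```

Once extracted this way, `year` and `month` become queryable columns even though they never appear inside the files themselves.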
Pick sync strategy
Default is 15-minute polling using LIST and ETag/last-modified tracking. For near-real-time, configure S3 Event Notifications to an SQS queue that Go Fig polls, so new objects land within seconds of the PUT. Glacier-tier objects are surfaced with a flag rather than auto-restored, so restore costs stay under your control.
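The ETag/last-modified tracking the polling mode relies on can be sketched as a simple diff against the state recorded at the previous sync. This is an illustrative model, not Go Fig's actual implementation:

```python
from typing import Dict, Iterable, List, Tuple

def new_or_changed(
    listing: Iterable[Tuple[str, str]],  # (key, etag) pairs from LIST
    seen: Dict[str, str],                # key -> etag from the last sync
) -> List[str]:
    """Return keys that are new or whose ETag changed since the last
    poll, updating the seen-state in place. Unchanged objects are
    skipped, so repeated polls cost LIST calls but no redundant GETs."""
    changed: List[str] = []
    for key, etag in listing:
        if seen.get(key) != etag:
            changed.append(key)
            seen[key] = etag
    return changed
```

The event-driven mode replaces the LIST step with SQS messages carrying the new keys, but the same dedupe-by-ETag check still applies.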
Authentication: Cross-account IAM is the recommended pattern. Create an IAM role in your AWS account with a trust policy allowing Go Fig's AWS account to sts:AssumeRole, and grant the role s3:GetObject and s3:ListBucket (scoped to specific prefixes), plus kms:Decrypt if using SSE-KMS. Access keys are supported for proofs of concept but not recommended. For private-only buckets, use an S3 VPC endpoint with a bucket policy restricting access to the endpoint.
Common Questions About Amazon S3 Integration
How does Go Fig handle Hive-partitioned prefixes (year=/month=/day=)?
Partitions are detected and promoted to first-class columns automatically. When a query filters on a partition column, Go Fig prunes at the LIST-objects level so you don't scan and pay for files outside the filter. This makes multi-year partitioned exports practical to query directly rather than loading into a warehouse first.
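Pruning at the LIST level means turning equality filters on partition columns into a longer LIST prefix, so S3 never enumerates objects outside the filter. A sketch of that prefix construction (the function name is hypothetical):

```python
from typing import Dict, List

def pruned_prefix(
    base: str,
    partition_cols: List[str],   # partition columns in path order
    filters: Dict[str, str],     # equality filters from the query
) -> str:
    """Extend the LIST prefix with each leading partition column that has
    an equality filter. Stops at the first unfiltered column, since
    deeper path segments can no longer be fixed to a single value."""
    prefix = base if base.endswith("/") else base + "/"
    for col in partition_cols:
        if col not in filters:
            break
        prefix += f"{col}={filters[col]}/"
    return prefix
```

A query filtered to `year = 2026 AND month = 04` thus lists only under `netsuite/gl/year=2026/month=04/` instead of the whole prefix tree.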
What happens when objects are in Glacier or Deep Archive?
Go Fig enumerates them with a storage-class flag and does not auto-restore. If you want to query them, you trigger an S3 Restore yourself (bulk, standard, or expedited) and Go Fig picks them up once they're readable. For Glacier Instant Retrieval, reads are transparent since latency is similar to Standard.
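The flag-don't-restore behavior reduces to a per-object check on the storage class string that S3 returns in LIST and HEAD responses. A minimal sketch of that decision, assuming the standard S3 storage-class names:

```python
# Storage classes that block direct GETs until an explicit S3 Restore
# completes. Everything else, including GLACIER_IR (Glacier Instant
# Retrieval), is readable immediately.
NEEDS_RESTORE = {"GLACIER", "DEEP_ARCHIVE"}

def restore_required(storage_class: str) -> bool:
    """Flag objects whose storage class requires a restore before a read.
    STANDARD, STANDARD_IA, ONEZONE_IA, INTELLIGENT_TIERING, and
    GLACIER_IR are all readable without one."""
    return storage_class in NEEDS_RESTORE
```

Objects that fail this check get surfaced with the flag described above; the restore itself (and its cost) stays in your hands.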
Can I connect to a private-only bucket with no public access?
Yes. The recommended pattern is a bucket policy that only allows access via a specific VPC endpoint, and a cross-account IAM role assumed by Go Fig. Go Fig supports both Interface Endpoints (AWS PrivateLink) and Gateway Endpoints depending on your network topology. No public S3 access is required.
How does Go Fig deal with CSV quirks (BOM, encoding, irregular quoting)?
Common CSV issues are auto-detected: UTF-8 BOM, UTF-16, mixed line endings, irregular quoting, and embedded commas. Problem rows are surfaced in a rejects table rather than failing the entire sync, so a single malformed row in a partner export doesn't break the pipeline. You can also pin parsing options per file pattern.
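The rejects-table behavior can be sketched as BOM-tolerant decoding plus a per-row column check, so one malformed row lands in a rejects list instead of aborting the file. The function below is an illustrative model, not Go Fig's parser:

```python
import csv
import io
from typing import List, Tuple

def parse_with_rejects(
    raw: bytes, expected_cols: int
) -> Tuple[List[List[str]], List[str]]:
    """Decode with utf-8-sig (which strips a UTF-8 BOM if present), then
    split rows into good rows and a rejects list rather than failing the
    entire file on the first malformed row."""
    text = raw.decode("utf-8-sig")
    good: List[List[str]] = []
    rejects: List[str] = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) == expected_cols:
            good.append(row)
        else:
            rejects.append(",".join(row))
    return good, rejects
```

Python's csv module already handles quoted fields with embedded commas; the column-count check catches rows where quoting broke down entirely.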
What about cost control on very large buckets?
Go Fig tracks GET and LIST counts per sync and surfaces them in the connector view. Partition pruning, incremental-by-ETag tracking, and Parquet footer reads (metadata-only) keep scans minimal. For extreme cases, S3 Inventory reports can drive enumeration instead of per-sync LIST, which is the cost-optimal pattern on billion-object buckets.
Ready to connect Amazon S3?
See how your Amazon S3 data looks in Go Fig with a personalized demo.
Book a Demo