Picking the Proper Granularity of Records

Selecting the appropriate granularity for your data records is crucial
for optimal AI analytics performance on Spartera. The right granularity
balances detail preservation with query performance, ensuring that AI
models can generate meaningful insights while maintaining fast response
times.

Understanding Data Granularity

Definition

Data granularity refers to the level of detail or precision in your
data records. It determines how atomic or aggregated your data points
are.

Granularity Spectrum

  • High Granularity (Fine-grained): Individual events, transactions,
    or interactions
  • Medium Granularity: Aggregated by time periods, user sessions,
    or business processes
  • Low Granularity (Coarse-grained): Highly summarized data at
    department, monthly, or yearly levels

Factors Influencing Granularity Choice

1. Business Questions and Use Cases

Different analytics requirements demand different granularities:

-- High Granularity: Individual page views for user behavior analysis
CREATE TABLE page_view_events (
    event_id BIGINT,
    user_id BIGINT,
    session_id VARCHAR(100),
    page_url VARCHAR(500),
    timestamp TIMESTAMP,
    time_on_page_seconds INTEGER,
    referrer_url VARCHAR(500),
    user_agent VARCHAR(500),
    device_type VARCHAR(50)
);

-- Medium Granularity: Daily user activity summary
CREATE TABLE daily_user_activity (
    user_id BIGINT,
    activity_date DATE,
    total_page_views INTEGER,
    total_session_time_minutes INTEGER,
    unique_pages_visited INTEGER,
    conversion_events INTEGER,
    primary_device_type VARCHAR(50)
);

-- Low Granularity: Monthly user engagement metrics
CREATE TABLE monthly_user_metrics (
    user_id BIGINT,
    month_year VARCHAR(7), -- YYYY-MM format
    total_sessions INTEGER,
    total_time_spent_hours INTEGER,
    average_session_duration_minutes DECIMAL(8,2),
    pages_per_session DECIMAL(8,2),
    conversion_rate DECIMAL(5,4)
);

2. Query Performance Requirements

AI analytics tools perform better with appropriately sized datasets:

  • Too Fine: Millions of individual events slow down processing
  • Too Coarse: Aggregated data may hide important patterns
  • Just Right: Balanced level that maintains insight quality while
    ensuring performance
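
As a rough illustration, the same question ("how many page views did each user generate in January?") can be answered from either layer defined in section 1. The summary table scans one pre-aggregated row per user per day instead of every raw event; the date range below is purely illustrative.

-- Fine-grained: scans every individual page view event
SELECT e.user_id, COUNT(*) AS page_views
FROM page_view_events e
WHERE e.timestamp >= DATE '2024-01-01'
  AND e.timestamp <  DATE '2024-02-01'
GROUP BY e.user_id;

-- Medium-grained: scans one pre-aggregated row per user per day
SELECT a.user_id, SUM(a.total_page_views) AS page_views
FROM daily_user_activity a
WHERE a.activity_date >= DATE '2024-01-01'
  AND a.activity_date <  DATE '2024-02-01'
GROUP BY a.user_id;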

3. Storage and Processing Costs

Consider the trade-offs:

-- High storage, high detail
-- Raw transaction logs: ~10GB/month

-- Medium storage, medium detail
-- Daily transaction summaries: ~1GB/month

-- Low storage, low detail
-- Monthly transaction summaries: ~100MB/month

Granularity Selection Framework

Step 1: Identify Primary Analytics Use Cases

Map your key business questions to required granularity levels:

Use Case                      Required Granularity       Example
Real-time fraud detection     Individual transactions    Each credit card swipe
Customer behavior analysis    Session-level or daily     Daily user activity
Financial reporting           Monthly or quarterly       Monthly revenue summaries
Trend analysis                Weekly or monthly          Weekly sales performance

Step 2: Assess Data Volume and Velocity

-- Assess current data volume (MySQL information_schema shown; adapt for other databases)
SELECT
    table_name,
    table_rows,
    data_length/1024/1024 as size_mb,
    avg_row_length
FROM information_schema.tables
WHERE table_schema = 'your_database';

-- Project future growth
-- If adding 1M records/day, how will this impact:
-- - Query performance
-- - Storage costs
-- - AI processing time
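
A sketch of how the growth question above might be quantified, assuming a detailed event table such as page_view_events and PostgreSQL-style date functions:

-- Measure how many rows were added per day over the last 30 days
SELECT
    DATE_TRUNC('day', e.timestamp) AS ingest_day,
    COUNT(*) AS rows_added
FROM page_view_events e
WHERE e.timestamp >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1
ORDER BY 1;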

Step 3: Define Aggregation Strategies

Create multiple granularity layers when needed:

-- Layer 1: High granularity for recent data (last 30 days)
CREATE TABLE recent_transactions (
    transaction_id BIGINT,
    customer_id BIGINT,
    transaction_timestamp TIMESTAMP,
    amount DECIMAL(10,2),
    product_id BIGINT,
    payment_method VARCHAR(50)
) PARTITION BY RANGE (transaction_timestamp);

-- Layer 2: Medium granularity for historical analysis
-- (daily aggregates)
CREATE TABLE daily_transaction_summary (
    customer_id BIGINT,
    transaction_date DATE,
    total_transactions INTEGER,
    total_amount DECIMAL(12,2),
    average_transaction_value DECIMAL(10,2),
    unique_products_purchased INTEGER,
    primary_payment_method VARCHAR(50)
);

-- Layer 3: Low granularity for long-term trends
-- (monthly aggregates)
CREATE TABLE monthly_customer_metrics (
    customer_id BIGINT,
    month_year DATE,
    total_spent DECIMAL(12,2),
    transaction_frequency INTEGER,
    customer_segment VARCHAR(50),
    churn_risk_score DECIMAL(3,2)
);
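
Layers 2 and 3 are typically populated by periodic aggregation jobs rather than written directly. A minimal sketch for the daily layer, assuming Layer 1 holds the previous day's detail and using PostgreSQL's mode() ordered-set aggregate (other databases need a different most-frequent-value calculation):

-- Roll up yesterday's detailed transactions into the daily summary
INSERT INTO daily_transaction_summary (
    customer_id, transaction_date, total_transactions, total_amount,
    average_transaction_value, unique_products_purchased, primary_payment_method
)
SELECT
    customer_id,
    CAST(transaction_timestamp AS DATE) AS transaction_date,
    COUNT(*) AS total_transactions,
    SUM(amount) AS total_amount,
    AVG(amount) AS average_transaction_value,
    COUNT(DISTINCT product_id) AS unique_products_purchased,
    MODE() WITHIN GROUP (ORDER BY payment_method) AS primary_payment_method
FROM recent_transactions
WHERE transaction_timestamp >= CURRENT_DATE - INTERVAL '1 day'
  AND transaction_timestamp <  CURRENT_DATE
GROUP BY customer_id, CAST(transaction_timestamp AS DATE);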

Common Granularity Patterns

1. Time-Based Granularity

Choose time windows that align with business cycles:

-- E-commerce: Hourly granularity for peak shopping analysis
CREATE TABLE hourly_sales_metrics (
    date_hour TIMESTAMP, -- YYYY-MM-DD HH:00:00
    total_orders INTEGER,
    total_revenue DECIMAL(12,2),
    unique_customers INTEGER,
    top_product_category VARCHAR(100)
);

-- SaaS: Daily granularity for user engagement
CREATE TABLE daily_product_usage (
    user_id BIGINT,
    usage_date DATE,
    feature_usage_count INTEGER,
    session_duration_minutes INTEGER,
    api_calls_made INTEGER
);

2. Entity-Based Granularity

Align granularity with business entities:

-- Customer-level granularity for personalization
CREATE TABLE customer_preferences (
    customer_id BIGINT,
    preferred_categories TEXT[], -- Array of categories
    average_order_value DECIMAL(10,2),
    purchase_frequency_days INTEGER,
    communication_preferences JSON
);

-- Product-level granularity for inventory management
CREATE TABLE product_performance_metrics (
    product_id BIGINT,
    metric_date DATE,
    units_sold INTEGER,
    revenue_generated DECIMAL(10,2),
    inventory_turnover_rate DECIMAL(8,4),
    customer_satisfaction_score DECIMAL(3,2)
);

3. Event-Based Granularity

For event-driven analytics:

-- Individual events for detailed analysis
CREATE TABLE user_interaction_events (
    event_id BIGINT,
    user_id BIGINT,
    event_type VARCHAR(50),
    event_timestamp TIMESTAMP,
    event_properties JSON,
    session_context JSON
);

-- Aggregated events for pattern analysis
CREATE TABLE session_event_summary (
    session_id VARCHAR(100),
    user_id BIGINT,
    session_start TIMESTAMP,
    session_end TIMESTAMP,
    event_counts JSON, -- {"click": 15, "view": 8, "purchase": 1}
    conversion_funnel_stage VARCHAR(50)
);

Optimization Strategies

1. Hybrid Approach

Maintain multiple granularities for different purposes:

-- Hot data: Fine granularity for recent, frequently accessed data
CREATE TABLE recent_detailed_logs (
    log_timestamp TIMESTAMP,
    user_id BIGINT,
    action_type VARCHAR(100),
    details JSON
) PARTITION BY RANGE (log_timestamp);

-- Warm data: Medium granularity for moderate access patterns
CREATE TABLE daily_activity_summary (
    activity_date DATE,
    user_id BIGINT,
    total_actions INTEGER,
    action_categories TEXT[]
);

-- Cold data: Coarse granularity for long-term storage
CREATE TABLE monthly_user_behavior (
    month_year DATE,
    user_segment VARCHAR(50),
    average_monthly_actions DECIMAL(10,2),
    retention_rate DECIMAL(5,4)
);

2. Dynamic Aggregation

Use views or materialized views for flexible granularity:

-- Base table: Fine granularity
CREATE TABLE base_events (
    event_id BIGINT,
    user_id BIGINT,
    event_type VARCHAR(50),
    event_timestamp TIMESTAMP -- columns referenced by the views below; extend as needed
);

-- Dynamic aggregation views
CREATE VIEW hourly_metrics AS
SELECT
    DATE_TRUNC('hour', event_timestamp) as hour,
    COUNT(*) as event_count,
    COUNT(DISTINCT user_id) as unique_users
FROM base_events
GROUP BY 1;

CREATE VIEW daily_metrics AS
SELECT
    DATE_TRUNC('day', event_timestamp) as day,
    COUNT(*) as event_count,
    COUNT(DISTINCT user_id) as unique_users
FROM base_events
GROUP BY 1;
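
When the same aggregates are queried repeatedly, a materialized view (mentioned above) precomputes the result at the cost of freshness. A minimal PostgreSQL-style sketch:

-- Materialized daily metrics: precomputed and stored, refreshed on a schedule
CREATE MATERIALIZED VIEW daily_metrics_mv AS
SELECT
    DATE_TRUNC('day', event_timestamp) as day,
    COUNT(*) as event_count,
    COUNT(DISTINCT user_id) as unique_users
FROM base_events
GROUP BY 1;

-- Re-run periodically (e.g., from a nightly job) to pick up new events
REFRESH MATERIALIZED VIEW daily_metrics_mv;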

Performance Considerations

Query Optimization

  • Index Strategy: Create appropriate indexes for your chosen
    granularity
  • Partitioning: Use time-based or entity-based partitioning
  • Compression: Apply compression techniques for historical data
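
A sketch of the first two strategies applied to the tables from the framework section (the index name and partition bounds are illustrative; compression options are engine-specific):

-- Index aligned with the summary table's granularity (one row per customer per day)
CREATE INDEX idx_daily_summary_customer_date
    ON daily_transaction_summary (customer_id, transaction_date);

-- Attach a monthly partition to the PostgreSQL-partitioned detailed table
CREATE TABLE recent_transactions_2024_01
    PARTITION OF recent_transactions
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');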

Storage Management

  • Data Lifecycle: Implement archiving strategies for older, coarser
    data
  • Hot/Cold Storage: Use tiered storage based on access patterns
  • Cleanup Policies: Define retention policies for different
    granularity levels
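
A retention sketch, assuming detailed rows are only deleted after they have been rolled up into the summary layers:

-- Remove detailed rows that have aged out of the hot-data window
DELETE FROM recent_transactions
WHERE transaction_timestamp < CURRENT_DATE - INTERVAL '90 days';

-- With time-based partitions, dropping an entire partition is usually cheaper:
-- DROP TABLE recent_transactions_2023_10;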

Best Practices for Spartera Integration

1. AI-Friendly Granularity

  • Choose granularity that provides sufficient statistical significance
  • Ensure enough data points for pattern recognition
  • Avoid over-aggregation that eliminates important variability
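
One way to sanity-check the last point: verify that each aggregation bucket still contains enough observations to support pattern detection (the 30-row threshold here is only an illustrative rule of thumb):

-- Flag days with too few observed users for reliable statistics
SELECT
    activity_date,
    COUNT(*) AS users_observed
FROM daily_user_activity
GROUP BY activity_date
HAVING COUNT(*) < 30
ORDER BY activity_date;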

2. Documentation and Metadata

-- Add comments explaining granularity choices
CREATE TABLE customer_daily_metrics (
    customer_id BIGINT
        COMMENT 'Unique customer identifier',
    metric_date DATE
        COMMENT 'Daily granularity chosen for daily business reporting cycles',
    total_spent DECIMAL(10,2)
        COMMENT 'Sum of all transactions for the day',
    transaction_count INTEGER
        COMMENT 'Number of individual transactions aggregated'
);
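
The inline COMMENT clauses above are MySQL syntax; in PostgreSQL the same metadata is attached after the table is created, for example:

-- PostgreSQL equivalent for one of the columns above
COMMENT ON COLUMN customer_daily_metrics.metric_date IS
    'Daily granularity chosen for daily business reporting cycles';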

3. Consistency Across Tables

  • Maintain consistent granularity patterns across related tables
  • Use standardized time windows and aggregation methods
  • Document business rules for aggregation logic

Monitoring and Validation

Performance Monitoring

-- Monitor query performance across different granularities
SELECT
    table_name,
    AVG(query_duration_ms) as avg_query_time,
    AVG(rows_examined) as avg_rows_scanned
FROM query_performance_log
GROUP BY table_name;

Data Quality Checks

-- Validate aggregation accuracy
WITH detail_sum AS (
    SELECT
        DATE_TRUNC('day', transaction_time) as day,
        SUM(amount) as detail_total
    FROM transaction_details
    GROUP BY 1
),
summary_data AS (
    SELECT
        summary_date as day,
        daily_total
    FROM transaction_summary
)
SELECT
    d.day,
    d.detail_total,
    s.daily_total,
    d.detail_total - s.daily_total as difference
FROM detail_sum d
JOIN summary_data s ON d.day = s.day
WHERE ABS(d.detail_total - s.daily_total) > 0.01;

By carefully selecting the appropriate granularity for your records,
you'll ensure that Spartera's AI analytics can deliver fast, accurate,
and actionable insights while maintaining optimal performance and cost
efficiency.