Data Preparation
Raw data is rarely chart-ready.
PressViz works best when your dataset is focused, clear, and shaped around a single story. This guide shows how to reduce noisy spreadsheets into something that produces a clean chart instead of an overloaded one.
Why data size matters
Charts are for communication, not storage.
When a dataset is too large or too detailed, the result is usually harder to read:
- bars, lines, or dots begin to overlap
- axis labels become crowded
- interactions feel slower
- the chart stops telling a clear story
PressViz includes a 500-row import limit to help keep both the editor and the published chart usable. That limit is more than a technical guardrail: it also encourages better data storytelling.
The best charts usually answer one focused question, not every possible question in the source file.
Core techniques
Filter by time range
If you are importing years of daily data, start by narrowing the time period.
Instead of uploading:
- All daily ridership from 2020–2025
Try something more focused:
- January 2024 daily ridership compared with January 2023
- The last 90 days of signups
- Q4 2024 weekly revenue
Ask yourself:
- What story am I trying to tell?
- What date range actually matters to that story?
Practical steps
- Open the data in Excel or Google Sheets.
- Filter to the date range that matters.
- Copy the filtered rows into a new sheet.
- Export that smaller sheet as CSV.
- Upload the refined CSV to PressViz.
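If you would rather script these steps than click through a spreadsheet, the filter can be sketched in Python using only the standard library. This is a minimal illustration, not a PressViz feature: the `filter_by_date` helper, the `date` and `ridership` column names, and the sample rows are all hypothetical, and the dates are assumed to be in ISO `YYYY-MM-DD` form.

```python
import csv
import io
from datetime import date

def filter_by_date(rows, start, end, date_column="date"):
    """Keep rows whose ISO date (YYYY-MM-DD) falls within [start, end]."""
    return [
        row for row in rows
        if start <= date.fromisoformat(row[date_column]) <= end
    ]

# Tiny inline sample standing in for a real export (hypothetical data).
raw_csv = """date,ridership
2023-12-31,1200
2024-01-01,1500
2024-01-15,1800
2024-02-01,1700
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))
january = filter_by_date(rows, date(2024, 1, 1), date(2024, 1, 31))

# Write the smaller dataset back out as CSV text, ready to save and upload.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["date", "ridership"], lineterminator="\n")
writer.writeheader()
writer.writerows(january)
print(out.getvalue())
```

With a real file, you would read from and write to files opened with `open(path, newline="")` instead of the in-memory strings used here.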
Aggregate to a larger time period
Daily data is often too dense for a clean chart, especially over long periods.
If you want to show a trend, aggregate the data into:
- weekly values
- monthly values
- quarterly values
For example:
- Raw data: 1,706 daily records
- Refined data: about 244 weekly aggregates
You keep the trend, but remove the clutter.
How to aggregate in spreadsheets
- Add a helper column for week, month, or quarter.
- Use a pivot table or formulas like SUMIF() to group the data.
- Calculate the metric you need: sum, average, or count.
- Keep only the aggregated rows for export.
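The helper-column and SUMIF() approach above has a direct equivalent in code. The sketch below groups daily values into weekly sums; the `aggregate_weekly` function and the sample numbers are illustrative only, and treating Monday as the first day of the week is an assumption you may need to change.

```python
from collections import defaultdict
from datetime import date, timedelta

def aggregate_weekly(records):
    """Sum (date, value) pairs into weeks that start on Monday."""
    totals = defaultdict(float)
    for day, value in records:
        # Equivalent of the helper column: the Monday of this row's week.
        week_start = day - timedelta(days=day.weekday())
        totals[week_start] += value
    return sorted(totals.items())

# 14 days of sample data: 100 riders a day in week one, 200 in week two.
start = date(2024, 1, 1)  # a Monday
daily = [(start + timedelta(days=i), 100 if i < 7 else 200) for i in range(14)]

weekly = aggregate_weekly(daily)
for week_start, total in weekly:
    print(week_start.isoformat(), total)
```

To chart averages or counts instead of sums, collect each week's values in a list and reduce them at the end rather than adding as you go.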
Filter by category
Too many categories or series can turn a chart into noise.
Instead of showing everything:
- All 50 U.S. states
- All 150 product SKUs
- Every department in the company
Focus on a useful subset:
- Top 5 states by population
- Best-selling categories
- Sales and Marketing only
Rule of thumb
Once a chart goes beyond about 5–7 series, it usually becomes much harder to read.
Filter to:
- top performers
- most relevant groups
- the categories tied to the story you are telling
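One way to pick "top performers" is to rank categories by their summed value and keep only the first few. The following is a rough sketch under assumed names: `top_categories`, the `state` and `sales` keys, and the sample rows are all hypothetical.

```python
from collections import Counter

def top_categories(rows, n=5, category_key="state", value_key="sales"):
    """Return the n category names with the largest summed value."""
    totals = Counter()
    for row in rows:
        totals[row[category_key]] += row[value_key]
    return [name for name, _ in totals.most_common(n)]

# Hypothetical sample rows; a real export would have many more categories.
rows = [
    {"state": "CA", "sales": 120},
    {"state": "TX", "sales": 90},
    {"state": "NY", "sales": 80},
    {"state": "CA", "sales": 30},
    {"state": "VT", "sales": 5},
]

keep = set(top_categories(rows, n=2))
filtered = [row for row in rows if row["state"] in keep]
print(sorted(keep), len(filtered))
```

Ranking by total is only one definition of "top". If the story is about growth or a specific group, choose the categories by that criterion instead.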
Select only relevant columns
Many raw exports contain far more columns than a chart needs.
PressViz usually needs:
- one label column
- one or more value columns
Everything else is often extra noise.
Cleanup flow
- Identify the label column.
- Identify the value column or columns.
- Remove metadata, notes, audit fields, and unused calculations.
- Rename columns clearly.
- Export the cleaned sheet.
Good column names help both you and your readers:
- Region
- Week
- Q4 Revenue
- Average Ridership
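The cleanup flow can also be expressed as a small mapping from raw headers to clear names, where any column not in the mapping is dropped. This is a sketch only: `COLUMN_MAP`, `select_columns`, and the raw header names (`rgn_cd`, `rev_q4_usd`, and so on) are invented for illustration.

```python
import csv
import io

# Hypothetical mapping from raw export headers to clear, chart-ready names.
COLUMN_MAP = {"rgn_cd": "Region", "rev_q4_usd": "Q4 Revenue"}

def select_columns(rows, column_map):
    """Keep only the mapped columns and rename them; drop everything else."""
    return [{new: row[old] for old, new in column_map.items()} for row in rows]

# Sample export with audit and notes columns the chart does not need.
raw_csv = """rgn_cd,rev_q4_usd,audit_user,notes
North,1200,jdoe,checked
South,950,asmith,
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))
cleaned = select_columns(rows, COLUMN_MAP)
print(cleaned)
```

Keeping the mapping in one place makes it easy to review which columns survive and what they will be called in the editor.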
Real-world walkthrough
Imagine you have MTA daily ridership data for 7 years and want to compare recent trends across transit lines.
Raw dataset
- about 2,555 rows
- about 12 columns
- too much detail for a quick visual comparison
Refined workflow
- Filter to the last 3 months.
- Aggregate to weekly averages.
- Keep only the columns for:
  - week
  - line
  - average ridership
- Limit the comparison to the top 3 lines.
Result
Instead of a massive raw sheet, you now have:
- roughly 36 focused rows
- 3 useful columns
- a chart that clearly compares weekly ridership trends
That is the difference between dumping data into a chart and shaping data into a story.
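To see where the row count in this walkthrough comes from, here is a compact sketch of the same workflow on synthetic data. It is not real MTA data: the five line names, their ridership numbers, and the `weekly_averages` helper are made up, and the sample is assumed to be already filtered to 12 full weeks.

```python
from collections import defaultdict
from datetime import date, timedelta

def weekly_averages(records):
    """Average (day, line, riders) tuples per (week start, line)."""
    buckets = defaultdict(list)
    for day, line, riders in records:
        week_start = day - timedelta(days=day.weekday())  # Monday of the week
        buckets[(week_start, line)].append(riders)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

# Synthetic stand-in: 5 lines, 12 full weeks (84 days) of daily counts.
first_day = date(2024, 1, 1)  # a Monday
lines = {"A": 500, "B": 400, "C": 300, "D": 200, "E": 100}
records = [
    (first_day + timedelta(days=d), line, base + d)
    for d in range(84)
    for line, base in lines.items()
]

# Rank lines by total ridership and keep the top 3.
totals = defaultdict(int)
for _, line, riders in records:
    totals[line] += riders
top3 = sorted(totals, key=totals.get, reverse=True)[:3]

# Final table: week, line, average ridership.
table = weekly_averages(r for r in records if r[1] in top3)
rows = [
    (week.isoformat(), line, round(avg, 1))
    for (week, line), avg in sorted(table.items())
]
print(len(rows))  # 12 weeks x 3 lines
```

The 420 synthetic daily records shrink to 36 rows with three columns, which matches the shape described above.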
Common mistakes
| Mistake | Why it fails | Better approach |
|---|---|---|
| Uploading the entire raw dataset | Too much overlap and poor readability | Filter by time range or category |
| Mixing daily and weekly data | Inconsistent granularity confuses the chart | Aggregate everything to one level |
| Showing 15 or more series | The visual becomes spaghetti | Limit to the top 5–7 series |
| Keeping missing values | Gaps can mislead or weaken the chart | Fill or remove incomplete rows |
| Using vague column names | Hard to understand inside the editor and on the final chart | Rename columns clearly before import |
Tools that help
You do not need special software to prepare data well.
Useful options:
- Excel
- Google Sheets
- Python with pandas
- SQL queries
- OpenRefine
For most users, Excel or Google Sheets is enough.
Chart-type considerations
Different chart types tolerate different levels of density.
| Chart type | Ideal row count | Max series | Notes |
|---|---|---|---|
| Bar / Line | 10–50 | 3–5 | Good for trends and comparisons |
| Pie / Doughnut | 5–12 | 1 | Keep slices limited |
| Area | 10–50 | 2–3 | Works best with fewer series |
| Scatter | 20–100 | Multiple | Each point matters, so clarity still matters |
These are not hard rules, but they are strong defaults.
Final checklist before upload
- File has fewer than 500 rows
- File is under 5 MB
- First row contains headers
- Column names are clear and descriptive
- Extra columns have been removed
- Missing values are handled
- Dates use a consistent format
- You can explain the chart in one sentence
Example:
This chart shows weekly average ridership for our top 3 subway lines over the last quarter.
If you cannot describe the chart simply, the data probably still needs refinement.
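The mechanical items on this checklist can be checked by a short script before you upload. The sketch below is an assumption-laden illustration, not PressViz's own validation: `check_csv` is a hypothetical helper, the row limit is applied to data rows only (header excluded), 5 MB is read as binary megabytes, and dates are expected in `YYYY-MM-DD` form unless you pass another format.

```python
import csv
import io
from datetime import datetime

MAX_ROWS = 500               # checklist: fewer than 500 rows (header not counted, an assumption)
MAX_BYTES = 5 * 1024 * 1024  # checklist: under 5 MB (binary megabytes assumed)

def check_csv(text, date_column=None, date_format="%Y-%m-%d"):
    """Return a list of problems found in CSV text; an empty list means none found."""
    problems = []
    if len(text.encode("utf-8")) >= MAX_BYTES:
        problems.append("file is not under 5 MB")
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or any(not cell.strip() for cell in rows[0]):
        return problems + ["first row must name every column"]
    header, data = rows[0], rows[1:]
    if len(data) >= MAX_ROWS:
        problems.append(f"{len(data)} data rows, expected fewer than {MAX_ROWS}")
    date_index = None
    if date_column is not None:
        if date_column in header:
            date_index = header.index(date_column)
        else:
            problems.append(f"no column named {date_column!r}")
    for number, row in enumerate(data, start=2):  # row 1 is the header
        if len(row) != len(header) or any(not cell.strip() for cell in row):
            problems.append(f"row {number} has missing values")
        elif date_index is not None:
            try:
                datetime.strptime(row[date_index], date_format)
            except ValueError:
                problems.append(f"row {number} has an unexpected date format")
    return problems

good = "Week,Line,Average Ridership\n2024-01-01,A,503.0\n2024-01-08,A,510.0\n"
bad = "Week,Line,Average Ridership\n2024-01-01,A,\n01/08/2024,A,510.0\n"
print(check_csv(good, date_column="Week"))
print(check_csv(bad, date_column="Week"))
```

A script like this cannot judge whether column names are descriptive or whether the chart tells a single story; those items still need a human read.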
Is 500 rows a hard limit?
It is a practical limit designed to keep charts readable and the editor responsive. Even below that number, smaller and more focused datasets usually produce better results.
Can I upload 50 or more series?
You can, but most readers will struggle to compare them. In practice, 3–7 series is a much better range for clear charts.
Should I aggregate before uploading?
Yes. PressViz does not aggregate raw datasets for you, so preparing the data first gives you more control and a better result.
What if I want to show trends over five years?
Use monthly or quarterly values instead of daily rows. 60 monthly points are easier to understand than 1,825 daily ones.