> There was no database, but the best databases aren’t handed to reporters. They’re handmade. I opened a blank spreadsheet and started digging through years of old news clips and pageant websites. I tracked who was competing where, what titles they were winning, and how.
The more experienced I get in journalism and data work, the more convinced I am that the spreadsheet is the first and best tool for the kind of bespoke, flexible, and iterative data collection and modeling that journalists find themselves having to do -- i.e. building a dataset from scratch.
The Sheets database functions (DGET, DSUM, DAVERAGE, etc.) are also really nice, especially once you get familiar with the format for inlining an array of conditions.
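The criteria argument to these functions is itself a small table: a header row naming columns, then one or more rows of conditions. Conditions in the same row must all hold (AND), and separate rows are alternatives (OR). In Sheets you can inline that table as an array literal, something like `{"State";"Ohio"}`, instead of pointing at a range. Below is a minimal Python sketch of that matching logic. It assumes exact-match conditions only (the real functions also accept comparisons such as `">10"`), and the contestant rows and column names are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def matches(row: Row, criteria: List[Row]) -> bool:
    """True if the row satisfies ANY criteria row (OR);
    within one criteria row, EVERY column must match (AND)."""
    return any(all(row.get(col) == val for col, val in cond.items()) for cond in criteria)

def dsum(database: List[Row], field: str, criteria: List[Row]) -> float:
    """Simplified DSUM: sum `field` over the rows that match the criteria."""
    return sum(row[field] for row in database if matches(row, criteria))

def dget(database: List[Row], field: str, criteria: List[Row]) -> Any:
    """Simplified DGET: return `field` from the single matching row.
    The spreadsheet version returns an error for zero or multiple matches; we raise."""
    hits = [row[field] for row in database if matches(row, criteria)]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one match, got {len(hits)}")
    return hits[0]

# Hypothetical hand-built dataset
contestants = [
    {"Name": "A", "State": "Ohio", "Wins": 3},
    {"Name": "B", "State": "Ohio", "Wins": 1},
    {"Name": "C", "State": "Texas", "Wins": 2},
]

print(dsum(contestants, "Wins", [{"State": "Ohio"}]))                      # 4
print(dsum(contestants, "Wins", [{"State": "Ohio"}, {"State": "Texas"}]))  # 6 (OR across rows)
print(dget(contestants, "Name", [{"State": "Texas"}]))                     # C
```

Representing each criteria row as a dict keeps the AND/OR structure visible: keys within a dict are ANDed, and dicts in the list are ORed.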
Excel has a nasty habit of modifying what you type into a cell. If the content seems date-like or number-like, it will convert it, possibly corrupting interesting details in the process.
You can of course counteract this with a leading single quote (apostrophe) or by formatting the cells as text, but you have to remember to do this consistently.
I understand why they do it for a spreadsheet: it is optimized for calculating. But this tendency makes Excel untrustworthy for general processing.
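One defensive habit is to scan values before pasting them into Excel and flag the ones it is likely to convert. Below is a minimal Python sketch. The patterns are hypothetical heuristics, not Excel's actual parsing rules (which vary by locale and version). They are deliberately loose, so expect some false positives.

```python
import re
from typing import List

# Hypothetical heuristics; Excel's real conversion rules depend on locale and version.
MONTH_LIKE = re.compile(
    r"^(jan|feb|mar|apr|may|jun|jul|aug|sep|sept|oct|nov|dec)[a-z]*[-/ ]?\d{1,2}$",
    re.IGNORECASE,
)
LEADING_ZERO = re.compile(r"^0\d+$")               # e.g. ZIP codes, IDs
SCI_NOTATION = re.compile(r"^\d+E\d+$", re.IGNORECASE)
LONG_NUMBER = re.compile(r"^\d{16,}$")             # Excel keeps 15 significant digits

def coercion_risks(value: str) -> List[str]:
    """Return a list of ways Excel might silently change this value."""
    risks = []
    if MONTH_LIKE.match(value):
        risks.append("may become a date")
    if LEADING_ZERO.match(value):
        risks.append("may lose leading zeros")
    if SCI_NOTATION.match(value):
        risks.append("may become scientific notation")
    if LONG_NUMBER.match(value):
        risks.append("may lose digits beyond 15 significant figures")
    return risks

for cell in ["Sept6", "00123", "2310009E13", "Miss Ohio"]:
    print(cell, coercion_risks(cell))
# Sept6 ['may become a date']
# 00123 ['may lose leading zeros']
# 2310009E13 ['may become scientific notation']
# Miss Ohio []
```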
Sheets isn't always the better option, but it does shine for specific use cases. The three main reasons I'll reach for Sheets are:
- Heavy text munging. The built-in regex functions have no equivalent in Excel without dropping down into custom
- Quick and dirty data pulls and third-party integrations, especially any that involve Google services (since the auth process is less painful). I used to use both VBA and Apps Script pretty extensively, and could switch between the two without much issue. But I use neither as frequently as I used to, and it's significantly easier to ramp back up and ship with Apps Script than with VBA. Excel supports JavaScript now, but it's only usable if you can ensure the workbook will be used exclusively in newer versions of Excel.
- Sheets I create for others (especially shared by multiple users) that I know I'll need to maintain and support later on. The access controls, change auditing & revert capabilities, and standardized/centralized execution environment all remove entire classes of support needs and the associated cognitive overhead when triaging issues.
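For the text-munging case, Sheets offers REGEXMATCH, REGEXEXTRACT and REGEXREPLACE. Below is a rough Python sketch of REGEXEXTRACT-style behavior: it returns the first match, or the captured text when the pattern has capture groups. This is an approximation, not Sheets' exact semantics. Sheets uses the RE2 engine, which differs from Python's `re` (RE2 has no backreferences or lookaround), and Sheets returns an error where this sketch returns `None`. The news-clip text is hypothetical.

```python
import re
from typing import Optional, Tuple, Union

def regex_extract(text: str, pattern: str) -> Union[str, Tuple[Optional[str], ...], None]:
    """Approximate REGEXEXTRACT: first match, or its capture group(s) if the pattern has any."""
    m = re.search(pattern, text)
    if m is None:
        return None              # Sheets would return an error here
    if m.re.groups == 0:
        return m.group(0)        # no groups: the whole match
    if m.re.groups == 1:
        return m.group(1)        # one group: just that group
    return m.groups()            # several groups: one value per group

clip = "Jane Doe, crowned Miss Ohio 2017, will compete in Atlantic City."
print(regex_extract(clip, r"Miss ([A-Z][a-z]+) (\d{4})"))  # ('Ohio', '2017')
print(regex_extract(clip, r"\d{4}"))                       # 2017
```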
That said, there are certain times I prefer to use Excel.
- Pivot charts are amazing and have no equivalent in Sheets.
- Pivot tables are far more powerful than their Sheets counterpart.
- Power BI is fantastic (except for the lack of support in Excel for Mac, which can still view the results of Power BI but can't do any editing).
- When connecting to internal data sources. Getting access to data sources as a business team is a royal pain in the ass. Legitimately so, since it's rare for a business team to have someone with a technical enough skillset to truly be trusted with direct access to anything. Getting access that's reachable from outside the intranet (where Apps Script would run) is virtually impossible.
There's a gene called Septin-6, abbreviated Sept6. You can tell when gene expression data has been through an Excel cut/paste cycle, because Sept6 has been converted to September 6. Oh Excel, you think you're so smart.
I bet there's room for something between R and Excel: something where everything is "as-code" in an SCM-friendly format, but the primary interaction could be equally powerful in a text editor or a cell grid.
The "workbook" approach of things like Jupyter is close.
I use Orgmode. A lot. It's my primary organisation method, both as the General Manager of a small three-person sub-company and for my household.
I've always, unwittingly, organised myself via two systems:
1. Lists
2. Matrices
I love the tables in Orgmode [0]. They are intuitive, simple, and just beautifully done.
That said, the spreadsheet functionality is a different beast altogether. Even though it uses the same interface as the tables, it adds an exponential layer of complexity and departs from how most people think of a spreadsheet. It's powerful, but a bit daunting, and I don't see many journalists being able to grok it.
I don't believe that's what the poster you're replying to meant. xlwings is a Python library used to connect to Excel files and, e.g., change their contents programmatically. I don't think you can use it (out of the box, anyway) to display and change data "as a spreadsheet" in a Jupyter notebook (e.g. by showing an iframe).
Yeah, I'm unsure what the parent may have intended. I use it as a data IO layer for stuff that I want to manipulate in Python somewhat interactively, by having an Excel sheet open on the side as I work in Python.
JupyterLab has a spreadsheet editor. I haven't actually used it, so I'm not sure if you can calculate with it, but I'd be mildly surprised if you could.
The methods may be different (using a search engine instead of a card catalog), and the locales might have changed (using an online data store at your desk instead of going to a Library), but one thing that hasn't changed is that lots of information doesn't exist in structured form.
Mariel Padilla, who was a student at the time, won a Pulitzer for her help as an intern in creating a database that was essential to keeping track of the Cincinnati Enquirer's 24/7 opioid crisis reporting.
I think of winning a Pulitzer in the non-individual categories like a film winning an Oscar in the non-individual categories. It is a grand achievement that was made possible by the efforts of many. A non-individual award at the Pulitzer level is still a remarkable achievement.
I don't know why, but I was kind of surprised when I got to the end. I was expecting more... some sort of conclusion. Other than that, yeah, use the best tools available and especially if you are getting data in an async manner, a cell-based direct edit tool like Excel is great.
I watch sports a lot. A well executed shot or play is a delight to watch. Can someone explain why Miss America or a beauty pageant is interesting in 2018?
Not to be rude, but I was expecting something more from this story than "I filled up a spreadsheet and talked to someone" :/ maybe I'm missing the point of this story?
No, you're right. That is literally the entire point of the story. People on Twitter showered the author with praise, but, as you said, she filled out a spreadsheet and talked to people, and there wasn't really much beyond that.
In addition, the main story doesn't seem to include any of her data.