Annual Statistics Reports

This page summarizes the statistics that are collected for annual reports. It describes the methods used to collect those statistics, so they can be collected in the same manner each year.

A note on dates
Although some of the stats use the word "submitted", we use the Date Accessioned to calculate all stats. This is the date on which an item entered the archive. An item that is submitted to Dryad in late 2011, but not fully archived until early 2012, counts toward the 2012 statistics. This lets us calculate consistent statistics without dealing with the complexities that arise before an item is archived, including the review process and publication blackout.
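
The rule above can be sketched in Python as follows. This is only an illustration: it assumes the Date Accessioned value is available as an ISO 8601 string, and the function name stats_year is hypothetical, not part of any Dryad report.

```python
from datetime import datetime

def stats_year(date_accessioned: str) -> int:
    """Return the reporting year for an item, based only on Date Accessioned.

    Assumes an ISO 8601 timestamp such as '2012-01-15T10:30:00Z'
    (the stored format is an assumption).
    """
    # Only the date part matters, so parse the first 10 characters.
    return datetime.strptime(date_accessioned[:10], "%Y-%m-%d").year

# An item submitted in late 2011 but archived in early 2012 counts for 2012.
print(stats_year("2012-01-15T10:30:00Z"))
```

The submission date never enters the calculation, which is what keeps the numbers comparable from year to year.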

Basic submission stats
Run the datapackagestats report and process the results.


 * Packages submitted in the year
   * Number of items that have a dateaccessioned in 2012
 * Total packages as of year end
   * Number of items that have a dateaccessioned in 2012 or earlier
 * Files submitted in the year
   * For the items that have a dateaccessioned in 2012, the total number of their data files
 * Total files as of year end
   * For the items that have a dateaccessioned in 2012 or earlier, the total number of their data files
 * Volume of data (GB) submitted in the year
   * For the items that have a dateaccessioned in 2012, the total size of their data files
 * Total volume of data at year end
   * For the items that have a dateaccessioned in 2012 or earlier, the total size of their data files
 * Proportion of submissions that are:
   * integrated
     * proportion for which there is a manuscript number
   * pre-review
     * in the dryadassistant account, set up a query to find email notifications of review: search for "received for review" within the appropriate dates.
     * This isn't very accurate because:
       * It counts each submission separately when authors create multiple submissions for a single item.
       * The review number includes all items submitted for review purposes, but many of these may have their articles rejected.
       * The total submissions number includes only those submissions that end up in the archive.
       * With our current counting methods, we are not able to track a single submission through the process. That is, the review items are items submitted to the review stage in the year, while the archived items are items archived in the year. Many items enter the review stage in one year and are archived in a different year.
   * post-review (opposite of pre-review)
   * ones with author lists differing between article and data (difficult to get w/o ORCID)
 * Proportion of files submitted this year that are:
   * embargoed (can it be limited to embargo-option journals?)
     * from dataPackageStats report, create a pivot table that uses the embargo settings as the rows (and values) and uses the journalAllowsEmbargo as columns
     * alternate, but not normally used: run fileSimpleStats report for a count of embargo settings on data files
   * Readmes
     * in datapackagestats, sort by date, then sum number of readmes and divide by the number of files
   * new versions -- while versioning system is being finalized, we just list how many are in waiting
   * each file type
     * run profileFormats report
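
The per-year and year-end totals listed above can be computed in one pass over the report rows. The sketch below is a minimal illustration, assuming the report has been exported to CSV. The column names dateaccessioned, num_files and size_bytes are hypothetical stand-ins and should be replaced with whatever the actual report uses.

```python
import csv
import io

def yearly_totals(rows, year):
    """Summarize packages, files, and data volume for one year and cumulatively."""
    totals = {
        "packages_in_year": 0, "packages_total": 0,
        "files_in_year": 0, "files_total": 0,
        "bytes_in_year": 0, "bytes_total": 0,
    }
    for row in rows:
        accession_year = int(row["dateaccessioned"][:4])
        if accession_year > year:
            continue  # archived after the reporting year, so not counted at all
        files = int(row["num_files"])    # hypothetical column name
        size = int(row["size_bytes"])    # hypothetical column name
        # "Total ... as of year end" covers the reporting year and all earlier years.
        totals["packages_total"] += 1
        totals["files_total"] += files
        totals["bytes_total"] += size
        if accession_year == year:
            totals["packages_in_year"] += 1
            totals["files_in_year"] += files
            totals["bytes_in_year"] += size
    return totals

# Made-up sample rows standing in for an exported report.
sample = io.StringIO(
    "dateaccessioned,num_files,size_bytes\n"
    "2011-06-01T00:00:00Z,2,1000\n"
    "2012-02-10T00:00:00Z,3,5000\n"
    "2013-01-05T00:00:00Z,1,700\n"
)
result = yearly_totals(csv.DictReader(sample), 2012)
print(result)
```

Converting the byte totals to GB is left out here because the unit used for sizes in the report is not stated on this page.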

Authors
 * Number of authors associated with submissions this year
   * query (adjust the dates for the reporting year):
       select distinct text_value from metadatavalue
         where metadata_field_id=3 and item_id in
           (select item_id from collection2item
             where collection_id=2 and item_id in
               (select item_id from metadatavalue
                 where metadata_field_id=11
                   and text_value > '2012-01-00' and text_value < '2013-01-00'));
 * Total number of authors represented in Dryad
   * see homepage, OR
       select distinct text_value from metadatavalue
         where metadata_field_id=3 and item_id in
           (select item_id from collection2item where collection_id=2);
 * Accounts created this year
   * can search email for subject "Registration Notification", but these messages aren't always saved.
 * Total number of accounts in Dryad
       select count(*) from eperson;
 * Distribution of packages by author over all yrs (list of top authors)
   * perform an empty search, and look at the author facet
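
Because the authors-per-year query has the year hard-coded, it has to be edited by hand each year. One way to avoid that is to generate the query text from the reporting year, as in the sketch below. The function name authors_in_year_query is hypothetical. The field IDs, collection ID and date bounds simply mirror the query on this page and are not verified against the database schema.

```python
def authors_in_year_query(year: int) -> str:
    """Build the authors-per-year query for a given reporting year."""
    # Same string bounds as the hand-written query: '<year>-01-00' and '<year+1>-01-00'.
    lower = f"{int(year)}-01-00"
    upper = f"{int(year) + 1}-01-00"
    return (
        "select distinct text_value from metadatavalue "
        "where metadata_field_id=3 and item_id in ("
        "select item_id from collection2item where collection_id=2 "
        "and item_id in ("
        "select item_id from metadatavalue where metadata_field_id=11 "
        f"and text_value > '{lower}' and text_value < '{upper}'));"
    )

# Print the query for the 2012 report, ready to paste into a database session.
print(authors_in_year_query(2012))
```

The year is forced to an integer before it is formatted into the text, so nothing other than a number can end up inside the query.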

Website Usage
 * Data file downloads
   * get the full stats:
       curl "http://DRYAD_SERVER/solr/statistics/select/?q=-isBot:true+owningItem:%5B*%20TO%20*%5D&fl=time&rows=10000000" > downloads.txt
   * pull out the relevant download timestamps and count them:
       grep -o "2012-" downloads.txt | wc -l
 * Data package views
   * get the full stats:
       curl "http://DRYAD_SERVER/solr/statistics/select/?q=-isBot:true+owningColl:2%20-owningItem:%5B*%20TO%20*%5D&fl=time&rows=100000000" > packageViews.txt
   * pull out the relevant view timestamps and count them:
       grep -o "2012-" packageViews.txt | wc -l
 * Top downloaded packages
   * sort the dataPackageStats report by the "downloads" column
 * web sessions, per month over time
   * In Google Analytics, set the reporting time to the correct timespan (Jan 1 - Dec 31)
   * read the values for sessions
 * web sessions, broken down by country, language
   * In Google Analytics, set the reporting time to the correct timespan (Jan 1 - Dec 31)
   * Look in the "Location" section
 * sources of visitors
   * create a report in Google Analytics
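
The grep-and-count step for downloads and package views can also be done in Python, which makes a per-month breakdown easy to get at the same time. This is a sketch under assumptions: it expects the saved response to contain ISO 8601 timestamps, and the XML layout of the sample text is illustrative only. The function name events_per_month is hypothetical.

```python
import re
from collections import Counter

# Matches ISO 8601 timestamps such as 2012-03-04T12:34:56 and captures year and month.
TIMESTAMP = re.compile(r"(\d{4})-(\d{2})-\d{2}T\d{2}:\d{2}:\d{2}")

def events_per_month(response_text, year):
    """Count timestamps that fall in the given year, grouped by 'YYYY-MM'."""
    counts = Counter()
    for found_year, month in TIMESTAMP.findall(response_text):
        if int(found_year) == year:
            counts[f"{found_year}-{month}"] += 1
    return counts

# Made-up sample standing in for the contents of downloads.txt.
sample = (
    '<date name="time">2011-12-31T23:59:59.000Z</date>'
    '<date name="time">2012-01-02T08:15:00.000Z</date>'
    '<date name="time">2012-01-20T17:40:12.000Z</date>'
    '<date name="time">2012-07-04T09:00:00.000Z</date>'
)
monthly = events_per_month(sample, 2012)
print(sum(monthly.values()), dict(monthly))
```

For a real run, read the file with open("downloads.txt").read() and pass the text to the function. Matching a full timestamp rather than just the year prefix reduces the chance of counting unrelated text.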

Social media

 * Blog visits
 * Twitter followers

Journals

 * Currently integrated, in process, and on hold
   * see Submission Integration: Current Status and the Trello journal integration board
 * Breakdown of submissions by journal
   * run the DataPackagesPerJournal report (but first update the date window and recompile)
   * OR from the dataPackageStats report, create a pivot table -- use the journal names for both the rows of the table and the values area
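
The pivot-table approach to the journal breakdown amounts to counting rows per journal name. A minimal Python equivalent is sketched below, assuming the report is exported to CSV. The column names journal and dateaccessioned are hypothetical, and the sample rows are made up.

```python
import csv
import io
from collections import Counter

def packages_per_journal(rows, year):
    """Count packages accessioned in the given year, grouped by journal name."""
    counts = Counter()
    for row in rows:
        if row["dateaccessioned"].startswith(f"{year}-"):
            counts[row["journal"]] += 1
    return counts

# Made-up sample rows standing in for an exported report.
sample = io.StringIO(
    "journal,dateaccessioned\n"
    "Journal A,2012-03-01T00:00:00Z\n"
    "Journal B,2012-05-09T00:00:00Z\n"
    "Journal A,2012-11-30T00:00:00Z\n"
    "Journal A,2011-08-15T00:00:00Z\n"
)
breakdown = packages_per_journal(csv.DictReader(sample), 2012)
# List journals from most to fewest packages.
for journal, count in breakdown.most_common():
    print(journal, count)
```

The same counting pattern applies to the embargo breakdown if the journal column is swapped for the embargo setting.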

Curation stats

 * Curators can provide stats on their time usage.