Page for requesting database queries
|
|
|
|
|
|
This page has archives. Sections older than 14 days may be auto-archived by Lowercase sigmabot III if there are more than 4. |
This is a page for requesting one-off database queries for certain criteria. Users who are interested and able to perform SQL queries on the projects can provide results from the Quarry website.
You may also be interested in the following:
- If you are interested in writing SQL queries or helping out here, visit our tips page.
- If you need to obtain a list of pages that meet certain criteria, consider using PetScan (user manual) or the default search. PetScan can generate lists of articles in subcategories, articles which transclude some template, etc.
- If you need to make changes to a number of articles based on a particular query, you can post to the bot requests page, depending on how many changes are needed.
- For long-term review and checking, database reports are available.
The database replicas do not have access to page content, so requests which require checking wikitext cannot be answered with database queries. In particular, there is no way to query for template parameters or anything related to references. However, someone may be able to assist by querying in another way (for example, checking for external links rather than references) or suggest an alternative tool.
Is it possible to generate a query for blocks / recent (most recent 100?) blocks that mention the word "copyright" in the block log?
(In case it's relevant, I would like to see any copyright blocks that don't get reported at the noticeboards for cleanup purposes and do the fancy thing some people do and put it in {{Database report}}... so like if somebody wants to say why that's an awful idea please feel free.) GreenLipstickLesbian💌🧸 10:20, 28 November 2025 (UTC)
- Here are the most recent 100: quarry:query/99310. The IP addresses unfortunately aren't in a very readable form but I've included them for completeness where they exist. You can change the number fetched by tweaking line 7. J11csd (talk) 12:36, 28 November 2025 (UTC)
- The human-readable ip is in bt_address. —Cryptic 12:47, 28 November 2025 (UTC)
- Oh, so it is! I glanced right over it :-S
- Query has been updated to use this field and also to do case-insensitivity (somewhat more) correctly. J11csd (talk) 12:57, 28 November 2025 (UTC)
- Some more thoughts:
- The explicit collation is unnecessary; converting to utf8 is enough.
- Probably want to find reasons matching %copyvio% too.
- It's not clear that the block table is the right tool for this; it'll omit expired ones. logging with log_type='block' and log_action!='unblock' will include those. Whether they should be included or not isn't clear. (On the other hand, extracting expiry, whether the block is partial, etc. from log_params is fraught; on yet a third hand, none of that's included in the query anyway.)
- This can probably be made smarter by omitting usernames/ips that already have a matching Wikipedia:Contributor copyright investigations/ subpage linked from Wikipedia:Contributor copyright investigations/Bottom. That'll miss the numbered ones like /20111108 (I take it those are for usernames that match real names? The ones I glanced at were.), but nothing to be done for those.
- —Cryptic 13:03, 28 November 2025 (UTC)
- Rather embarrassingly, I just realized I forgot to respond to this : thank you @J11csd and @Cryptic, this is exactly what I was dreaming of! Best early xmas present ever!
- And re: Cryptic's last point- yes, you guessed correctly, numbered CCIs are primarily for real or real-sounding names, to avoid accidentally harming the real person if somebody Googles them. The number is actually the date the CCI was opened, which, rather unfortunately, is not always the same day as the block. GreenLipstickLesbian💌🧸 11:47, 4 December 2025 (UTC)
If there's any way to get a list of every redlink on the site (ideally sortable by how many times the same redlink pops up), I'd be eternally grateful. I've been trying to make Wikipedia:Requested articles into something more useful for the average newbie (actually checking to see if there's a good chance the aforementioned subjects are notable or at the very least not spam) and I think having this information could compliment that goal very well. Clovermoss🍀 (talk) 04:40, 2 December 2025 (UTC)
- @Clovermoss: This sounds like Wikipedia:Most-wanted articles. — DVRTed (Talk) 04:58, 2 December 2025 (UTC)
- Not quite. I want every red link. Clovermoss🍀 (talk) 05:10, 2 December 2025 (UTC)
- By "every", do you mean not just from mainspace to mainspace, or do you mean without WP:MWA's limit of being linked from at least 500 different pages? If the former, Special:Wantedpages does that; if the latter, there are a lot of them - about eight and a half million different redlinks - most with only a couple pages linking to each. —Cryptic 06:40, 2 December 2025 (UTC)
- The latter. Eight and a half million is way more than I was expecting, wow. Is it even possible to do a query that large? Organizing this is going to take a lot longer than I thought. Clovermoss🍀 (talk) 06:47, 2 December 2025 (UTC)
- It is, but not to display it on Quarry. I could email it to you, I suppose. It's much more manageable if you cut off the very bottom - there's about 6 million redlinks linked from only one article, another million from only 2, and 850 thousand from between 3 and 9. —Cryptic 07:02, 2 December 2025 (UTC)
- Okay, email me all the redlinks featured in more than two articles. I can worry about the rest a decade from now 😂. Clovermoss🍀 (talk) 07:05, 2 December 2025 (UTC)
- You've got mail. Grumble. Or I suppose I can paste it into a bunch of sandboxes. —Cryptic 08:14, 2 December 2025 (UTC)
- I've replied. Thank you for doing this. I really appreciate it. Clovermoss🍀 (talk) 08:25, 2 December 2025 (UTC)
- And re-sent. Remember, you inflicted this on yourself! —Cryptic 08:28, 2 December 2025 (UTC)
Hello, I am trying to answer a research question that requires finding a specific public log entry. Could someone please run a query to find all public log entries (from any log type) that meet the following criteria?
- **Date Range:** From January 1, 2022, to November 30, 2022.
- **Keyword:** The log summary, comment, or description must contain the keyword "Legume".
The target of the log entry is not necessarily the "Legume" page itself. I need the full details of any matching log entry, including the date, time, user, action, target, and the full summary text.
Thank you for your help! ~2025-38095-93 (talk) 00:09, 3 December 2025 (UTC)
- Here's 5. It might be case sensitive, so if you need something besides "Legume", such as "legume", let me know. I'm also not sure if the relationship between comment_id and log_comment_id is 1:1 or not, so that could be a possible source of bugs / missed log entries. –Novem Linguae (talk) 04:14, 3 December 2025 (UTC)
- comment rows are reusable - hence comment_hash - and you'll see that especially with a few like comment_id 10 (the empty string), or for where e.g. in a move, the same comment is reused in the dummy edits to revision on the moved-from and moved-to pages. In practice, though, I'm amazed that there's enough storage saved by reuse to justify the storage used by the hash column and index. Either way, it wouldn't matter for that query. —Cryptic 11:00, 3 December 2025 (UTC)

Resolved
I am looking for a list of page titles which exist in both the Help and Wikipedia namespaces, where neither is a redirect. This arises out of this recent merge proposal regarding Wikipedia:Translation and Help:Translation, with the discussion turning partly on what would be appropriate to each page if they were not merged. I later realized there are other such pairs, such as Wikipedia:Substitution and Help:Substitution where the same question could be asked, and I wondered if there aren't many such WP-Help pairs, where it would be profitable to have a higher level discussion at VPI or somewhere about such twinned pages. Hence, this request.
As for SELECTion columns, page title (without namespace) in col 1, I would say, and then some paired columns with adjacent 'Help' and 'WP' stats, perhaps something like this:
- Title, Age-H, Age-W, Watchers-H, Watchers-WP, PgViews30days-H, PgVw30-WP, UniqueUsers-H, UqUsers-W, TotEdits-H, TotEdits-W
or whatever seems useful and not a pain in the neck to produce.
For the WHERE, I thought of exclusions for redirects, but probably there could be others that might be helpful, maybe non-disambig page (Template:Disambiguation), non-stub (Template:Stub), non-essay (Template:Essay), non-set index article (Template:Set index) whatever seems good here.
The tricky part, it seems to me, is how to sort them. It should be something that elicits those pairs that are 'most in need of community attention' at the top, admittedly a very wishy-washy description. Some proxies that come to mind for that are avg page views, avg page watchers, avg total number of unique editors, avg age; and do we take the max instead of avg, or do we just pick stats based on one NS? Maybe try a few sorts, and see how they compare. Anything that elicits 'Translation' and 'Substitution' towards the top are probably good; then again, I don't know what the competition is. Thanks! Mathglot (talk) 20:59, 12 December 2025 (UTC)
- Note to self: for VPI, html-ize, add cols for merge/Rfcs, remarks, etc. Mathglot (talk) 21:14, 12 December 2025 (UTC)
- The watchlist table isn't in the public replicas for privacy reasons, so we don't have access to watchers at all - the only way to get them is to go to Page Information onwiki for each of them, one by one. And page views isn't in the main database at all; the least painful way is to paste the results onto a page onwiki, feed that into https://pageviews.wmcloud.org/massviews/, and manually paste the results of that back onto the page. But the rest I can do. There's only a few. quarry:query/99985. Do you also want ones where one or the other (or both) is a redirect, but points somewhere besides the corresponding page in the other namespace, and not both them pointing at the same third page. There's 319 of those. —Cryptic 02:16, 13 December 2025 (UTC)
- Oh, this is great, thanks! And I am relieved that there aren't a lot more (although this is enough to provide great data for a VPI discussion, and keep us busy afterward). Thank you for the tip about massviews; with output limited to a few dozen, that sounds worth doing once I wikitable-ize it. Your refinement suggestion regarding redirects sounds like a good idea. In that case, do you envision the two pagename columns becoming four, or what would happen? I'm just thinking ahead to the sortable wikitableized version, and making sure we have a column we can sort on to push all the redirects to top or bottom. If you need a test item to peruse, I know that there is a Help:Table, and that WP:TableWP:Table is a redirect to a set index page. If there's 319, can we have that as a separate quarry id, so I can link the one you already have, and that one, too? Mathglot (talk) 02:30, 13 December 2025 (UTC)
Oh, I was suspicious of the rows with only 1 editor, and saw that Wikipedia:Content model for example is a soft redirect. Is it easy enough to exclude those? Otherwise, the '1 editor' is a good enough flag to just ignore them, or I can do it when I reprocess the file. Mathglot (talk) 02:35, 13 December 2025 (UTC)
- Actually, never mind that. VPI readers may want to know about those. Mathglot (talk) 02:37, 13 December 2025 (UTC)
- quarry:query/99986 for the mismatched redirects. Some of them are simply a product of the previous query (WP:Visual editor and Help:Visual editor point to WP:VisualEditor and Help:VisualEditor respectively); some are distinct cases like Help:National varieties of English being a sort-of-a-disambig while Wikipedia:National varieties of English redirects to Wikipedia:Manual of Style#National varieties of English.Soft redirects aren't redirects so far as the software's concerned, just pages with a template on them; I could detect them by looking for transclusions of {{soft redirect}}, but wouldn't be able to tell what they soft-redirected to, like I can with normal redirs. —Cryptic 02:44, 13 December 2025 (UTC)
- Super useful; thanks for your efforts. Both of these should stimulate some interesting discussion. I've tagged this discussion as 'resolved' at the top. Much appreciated! Mathglot (talk) 03:05, 13 December 2025 (UTC)
For those who are curious/interested in following this further, see Wikipedia:Namespace/Help vs. Wikipedia. Mathglot (talk) 09:15, 13 December 2025 (UTC)