Recently, the Confluence Cloud Macro usage search stopped working: the link to search for all the pages that use a macro does not find any pages. We are in the middle of a major content restructure, so this is very inconvenient.
We needed a fresh list of all the Confluence Cloud pages that included other pages, and the names of the included pages. We could have built the list manually by going to each page, opening Page information, and copying the Outgoing links section, but then we would have had to filter out ordinary links to other pages. Processing hundreds of pages this way would have taken a very long time, so I decided to improvise a Macro usage search.
To improvise the Macro usage search, I used a Python script and a regex. I'm sure it would be possible to find flaws in this process, but it gave us a list of 130 pages with their included pages and got us working again!
I used a version of Sarah Maddox's Python script for Confluence full-text search, which is described on the ffeathers blog. I had previously updated this script to use the Atlassian Python API (Confluence module) and to work with Confluence Cloud.
Using the script, I got the content of all pages in Confluence Storage Format, which is XHTML-based. The script creates a file for each page, with a link to the page on the first line and the Storage Format after it. The link on the first line is a great idea, because being able to click through to the page itself saves a lot of time.
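The original script is not reproduced here, so this is only a minimal Python sketch of the file-writing step. It assumes that each page is a dict shaped like the Confluence REST API content response when the page is requested with expand=body.storage, with the fields id, _links.webui, and body.storage.value. BASE_URL, write_page_file, and the <page id>.txt naming scheme are all hypothetical.

```python
from pathlib import Path

# Hypothetical: the Confluence Cloud base URL. The webui links returned by the
# REST API are relative to this base.
BASE_URL = "https://abiquo.atlassian.net/wiki"


def write_page_file(page: dict, out_dir: Path) -> Path:
    """Write one page to <out_dir>/<page id>.txt.

    Line 1 holds a clickable link to the page, and the Storage Format follows
    it. The shape of `page` is an assumption based on the REST API response
    with expand=body.storage.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    link = BASE_URL + page["_links"]["webui"]
    storage = page["body"]["storage"]["value"]
    path = out_dir / f"{page['id']}.txt"
    path.write_text(f"{link}\n{storage}\n", encoding="utf-8")
    return path
```

Naming the files by page ID avoids problems with page titles that contain slashes or other characters that are awkward in file names.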
Then I quickly wrote a regex to find the include macro. The regex replaces each include macro, together with the text before it, with the name of the included page on its own line. I ran the regex on all the page files. Finally, I concatenated the modified files into a single file and uploaded it to Google Docs for the Documentation team to work from. Mission completed!
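The concatenation step could look something like the following minimal Python sketch. The concatenate name, the blank line between pages, and the sorted file order are assumptions and do not come from the original process. Write the output file outside the page directory, so that it is not read back in as one of the pages.

```python
from pathlib import Path


def concatenate(page_dir: Path, out_file: Path) -> None:
    """Join every processed page file in page_dir into out_file.

    Pages are separated by a blank line (an assumed convention). Files are
    read in sorted name order so that the output is reproducible.
    """
    parts = [
        p.read_text(encoding="utf-8").rstrip("\n")
        for p in sorted(page_dir.iterdir())
        if p.is_file()
    ]
    out_file.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
```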
In Confluence Storage Format, the include macro starts like this:
<ac:structured-macro ac:name="include"
And the page name section of the macro looks like this:
<ri:page ri:content-title="Introduction to catalogue"
This Perl command replaced each include macro up to the page name, together with the text before it, with the name of the included page and a newline:
find . -type f -print0 | xargs -0 -n 1 perl -pi -e 's/.*?<ac:structured-macro ac:name="include.*?ri:content-title="(.*?)"/$1\n/g'
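For anyone who would rather stay in Python, here is a rough equivalent of the Perl substitution. It is a sketch and not the command that was actually used. It keeps the same pattern and, like perl -p, applies it one line at a time. replace_includes is a hypothetical name.

```python
import re

# The same pattern as the Perl one-liner: the lazy .*? prefix consumes
# everything before the next include macro, and the group captures the
# included page title.
INCLUDE_RE = re.compile(
    r'.*?<ac:structured-macro ac:name="include.*?ri:content-title="(.*?)"'
)


def replace_includes(text: str) -> str:
    """Replace each include macro (and the text before it) with its title plus a newline.

    Like `perl -p`, the substitution runs separately on each line. A function
    replacement is used so that backslashes in a title are not treated as
    escapes.
    """
    return "".join(
        INCLUDE_RE.sub(lambda m: m.group(1) + "\n", line)
        for line in text.splitlines(keepends=True)
    )
```

Any text after the last include macro stays at the end of the line, which is why a separate clean-up step is needed.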
The idea to use Perl instead of sed came from Stack Overflow. Perl supports the non-greedy .*? quantifier, which standard sed regular expressions do not.
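The Perl command leaves any text after the last include macro on the last line of each file. To remove it, I deleted the last line of every file with this command, which was straight from Stack Overflow. This is the BSD/macOS form of sed -i, which takes an empty backup suffix (''). With GNU sed, leave out the '' argument:
find . -type f -print0 | xargs -0 -n 1 sed -i '' -e '$ d'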
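Here is a Python sketch of the same clean-up, which avoids the differences between BSD and GNU sed -i. drop_last_line and drop_last_lines are hypothetical helpers. Like the find command, drop_last_lines rewrites every file under the given directory, so only run it on the downloaded page files.

```python
from pathlib import Path


def drop_last_line(path: Path) -> None:
    """Delete the last line of a file in place, like sed's '$ d' command."""
    text = path.read_text(encoding="utf-8")
    # Ignore a final trailing newline, then cut just after the previous newline.
    end = len(text) - 1 if text.endswith("\n") else len(text)
    cut = text.rfind("\n", 0, end)  # -1 for a one-line file, which leaves it empty
    path.write_text(text[: cut + 1], encoding="utf-8")


def drop_last_lines(root: Path) -> None:
    """Apply drop_last_line to every file under root, like the find | xargs pipeline."""
    for path in list(root.rglob("*")):
        if path.is_file():
            drop_last_line(path)
```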
Finally, here is an example of the output for one page: the link to the page, followed by the names of its included pages.
**https://abiquo.atlassian.net/wiki/spaces/doc/pages/311371369/Catalogue+view**
Introduction to catalogue
Catalogue concepts
Catalogue symbols
VM template states