Extract a thread as a text file


(2021-07-27, 11:32 AM)Typoz Wrote: I also noticed a related bug, that the very latest post(s) may not be present across all of the formats. Possibly only the selected format is up-to-date, but it's a very minor issue.

Without checking in the code, my guess is that this, and the problem of multiple files sometimes being present in the ZIP archive, are both due to an error in the archiving code: probably it is re-using the temporary directory in which the files are stored prior to zipping.
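
To illustrate the guess above: if an archiver stages files in one shared directory, leftovers from an earlier request can end up in a later ZIP. Below is a minimal Python sketch of one way to avoid that, by staging every archive in a fresh temporary directory. It is not the plugin's code (the plugin is not written in Python), and `build_archive` is a hypothetical helper name.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Dict


def build_archive(files: Dict[str, str], zip_path: Path) -> Path:
    """Write `files` (file name -> text) into a new ZIP at `zip_path`.

    Hypothetical helper: every call stages its files in a brand-new
    temporary directory, so nothing left over from an earlier call
    can leak into the archive.
    """
    with tempfile.TemporaryDirectory() as staging:
        staging_dir = Path(staging)
        for name, text in files.items():
            (staging_dir / name).write_text(text, encoding="utf-8")
        # Mode "w" truncates any archive that already exists at zip_path.
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
            for path in sorted(staging_dir.iterdir()):
                archive.write(path, arcname=path.name)
    return zip_path
```

Because the staging directory is created and deleted inside the call, two downloads in different formats cannot see each other's files.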

(2021-07-27, 11:32 AM)Typoz Wrote: I had a look at the csv file but when loaded into MS Excel, the date didn't seem to be interpreted as anything meaningful. Pretty sure that's just MS living inside its own bubble though. Edit: OK, I found how to convert the date to Excel.
Code:
=(C2/86400)+DATE(1970,1,1)
where C2 is the cell containing the Unix date.

Ah, thanks for pointing that out. Yes, it should be both a human-readable as well as a spreadsheet-parsable date, rather than something (a UNIX timestamp) that both are going to stumble over.
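
A minimal Python sketch of the two conversions discussed here, assuming the CSV column holds a Unix timestamp in seconds and that UTC is an acceptable time zone for display. The function names are hypothetical. The second function mirrors the spreadsheet formula quoted above and relies on Excel's default 1900 date system, in which 1970-01-01 is serial day 25569.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400
EXCEL_SERIAL_FOR_UNIX_EPOCH = 25569  # 1970-01-01 in Excel's 1900 date system


def unix_to_readable(ts: int) -> str:
    """Format a Unix timestamp as 'YYYY-MM-DD HH:MM:SS' in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def unix_to_excel_serial(ts: int) -> float:
    """Same arithmetic as =(C2/86400)+DATE(1970,1,1)."""
    return ts / SECONDS_PER_DAY + EXCEL_SERIAL_FOR_UNIX_EPOCH
```

For example, `unix_to_readable(1627385520)` gives `2021-07-27 11:32:00`, a form that people can read and that spreadsheet programs should generally recognise as a date.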

(2021-07-27, 04:34 PM)David001 Wrote: Since you are (one of) the authors of the software, do you think it would be worth adding a plain text option to the possible formats?

Yes. I am thinking of moving its functionality into my Extract Posts plugin anyway, since they do similar jobs, so it seems pointless to have two separate plugins. It would also be helpful to add the ability to select multiple threads or even entire forums to download. If/when I do that, it will be easy to add plain text as a format option given that Extract Posts already supports it. I'll let you know if/when I get that done.
(This post was last modified: 2021-07-27, 11:20 PM by Laird.)
I was just wondering, when the idea of possibly downloading entire forums was mentioned, where one should draw a line between ordinary user functionality, and privileged administrator functionality. Or should there be such a boundary?
(2021-07-28, 01:49 AM)Typoz Wrote: I was just wondering, when the idea of possibly downloading entire forums was mentioned, where one should draw a line between ordinary user functionality, and privileged administrator functionality. Or should there be such a boundary?

That's a good question. Perhaps the plugin could support per-usergroup permissions to (1) stipulate which usergroups could download entire forums, and (2) stipulate a maximum number of simultaneous thread downloads for each member of the usergroup.

In any case, I ended up fixing all of the mentioned glitches and adding support for plain text downloads to the download threads plugin, so feel free, you guys, to try it out some more and see what you think.
(2021-07-28, 03:26 AM)Laird Wrote: That's a good question. Perhaps the plugin could support per-usergroup permissions to (1) stipulate which usergroups could download entire forums, and (2) stipulate a maximum number of simultaneous thread downloads for each member of the usergroup.

In any case, I ended up fixing all of the mentioned glitches and adding support for plain text downloads to the download threads plugin, so feel free, you guys, to try it out some more and see what you think.
Yes, I don't have a strong opinion on the first part. It was a question I raised because some part of me was sounding alarm bells about something not seeming quite right, though I'm not sure what.

On the fixes you did, thank you, yes it seems to work reliably enough, though I only did a small example to try it.

Also I didn't really pay attention to the "View a Printable Version" option, which could also be useful except for very long threads with many pages.
(2021-07-28, 03:26 AM)Laird Wrote: That's a good question. Perhaps the plugin could support per-usergroup permissions to (1) stipulate which usergroups could download entire forums, and (2) stipulate a maximum number of simultaneous thread downloads for each member of the usergroup.
Surely the obvious rule would be that a user would be permitted to download whatever he could read in the normal way?
Laird,

I have just tested your download feature on this thread. It is fine except that it could be compacted a bit for the benefit of Kindle users. The following keeps on appearing:

------------------------------------------------------------------------

Username

The dashed line overflows the line length on the Kindle, so it occupies two lines. It would be nice to at least shorten it, or maybe remove it and maybe just append something to the end of the previous post.

Obviously I could write a program to do all that, but it would be nice to get it all in one step.
(2021-07-29, 09:21 PM)David001 Wrote: Surely the obvious rule would be that a user would be permitted to download whatever he could read in the normal way?

Yes, although I can think of a couple of objections to that:

Firstly, it could be abused to waste bandwidth and server resources: if getting the server to extract all threads in a single forum from the DB and stream them to the client was as easy as clicking a single button, then potentially tens or even hundreds of megabytes of bandwidth could be wasted "just like that". An admin might, then, want to limit this feature to members in certain trusted groups, such as other admins.

Secondly, some admins might like to be able to configure various forums as non-downloadable for various other reasons, such as (1) the forums are private, containing exclusive content, and the admin wants to make it as hard as possible for the content to be shared elsewhere, or (2) the admin wants forum downloadability to be a premium feature for which members have to pay, after which they are added to a usergroup with downloading permission.
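
The rules floated in this thread (download only what you can read, a per-forum "downloadable" switch, a per-usergroup right to download whole forums, and a per-usergroup cap on the number of threads) could be combined roughly as follows. This is only a sketch in Python under those assumptions; `GroupPolicy`, `may_download` and every field name are hypothetical and do not describe how the plugin or MyBB actually stores permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupPolicy:
    """Hypothetical per-usergroup download settings."""
    can_download_forums: bool = False
    max_threads_per_download: int = 1


def may_download(policy: GroupPolicy, *, can_read: bool,
                 forum_downloadable: bool, thread_count: int,
                 whole_forum: bool) -> bool:
    # Baseline rule: never hand out what the user could not read anyway,
    # and respect a forum the admin has marked as non-downloadable.
    if not can_read or not forum_downloadable:
        return False
    # Whole-forum downloads are reserved for trusted usergroups.
    if whole_forum:
        return policy.can_download_forums
    # Otherwise enforce the usergroup's cap on threads per request.
    return 1 <= thread_count <= policy.max_threads_per_download
```

With the defaults, an ordinary member can download a single readable thread but not a whole forum. A trusted or paying usergroup would be given a policy such as `GroupPolicy(can_download_forums=True, max_threads_per_download=50)`.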

(2021-07-29, 09:44 PM)David001 Wrote: Laird,

I have just tested your download feature on this thread. It is fine except that it could be compacted a bit for the benefit of Kindle users. The following keeps on appearing:

------------------------------------------------------------------------

Username

The dashed line overflows the line length on the Kindle, so it occupies two lines. It would be nice to at least shorten it, or maybe remove it and maybe just append something to the end of the previous post.

Obviously I could write a program to do all that, but it would be nice to get it all in one step.

Thanks for the feedback. I've limited the dashes to a sequence of 20. Please let me know if that's still too long, or of any preferable approach to indicating the boundaries between posts.
Laird,

Thanks for that. Yesterday, I downloaded the bugs in evolution thread as text, and manually edited the dashes. The result was much nicer, but then I realised that there is a remaining problem in that the same text pops up over and over because people quote each other. On the forum, you can skip over the quotes because of their darker colour, but even then all the quotes can be very intrusive.

Obviously quotes of material outside the thread should stay in, but an option to condense intra-thread quotes would be very powerful. I suggest that there would be an option so that very short quotes could remain, but longer ones would be collapsed to something like

<Quote from David001 12:15 01-01-2021>

In that way, anyone interested in the quote could find it easily with a search, but the reader wouldn't get lost in repetitions.
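
A rough Python sketch of the collapsing rule suggested above, assuming the exporter already knows each quote's author, timestamp and whether the quoted post belongs to the same thread. The `Quote` class, the `render_quote` function and the 200-character threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    author: str
    posted: str             # timestamp text as shown in the thread
    text: str
    from_this_thread: bool  # False for material quoted from elsewhere


def render_quote(quote: Quote, max_chars: int = 200) -> str:
    """Collapse long intra-thread quotes; keep short or external ones."""
    if quote.from_this_thread and len(quote.text) > max_chars:
        return f"<Quote from {quote.author} {quote.posted}>"
    return f"{quote.author} wrote: {quote.text}"
```

A long quote of a post made in the same thread would then come out as `<Quote from David001 12:15 01-01-2021>`, while short quotes and quotes of outside material are left in full.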

This might also be useful in the other options (CSV and JSON) but I can't imagine why anyone would want to download text in CSV format, and to be honest, I haven't a clue what JSON is for.
(This post was last modified: 2021-07-30, 09:09 AM by David001.)
Hi David,

The quote collapsing option sounds like a helpful idea (for plain text) but - with no offence intended - implementing it is low on my list of priorities. I'd accept a pull request if anybody else wanted to implement it though. For anybody interested, just be aware that the latest changes haven't yet been pushed to the primary dragonexpert repository. I'll do that soon and release a new version of the plugin to the MyBB Extend site if nobody reports any bugs in the near future.

(2021-07-30, 09:06 AM)David001 Wrote: I can't imagine why anyone would want to download text in CSV format

Oh, that format is useful for importing the data into a spreadsheet.

(2021-07-30, 09:06 AM)David001 Wrote: to be honest, I haven't a clue what JSON is for.

Mostly for automated processing. It's essentially a subset of JavaScript for specifying structured data. [Edit: or, in other words, it's a lightweight alternative to XML, with which you might be more familiar]
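
As a small illustration of "automated processing", here is a Python sketch that writes one post as JSON and reads it back. The field names (`username`, `dateline`, `message`) and the values are made up for the example and are not necessarily what the plugin emits.

```python
import json

# Hypothetical post record; the field names are illustrative only.
post = {
    "username": "example_user",
    "dateline": 1627385520,  # arbitrary example Unix timestamp
    "message": "Example post text",
}

encoded = json.dumps(post, indent=2)  # structured data -> JSON text
decoded = json.loads(encoded)         # JSON text -> structured data
```

A program on the receiving end gets the same fields back without any custom parsing, which is the main attraction of the format.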
(This post was last modified: 2021-07-30, 09:34 AM by Laird.)
Thanks for the time and effort put into making this work, and into tweaking/debugging. I'm grateful for that.

Returning to the csv format, I previously mentioned the date in the form of a timestamp being somewhat unwieldy. On second thoughts, I actually like the timestamp: machine-readable formats do have advantages for such things as sorting. One use I was thinking of was, for example, to sort the csv by username and then by date-time. The human-readable date is less useful for that. Maybe the output could include both: the human-readable date in one column, and either the raw timestamp or possibly a date in a form similar to yyyy/mm/dd hh:mm:ss in another. Since I may be the only person who ever uses this, I'd make this a low priority; consider it a luxury rather than a necessary feature.
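
A minimal Python sketch of that idea, assuming each post is available as a record with a username and a Unix timestamp in seconds. The column names and the `export_sorted_csv` function are hypothetical. The rows are sorted by username and then by timestamp, and each row carries both the raw timestamp and a yyyy/mm/dd hh:mm:ss date (in UTC).

```python
import csv
import io
from datetime import datetime, timezone


def export_sorted_csv(posts):
    """Return CSV text sorted by username, then by timestamp."""
    ordered = sorted(posts, key=lambda p: (p["username"].lower(), p["dateline"]))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["username", "dateline", "date_utc"])
    for post in ordered:
        stamp = datetime.fromtimestamp(post["dateline"], tz=timezone.utc)
        writer.writerow([
            post["username"],
            post["dateline"],                       # raw, machine-sortable
            stamp.strftime("%Y/%m/%d %H:%M:%S"),    # sortable and readable
        ])
    return buffer.getvalue()
```

A date written as yyyy/mm/dd hh:mm:ss also sorts correctly as plain text, so either column would work for ordering posts within a user.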


Regarding security or privileges needed, probably it should as a minimum require a user to be logged in? At present, "Download Thread" seems to be available without logging in.
