Extract a thread as a text file

(2021-07-30, 09:29 AM)Laird Wrote: Hi David,

The quote collapsing option sounds like a helpful idea (for plain text) but - with no offence intended - implementing it is low on my list of priorities.
I think I'm right in saying that the contents of a quote are bracketed by lines with two dashes - if that is consistent, I could just write a tool for myself that would do the job. Sorry, I don't really want to plunge into someone else's code (probably in a language I don't know) right now.
Quote: Oh, that format is useful for importing the data into a spreadsheet.
But I thought we were only talking about text.
Quote: Mostly for automated processing. It's essentially a subset of Javascript for specifying structured data. [Edit: or, in other words, it's a lightweight alternative to XML, with which you might be more familiar]
Yes, I'm familiar with XML.
(This post was last modified: 2021-07-31, 08:47 PM by David001.)
(2021-07-31, 08:45 PM)David001 Wrote: I think I'm right in saying that the contents of a quote are bracketed by lines with two dashes - if that is consistent, I could just write a tool for myself that would do the job.

Yes, exactly. I was thinking a simple regular expression-based replacement would work just fine, especially when you factor in that prior to the first two dashes there's a line:

[Username] Wrote: (yyyy-mm-dd, hh:mm)

That should be pretty easy to match with a regex - and, as you say, you can then match from the first two dashes up until the second two dashes.
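
A minimal sketch of that regex approach in Python. It assumes the plain-text export puts a header line of the form shown above directly before the opening "--" line, and that quotes are never nested. The name collapse_quotes and the "[quote collapsed]" placeholder are made up for illustration; this is not the plugin's code.

```python
import re

# Assumed layout (not checked against the real export):
#   Username Wrote: (yyyy-mm-dd, hh:mm)
#   --
#   ...quoted lines...
#   --
QUOTE_RE = re.compile(
    r"^([^\n]+ Wrote: \(\d{4}-\d{2}-\d{2}, \d{2}:\d{2}(?: [AP]M)?\))\n"  # header line
    r"--\n"                  # opening two dashes
    r"(?:[^\n]*\n)*?"        # quoted lines, as few as possible
    r"--$",                  # closing two dashes
    re.MULTILINE,
)

def collapse_quotes(text):
    """Replace each quote block with its header line plus a short placeholder."""
    return QUOTE_RE.sub(r"\1 [quote collapsed]", text)
```

Because the quoted lines are matched lazily, each block ends at the first "--" line after its header, so two quotes in one post are collapsed separately.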

(2021-07-31, 08:45 PM)David001 Wrote: But I thought we were only talking about text.

CSV is text, but it's also formatted text ("comma-separated values"). That format is especially suited for import into spreadsheets. Try it for yourself. ;)
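
For instance, here is a small Python sketch that writes a few posts out as CSV with the standard csv module. The column names and the sample rows are invented for illustration and are not the plugin's actual export format.

```python
import csv
import io

# Hypothetical rows: (author, date, post text). Not the plugin's real columns.
posts = [
    ("UserA", "2021-07-31 20:45", "Plain text, with a comma"),
    ("UserB", "2021-08-01 09:00", 'Text with "quotes" inside'),
]

def posts_to_csv(rows):
    """Return the rows as one CSV string with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # quotes fields that contain commas or quotes
    writer.writerow(["author", "date", "text"])
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = posts_to_csv(posts)
```

Saved to a file with a .csv extension, text like this should open in a spreadsheet with one post per row.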
OK I tried simply removing text between lines consisting of "--".

The result wasn't very satisfactory, because people would quote stuff from sources outside the thread, and then discuss it.

I might restrict the mechanism to remove only text belonging to the thread, but I wonder if even that would actually be useful.

David
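
One possible way to restrict the removal, sketched in Python: drop a block bracketed by "--" lines only when its text already appeared earlier in the same thread, and keep everything else. This assumes quotes are bracketed by lines consisting of exactly "--", are never nested, and are copied verbatim apart from whitespace. The name strip_internal_quotes is hypothetical.

```python
def _normalise(text):
    """Collapse runs of whitespace so line wrapping does not affect matching."""
    return " ".join(text.split())

def strip_internal_quotes(thread_text, marker="--"):
    """Remove quote blocks whose content already occurred earlier in the thread."""
    kept = []      # lines that make it into the output
    seen = ""      # normalised non-quote text seen so far
    block = None   # lines of the quote being read, or None outside a quote
    for line in thread_text.splitlines():
        if line.strip() == marker:
            if block is None:
                block = []  # opening marker
            else:
                quoted = _normalise("\n".join(block))
                if not quoted or quoted not in seen:
                    # External (or empty) quote: keep it, markers included.
                    kept.extend([marker, *block, marker])
                block = None  # closing marker
        elif block is not None:
            block.append(line)
        else:
            kept.append(line)
            norm = _normalise(line)
            if norm:
                seen += " " + norm
    if block is not None:
        # Unterminated quote: keep it untouched rather than lose text.
        kept.extend([marker, *block])
    return "\n".join(kept)
```

Quotes of outside material survive because their text never appears in the thread before the quote itself. Quotes that were trimmed or edited by the poster would also survive, which may or may not be what is wanted.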
(2021-08-03, 08:37 PM)David001 Wrote: OK I tried simply removing text between lines consisting of "--".

The result wasn't very satisfactory, because people would quote stuff from sources outside the thread, and then discuss it.

Right, good point. That had occurred to me too earlier, but then slipped my mind.

(2021-08-03, 08:37 PM)David001 Wrote: I might restrict the mechanism to remove only text belonging to the thread, but I wonder if even that would actually be useful.

Mmm, sounds like a lot of work just to avoid a few mouse clicks or PgDn presses to scroll past a few quotes....

Edit: Also, sometimes quotes are necessary for context, so you know what somebody's responding to. As with quotes of material from outside the thread, you wouldn't want to remove them, but unlike external quotes, there's no way to know in advance which ones they are.
(This post was last modified: 2021-08-04, 03:08 AM by Laird.)
(2021-08-04, 03:07 AM)Laird Wrote: Right, good point. That had occurred to me too earlier, but then slipped my mind.

Mmm, sounds like a lot of work just to avoid a few mouse clicks or PgDn presses to scroll past a few quotes....
Well, I think it is more than that - if you look at the Darwin unhinged thread, particularly if you go back a bit rather than just viewing the latest stuff, I find it easy to lose track of who is saying what.
Quote: Edit: Also, sometimes quotes are necessary for context, so you know what somebody's responding to. As with quotes of material from outside the thread, you wouldn't want to remove them, but unlike external quotes, there's no way to know in advance which ones they are.

Well, of course there are always compromises here. For example, in some forums a quote includes any quotes embedded in it. That gives you maximum context at the expense of readability.

In any case, I anticipate that processing of this sort would be optional.

I am still thinking about it.
@Laird

I'd like to make more use of the option to download a whole thread. I think this used to work better before one of the forum upgrades that made heavy use of HTML tables.

As it stands, these downloads are not much use on my Kindle because the tabular format spills hopelessly over the right-hand margin (unless perhaps I select an impossibly small font). I think this could be fixed by adding some sort of option to remove or disable all the tables. One way that seems to work is to replace <td with <xx.

What do you think?

David
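
A minimal Python sketch of the <td to <xx replacement described above, applied to the contents of a downloaded HTML file. It assumes the browser or e-reader simply ignores the unknown <xx tag, so the cell contents flow as ordinary text. The helper disable_table_cells is hypothetical and not part of the plugin. Unlike a plain search-and-replace, it only touches "<td" where it starts a tag.

```python
import re

# Match "<td" only when it is a whole tag name, so e.g. "<tdata>" is left alone.
TD_OPEN_RE = re.compile(r"<td(?=[\s>/])", re.IGNORECASE)

def disable_table_cells(html):
    """Rename opening <td ...> tags to <xx ...>; closing tags are left as they are."""
    return TD_OPEN_RE.sub("<xx", html)
```

Usage would be along the lines of reading the saved thread, passing its text through disable_table_cells, and writing the result to a new file for the Kindle.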
(2024-02-21, 05:59 PM)David001 Wrote: @Laird

I'd like to make more use of the option to download a whole thread. I think this used to work better before one of the forum upgrades that made heavy use of HTML tables.

As it stands, these downloads are not much use on my Kindle because the tabular format spills hopelessly over the right-hand margin (unless perhaps I select an impossibly small font). I think this could be fixed by adding some sort of option to remove or disable all the tables. One way that seems to work is to replace <td with <xx.

What do you think?

David

Hi David,

I've moved your post to the appropriate thread.

No changes have been made to the plugin to make additional use of tables, so I'm not sure why you're suddenly experiencing difficulties. I have tested on my mobile phone and have not experienced any readability difficulties, but they might depend on the content of the thread in question. If you could share the thread you were having these difficulties with, that would be helpful. In the meantime, try downloading as plain text rather than HTML and let me know whether that's adequate for your purposes.
(This post was last modified: 2024-02-22, 04:13 AM by Laird. Edited 1 time in total.)
