New "duplicate link warner" plugin [renamed to "Link Tools"]


A while back, I noticed that, because threads on our forum are frequently started to share links to interesting content, the same link would fairly regularly be used to start more than one thread, the starter of the later thread(s) not being aware of the original thread.

I figured that it might be helpful to have a plugin that tried to warn members when they were about to start a thread which would duplicate a link in an existing thread (even if the link in the preexisting thread was not in its opening post but in a follow-up post). So, because I couldn't find an existing plugin which did that... I wrote one.

I've tentatively named it "Duplicate link warner". [Edit: since adding a bunch of extra functionality to it, most notably link previews, I've renamed it "Link Tools" and released it publicly here.]

For months now it's been available only to Doug and myself for testing, and for some time I haven't put any work into it. Today, I fixed up a few niggling issues and I've decided to finally make it available to everybody. Be warned though: it is far from perfect.

I don't promise to make any changes/fixes (though I very well might make them if I have the motivation), or even to publish its code, but I definitely do welcome feedback, and at the least, if folks generally find it irritating or otherwise don't like it, I will disable it again.

Basically, the way it works is this: if JavaScript is enabled and, while drafting a new thread in the editor, you enter into your draft a URL that is already present in any existing post on the forum, you will be warned about those existing posts. The warning takes the form of a flashing summary box in the centre of the window which moves slowly up to the top of the screen. It has buttons which allow you either to expand the summary to view more information about the posts in question or to dismiss all of the warnings.

If JavaScript is not enabled, or if you don't dismiss all of the warnings, then when you click "Post Thread", the warnings - in expanded form - will instead be displayed prior to the thread being posted, and you will have to click an "Ignore warning and post anyway" button if you want to post the thread despite the warnings.

Note that warnings are only given when posting new threads. The plugin does not operate for replies to existing threads.

Any questions or comments, please feel free to post them. Like I said, it's not perfect, and no doubt the user interface could be improved. Your suggestions are very welcome, though, like I also said, I don't promise to implement them even if they're very worthwhile.

Cheers guys.
(This post was last modified: 2021-04-20, 01:12 PM by Laird.)
[-] The following 6 users Like Laird's post:
  • Kamarling, Ninshub, Doug, Stan Woolley, Valmar, Typoz
(2018-12-30, 12:15 PM)Laird Wrote: Be warned though: it is far from perfect.

A case in point: Tom Butler, in his recent thread, Denialism: what drives people to reject the truth, links to a Guardian article which Stan Woolley had linked to a couple of days earlier in his thread, Denialism - Guardian article.

Ideally, this plugin would have caught and warned about the duplicate link before Tom posted his thread. So, why didn't it? Because, currently, the plugin regards links as matches only if they are absolutely identical. In this case, they are not - each uses different query parameters (the bit of the link after the "?" symbol and before any "#" symbol):
  • Tom's link uses the query parameter fbclid=IwAR3AnzPGeyfZ25zBEKcUep0uAR_f5rj8fFoh7MkcisRhgbsGYSGBHqQyNi4, whereas
  • Stan's link uses the query parameter CMP=share_btn_link instead.
(Both parameters are, in this case, optional.)

Closely related to (in a way, deriving from) this imperfection is the problem that a link which redirects to another link will not be recognised as a match with that other link, and vice versa. For example, the shortcut YouTube link, https://youtu.be/jc3waP7syjk, leads via a redirect to the canonical page/video at https://www.youtube.com/watch?v=jc3waP7syjk (note in particular that the video code "jc3waP7syjk" is identical in both links), but the plugin does not currently consider these two links to be matches.
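
To illustrate what matching such links would involve, here is a minimal sketch in Python (not the plugin's own code) of following a chain of redirects to its final URL. The function name resolve_redirects and the fetch_location callable are hypothetical, and the lookup table stands in for real HTTP requests so that the example runs without network access.

```python
from typing import Callable, Optional
from urllib.parse import urljoin


def resolve_redirects(
    url: str,
    fetch_location: Callable[[str], Optional[str]],
    max_hops: int = 10,
) -> str:
    """Follow redirects from url and return the last URL reached.

    fetch_location is a hypothetical callable: given a URL, it returns the
    redirect target (e.g. the Location header) or None if there is none.
    """
    seen = {url}
    for _ in range(max_hops):
        target = fetch_location(url)
        if target is None:
            return url  # no further redirect: this is the final URL
        target = urljoin(url, target)  # a redirect target may be relative
        if target in seen:
            return url  # redirect loop: stop where we are
        seen.add(target)
        url = target
    return url  # gave up after max_hops redirects


# Illustrative stand-in for real HTTP lookups (an assumption, not live data).
FAKE_REDIRECTS = {
    "https://youtu.be/jc3waP7syjk": "https://www.youtube.com/watch?v=jc3waP7syjk",
}

final_url = resolve_redirects("https://youtu.be/jc3waP7syjk", FAKE_REDIRECTS.get)
```

With the final URL of each link in hand, the shortcut and canonical forms could be compared like any other pair of links.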

(This post was last modified: 2019-07-13, 10:17 PM by Laird.)
[-] The following 4 users Like Laird's post:
  • Typoz, Sciborg_S_Patel, Ninshub, Doug
That damned fbclid parameter. It's Facebook of course, adding some tagging of its own in places where it has no right to be. Others have reported that it has side effects, such as breaking normal caching behaviour, and causing extra bandwidth use at both user and server ends. Some links may be broken either by that or for some other reason related to this unwanted tag.

I only spotted it myself a few days ago. From now on I will manually remove it where necessary, but we can't expect all users to do so. At any rate, this casual and careless disregard by Facebook is creating extra work for many people.
[-] The following 4 users Like Typoz's post:
  • Ninshub, tim, Laird, Doug
I've updated the plugin to handle both of these scenarios: redundant query parameters preventing otherwise identical URLs (links) from matching, and one URL redirecting to another, which likewise prevented the two from matching. The redundant query parameters are currently handled by using a list of "offending" query parameters to strip from URLs. The scenario of one URL redirecting to another is handled by resolving redirects when (or after) a post containing URLs is posted/edited, and storing all URLs from all posts in a couple of custom database tables. [ETA: This approach isn't 100% foolproof because we don't first resolve any redirects of the URLs (links) in the draft post before comparing with those in the database, but because we normalise all URLs before comparison, it catches most matches. Mostly, normalisation involves erasing the distinction between 'http://' and 'https://', stripping any 'www.' prefix from the domains, lower-casing the domains, stripping redundant query parameters, and ordering the remaining query parameters alphabetically.]
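
The normalisation steps listed above can be sketched roughly as follows. This is an illustration in Python, not the plugin's actual code: the function name normalise_url, the two-entry ignore list, and the example.com URLs are all assumptions, and keeping the "#" fragment unchanged is a guess, since the post doesn't say how fragments are treated.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical ignore list for illustration; the plugin's real list may differ.
IGNORED_PARAMS = {"fbclid", "CMP"}


def normalise_url(url: str) -> str:
    """Return a comparison key for url, following the steps in the post."""
    parts = urlsplit(url)
    # Dropping the scheme erases the http:// versus https:// distinction.
    host = (parts.hostname or "").lower()  # lower-case the domain
    if host.startswith("www."):
        host = host[len("www."):]  # strip any 'www.' prefix
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Strip redundant query parameters, then order the rest alphabetically.
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in IGNORED_PARAMS
    ]
    params.sort()
    key = host + parts.path
    if params:
        key += "?" + urlencode(params)
    if parts.fragment:
        key += "#" + parts.fragment  # assumption: fragments are left as-is
    return key


# Two differently written links to the same (made-up) page.
link_a = "https://www.Example.com/article?fbclid=abc&b=2&a=1"
link_b = "http://example.com/article?a=1&CMP=share_btn_link&b=2"
same = normalise_url(link_a) == normalise_url(link_b)
```

Here link_a and link_b produce the same key, so they would be treated as a match. Normalisation alone does not make https://youtu.be/jc3waP7syjk match https://www.youtube.com/watch?v=jc3waP7syjk, which is why redirects are resolved separately.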

Here's where I'd like your input: currently, the resolving occurs as the post is posted, so when posting content containing URLs (links), there is now a delay of typically a few (say, ten) seconds before the page returns after posting. If the post contains a lot of links to the same server, though, the delay can be up to 60 seconds. Is this delay acceptable from the perspective of a member making a post? Or should I try to find a better solution (that is as immediate as possible)?

Also a little note: in the course of making these changes, I didn't update the client-side JavaScript to be totally compatible with them, and haven't yet completed that task, so there can be some glitches there from time to time, especially when adding in, deleting, and then adding back in a link to your draft.
(This post was last modified: 2019-03-02, 08:45 AM by Laird.)
[-] The following 4 users Like Laird's post:
  • Sciborg_S_Patel, tim, Typoz, Doug
First, I'll say well done. It seems a fairly big chunk of work and thought has gone into this.

Regarding possible delay, maybe display some sort of alert or temporary message stating that this process is taking place. Maybe that adds extra work and delay too?
[-] The following 2 users Like Typoz's post:
  • tim, Laird
I'll look into that possibility, Typoz - thanks for suggesting it and for your kind words. This plugin is a work in progress and, yes, quite labour-intensive, but I'm hoping that eventually it will be at a point where it can be released to the MyBB community, which will make the effort worthwhile as it will spread the benefits.
[-] The following 1 user Likes Laird's post:
  • Typoz
Another example to consider, posted in these threads,

Tim:
https://psiencequest.net/forums/thread-n...2#pid26892

Kamarling:
https://psiencequest.net/forums/thread-n...2#pid25772

The URLs

https://goop.com/wellness/spirituality/1...-us-dying/

https://goop.com/wellness/spirituality/1...ZETtbY2EdM
[-] The following 2 users Like Typoz's post:
  • Laird, Doug
(2019-03-17, 07:24 AM)Typoz Wrote: Another example to consider, posted in these threads,

Tim:
https://psiencequest.net/forums/thread-n...2#pid26892

Kamarling:
https://psiencequest.net/forums/thread-n...2#pid25772

The URLs

https://goop.com/wellness/spirituality/1...-us-dying/

https://goop.com/wellness/spirituality/1...ZETtbY2EdM

Super, thanks. I've added those utm_* query parameters to the ignored list. It now looks like this, where the arrows indicate that the parameter is ignored only for the domain(s) to the right of the arrow:

'fbclid',
'feature=youtu.be',
't' => 'youtube.com',
'time_continue' => 'youtube.com',
'CMP',
'utm_medium',
'utm_source',
'utm_campaign',
'utm_term',
'utm_content',
'akid',
'email_work_card',

If anybody has any others to suggest, please go right ahead.
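
For anyone curious how a list like this might be applied, here is a rough Python sketch, not the plugin's actual code. It rests on two assumptions about the list: that an entry such as 'feature=youtu.be' is ignored only when both the name and the value match, and that a domain restriction also covers subdomains such as www.youtube.com. The names IGNORED, is_ignored and kept_params are made up for the example.

```python
from urllib.parse import parse_qsl, urlsplit

# Mirrors the list above: None means "ignored on every domain",
# otherwise the parameter is ignored only on the named domain.
IGNORED = {
    "fbclid": None,
    "feature=youtu.be": None,
    "t": "youtube.com",
    "time_continue": "youtube.com",
    "CMP": None,
    "utm_medium": None,
    "utm_source": None,
    "utm_campaign": None,
    "utm_term": None,
    "utm_content": None,
    "akid": None,
    "email_work_card": None,
}


def is_ignored(name: str, value: str, host: str) -> bool:
    """Check a query parameter against the list, by name or by name=value."""
    for key in (name, f"{name}={value}"):
        if key in IGNORED:
            domain = IGNORED[key]
            # Assumption: a domain restriction also applies to its subdomains.
            if domain is None or host == domain or host.endswith("." + domain):
                return True
    return False


def kept_params(url: str) -> list:
    """Return the (name, value) query pairs that survive the ignore list."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not is_ignored(name, value, host)
    ]


youtube_params = kept_params(
    "https://www.youtube.com/watch?v=jc3waP7syjk&t=42&feature=youtu.be"
)
other_params = kept_params("https://example.com/page?t=42&utm_source=x")
```

In this sketch only the v parameter survives for the YouTube link, whereas on another domain the t parameter is kept because its entry is restricted to youtube.com.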
[-] The following 1 user Likes Laird's post:
  • Typoz
(2019-03-02, 07:12 AM)Laird Wrote: This approach isn't 100% foolproof because we don't first resolve any redirects of the URLs (links) in the draft post before comparing with those in the database

We do now.

(2019-03-02, 07:12 AM)Laird Wrote: Also a little note: in the course of making these changes, I didn't update the client-side JavaScript to be totally compatible with them, and haven't yet completed that task

Now completed.

Am slowly making progress with this plugin.
[-] The following 3 users Like Laird's post:
  • tim, Obiwan, Typoz
