As I prepare to incorporate content from external sources into my web application, I am mindful of the potential risks involved. Even though these sources are restricted and trusted, there are still a few concerns that need to be addressed:
The remote sources may
1) fall victim to hacking attempts which could introduce malicious content
2) interfere with global objects in the namespace of my app
3) eventually be supplied by users, if I later allow them to enter their own remote sources (users would be responsible for what they load, but I still need to limit the risk on my end)
To ensure safety, my objective is to eliminate any injected content entirely.
Here is my current strategy:
1) Identify and remove all inline event handlers
str = str.replace(/(<[^>]+\bon\w+\s*=\s*["']?)/gi,"$1return;"); // This code has not been tested; replace() returns a new string, so the result is assigned back
For example:
<a onclick="doSomethingBad()" ...
would transform into
<a onclick="return;doSomethingBad()" ...
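To make step 1 concrete, here is a minimal runnable sketch of the same replacement. The function name neutralizeInlineHandlers is only for illustration, and the regex is the untested one from above. One thing this makes visible: because [^>]+ is greedy and every match has to start at a <, only the last handler in each tag gets patched, so a tag carrying several handlers would need more work.

```javascript
// Sketch of step 1: prefix an inline event handler with "return;" so the
// original handler code is never reached.
// neutralizeInlineHandlers is a hypothetical helper name.
function neutralizeInlineHandlers(html) {
  // $1 is everything from "<" up to and including the opening quote (if any)
  // of the handler attribute.
  return html.replace(/(<[^>]+\bon\w+\s*=\s*["']?)/gi, "$1return;");
}

console.log(neutralizeInlineHandlers('<a onclick="doSomethingBad()" href="#">x</a>'));
// A tag with two handlers: only the last one is rewritten.
console.log(neutralizeInlineHandlers('<a onclick="a()" onmouseover="b()">x</a>'));
```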
2) Eliminate all instances of the following tags: script, embed, object, form, iframe, or applet
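A rough sketch of how I picture step 2, with some assumptions of my own: removeBlockedTags and BLOCKED_TAGS are hypothetical names, paired elements are dropped together with their content, and the replacement is repeated until the string stops changing so that a tag reassembled from fragments (for example <scr<script></script>ipt>) does not survive a single pass.

```javascript
// Sketch of step 2: strip the listed tags with regular expressions.
// BLOCKED_TAGS and removeBlockedTags are hypothetical names.
const BLOCKED_TAGS = ["script", "embed", "object", "form", "iframe", "applet"];

function removeBlockedTags(html) {
  const names = BLOCKED_TAGS.join("|");
  // Paired elements, removed together with everything between the tags.
  const paired = new RegExp("<(" + names + ")\\b[^>]*>[\\s\\S]*?<\\/\\1\\s*>", "gi");
  // Leftover opening, closing, or unpaired tags.
  const stray = new RegExp("<\\/?(?:" + names + ")\\b[^>]*>", "gi");

  let previous;
  do {
    // Repeat until a pass changes nothing, in case removing one tag
    // stitches a new blocked tag together from the surrounding text.
    previous = html;
    html = html.replace(paired, "").replace(stray, "");
  } while (html !== previous);
  return html;
}

console.log(removeBlockedTags('<p>ok</p><script>alert(1)</script><p>more</p>'));
```

Whether form should lose its content or only its tags is an open choice; this sketch removes both.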
3) Locate all occurrences of the word 'script' within a tag and substitute it with HTML entities for added protection
str = str.replace(/(<[^>]+)(script)/gi,toHTMLEntitiesFunc);
This approach would handle cases like this:
<a href="javascript: ..."
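Here is a minimal sketch of what I have in mind for toHTMLEntitiesFunc, assuming the pattern is meant to be <[^>]+ followed by the word script, and that the replacer rewrites the matched word as decimal character references. encodeScriptInTags is a hypothetical wrapper name. I have not verified that browsers treat the encoded form as inert: character references inside attribute values are decoded by the HTML parser, so this step may need a different replacement to be effective.

```javascript
// Sketch of step 3. The replacer receives the whole match followed by the
// two capture groups; it returns the tag prefix plus the matched word
// rewritten as decimal character references.
function toHTMLEntitiesFunc(match, prefix, word) {
  const encoded = Array.from(word, (ch) => "&#" + ch.charCodeAt(0) + ";").join("");
  return prefix + encoded;
}

// encodeScriptInTags is a hypothetical wrapper around the replacement.
function encodeScriptInTags(html) {
  return html.replace(/(<[^>]+)(script)/gi, toHTMLEntitiesFunc);
}

console.log(encodeScriptInTags('<a href="javascript: alert(1)">link</a>'));
```

As with step 1, the greedy [^>]+ means only the last occurrence of the word in each tag is rewritten.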
4) Lastly, any src or href attribute whose value lacks an http prefix should have the domain name of the external source prepended to it
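For step 4, a minimal sketch under a few assumptions: absolutizeUrls and its sourceOrigin parameter (something like "http://example.com", without a trailing slash) are hypothetical, only quoted attribute values are handled, and every value that does not start with http:// or https:// is treated as a path on the external source.

```javascript
// Sketch of step 4: prefix src/href values that lack an http(s) scheme
// with the origin of the external source.
// absolutizeUrls and sourceOrigin are hypothetical names.
function absolutizeUrls(html, sourceOrigin) {
  return html.replace(
    /\b(src|href)\s*=\s*(["'])(.*?)\2/gi,
    (match, attr, quote, value) => {
      // Already absolute: leave the attribute untouched.
      if (/^https?:\/\//i.test(value)) return match;
      // Make sure exactly one slash separates origin and path.
      const path = value.startsWith("/") ? value : "/" + value;
      return attr + "=" + quote + sourceOrigin + path + quote;
    }
  );
}

console.log(absolutizeUrls('<img src="/img/a.png"><a href="page.html">x</a>', "http://example.com"));
```

A side effect of this rule is that a value such as javascript:... also gets the origin prepended, which turns it into an ordinary (probably broken) link on the external domain.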
My question: Have I overlooked other crucial measures? Are there specific actions I should definitely take or avoid?
Edit: It is expected that responses will generally fall into two categories.
1) The "Don't do it!" standpoint
While striving for complete security might mean disconnecting the computer altogether, maintaining a balance between usability and safety is essential.
There is little to prevent a user from directly visiting a site and facing exposure. By opening up to allow user input, individuals assume their own risk when submitting content. Since users could just as easily access a URL directly as through my form, I am willing to accept these risks unless they pose a specific threat to my server.
2) The "I'm aware of common exploit techniques and you should consider..." standpoint: suggestions for preventing particular types of attack, or recommendations for closing specific vulnerabilities.
I am particularly interested in the latter type of feedback, unless there are valid arguments as to why my approach poses greater dangers than what users can encounter independently.