Extract the first paragraph text from a web page with PHP
Posted February 8th, 2010 in PHP
This post looks at how to extract the first paragraph from an HTML page using PHP's strpos and substr functions to find the location of the first <p> and </p> tags and get the content between them.
Using strpos and substr
Assume the content to extract the paragraph from is in the variable $html (which may have come from a file, database, template or been downloaded from an external website). The following code works out the position of the first <p> tag and the first </p> tag after it, then gets all the HTML between them, including the opening and closing tags:
$start = strpos($html, '<p>');
$end = strpos($html, '</p>', $start);
$paragraph = substr($html, $start, $end - $start + 4);
Line 1 gets the position of the first opening <p> tag
Line 2 gets the position of the first </p> after the first opening <p>
Line 3 then uses substr to get the HTML. The third parameter is the number of characters to copy and is calculated by subtracting $start from $end and adding on the length of "</p>" so it is included in the extracted HTML.
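The three lines above assume that both tags are present. strpos returns false when the search string is not found, so a slightly more defensive version might look like the sketch below. The function name first_paragraph is a hypothetical helper made up for this example, and returning null for "not found" is just one possible choice.

```php
<?php
// Hypothetical helper (not part of the original snippet): returns the first
// <p>...</p> block including its tags, or null if no complete pair exists.
function first_paragraph(string $html): ?string
{
    $start = strpos($html, '<p>');
    // Compare with === false: a match at position 0 is valid but falsy.
    if ($start === false) {
        return null;
    }

    $end = strpos($html, '</p>', $start);
    if ($end === false) {
        return null;
    }

    // Add the length of the closing tag so it is included in the result.
    return substr($html, $start, $end - $start + strlen('</p>'));
}

$html = '<h1>Title</h1><p>First paragraph.</p><p>Second.</p>';
echo first_paragraph($html), "\n"; // <p>First paragraph.</p>
```

Like the original code, this only matches a literal <p>. An opening tag with attributes, such as <p class="intro">, would not be found.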
Converting to plain text
If the extracted paragraph needs to be in plain text rather than HTML, use the following to remove the HTML tags and convert HTML entities into normal plain text:
$paragraph = html_entity_decode(strip_tags($paragraph));
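As a rough sketch, the same conversion can be wrapped in a small function that also passes explicit flags and an encoding to html_entity_decode, since its defaults differ between PHP versions. The name paragraph_to_text is hypothetical, and the choice of UTF-8 and the trailing trim() are assumptions for this example rather than part of the original one-liner.

```php
<?php
// Hypothetical helper: converts an extracted HTML paragraph to plain text.
function paragraph_to_text(string $paragraph): string
{
    // Strip tags first; decoding first would turn &lt;b&gt; into a real
    // <b> tag, which strip_tags would then remove.
    $text = strip_tags($paragraph);

    // ENT_QUOTES decodes both double- and single-quote entities.
    // The UTF-8 encoding is an assumption about the page being processed.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Remove whitespace left at either end of the paragraph.
    return trim($text);
}

echo paragraph_to_text('<p>Fish &amp; chips, <em>today</em> only</p>'), "\n";
// Fish & chips, today only
```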
