Add a site manually
To add a site, you'll need to pass as a second argument to add()
an array that contains at least 3 elements:
-
host
is the domain name of the URLs you want to match, e.g.example.com
(including subdomains such aswww.example.com
) -
At least one of the following:
-
extract
is a regexp used to extract values from the URL. -
scrape
is an array with at least oneextract
value and other optional values:-
extract
is a regexp used to extract values from the scraped page. -
match
is a regexp used to determine whether to scrape the content of the URL. If it's not specified, every URL is scraped. -
url
is an optional URL used for scraping. It can contain values extracted from the original URL, e.g.http://example.org/scrape/{@id}
. -
header
is a HTTP header, e.g.User-agent: not Mozilla
-
-
-
Exactly one of the following:
-
A
choose
array that contains awhen
array and anotherwise
array.-
when
must contain the following two elements:-
A
test
string that contains an XPath condition. -
An
iframe
orflash
element as specified below.
-
-
otherwise
must contain oneiframe
orflash
element as specified below.
-
-
An
iframe
orflash
element that contains an array of attributes:-
src
contains the URL of the iframe or Flash object. -
width
andheight
are optional and default to 640 × 360. -
Other attributes such as
allowfullscreen
orscrolling
are automatically added where necessary.
-
-
You can specify multiple host
, scheme
, scrape
, header
, extract
, match
or when
values using arrays.
How to configure multiple host
and extract
values
$configurator = new s9e\TextFormatter\Configurator;
$configurator->MediaEmbed->add(
'youtube',
[
'host' => ['youtube.com', 'youtu.be'],
'extract' => [
"!youtube\\.com/watch\\?v=(?'id'[-0-9A-Z_a-z]+)!",
"!youtu\\.be/(?'id'[-0-9A-Z_a-z]+)!"
],
'iframe' => [
'width' => 560,
'height' => 315,
'src' => 'http://www.youtube.com/embed/{@id}'
]
]
);
// Get an instance of the parser and the renderer
extract($configurator->finalize());
$text = 'http://www.youtube.com/watch?v=-cEzsCAzTak';
$xml = $parser->parse($text);
$html = $renderer->render($xml);
echo $html;
<span data-s9e-mediaembed="youtube" style="display:inline-block;width:100%;max-width:560px"><span style="display:block;overflow:hidden;position:relative;padding-bottom:56.25%"><iframe allowfullscreen="" loading="lazy" scrolling="no" src="http://www.youtube.com/embed/-cEzsCAzTak" style="border:0;height:100%;left:0;position:absolute;width:100%"></iframe></span></span>
How to configure the iframe
renderer
$configurator = new s9e\TextFormatter\Configurator;
$configurator->MediaEmbed->add(
'youtube',
[
'host' => 'youtube.com',
'extract' => "!youtube\\.com/watch\\?v=(?'id'[-0-9A-Z_a-z]+)!",
'iframe' => [
'width' => 560,
'height' => 315,
'src' => 'http://www.youtube.com/embed/{@id}'
]
]
);
// Get an instance of the parser and the renderer
extract($configurator->finalize());
$text = '[media]http://www.youtube.com/watch?v=-cEzsCAzTak[/media]';
$xml = $parser->parse($text);
$html = $renderer->render($xml);
echo $html;
<span data-s9e-mediaembed="youtube" style="display:inline-block;width:100%;max-width:560px"><span style="display:block;overflow:hidden;position:relative;padding-bottom:56.25%"><iframe allowfullscreen="" loading="lazy" scrolling="no" src="http://www.youtube.com/embed/-cEzsCAzTak" style="border:0;height:100%;left:0;position:absolute;width:100%"></iframe></span></span>
How to configure the flash
renderer
$configurator = new s9e\TextFormatter\Configurator;
$configurator->MediaEmbed->add(
'dailymotion',
[
'host' => 'dailymotion.com',
'extract' => "!dailymotion\.com/(?:video/|user/[^#]+#video=)(?'id'[A-Za-z0-9]+)!",
'flash' => [
'width' => 560,
'height' => 315,
'src' => 'http://www.dailymotion.com/swf/{@id}'
]
]
);
// Get an instance of the parser and the renderer
extract($configurator->finalize());
$text = '[media]http://www.dailymotion.com/video/x222z1[/media]';
$xml = $parser->parse($text);
$html = $renderer->render($xml);
echo $html;
<span data-s9e-mediaembed="dailymotion" style="display:inline-block;width:100%;max-width:560px"><span style="display:block;overflow:hidden;position:relative;padding-bottom:56.25%"><object data="http://www.dailymotion.com/swf/x222z1" style="height:100%;left:0;position:absolute;width:100%" type="application/x-shockwave-flash" typemustmatch=""><param name="allowfullscreen" value="true"></object></span></span>
How to scrape content
Some media sites don't put all of the necessary data (e.g. the ID of a video) in the URL. In that case, you may have to retrieve it from the page itself.
Note that scraping content is a pretty expensive operation that can take several seconds to complete, in part due to network latency and the responsiveness of the target site.
$configurator = new s9e\TextFormatter\Configurator;
$configurator->MediaEmbed->add(
'slideshare',
[
'host' => 'slideshare.net',
'scrape' => [
// Here we ensure that we don't scrape just every link to http://slideshare.net
'match' => '!slideshare\\.net/[^/]+/\\w!',
// Retrieve the presentationId from the embedded JSON
'extract' => '!"presentationId":(?<id>[0-9]+)!'
],
'iframe' => [
'width' => 427,
'height' => 356,
'src' => 'http://www.slideshare.net/slideshow/embed_code/{@id}'
]
]
);
// Get an instance of the parser and the renderer
extract($configurator->finalize());
$text = 'http://www.slideshare.net/Slideshare/10-million-uploads-our-favorites';
$xml = $parser->parse($text);
$html = $renderer->render($xml);
echo $html;
http://www.slideshare.net/Slideshare/10-million-uploads-our-favorites
Specify a different URL for scraping
If the URL used for scraping is different from the media's URL, you can specify it in the url
element of the scrape
array. You can also use variables in the URL using the familiar syntax {@id}
. Values for those variables come from named captures in previous extract
regexp and from the tag's attributes if applicable.
For example: the dimensions of a Gfycat video are mentionned in the metadata of their page. However, if someone posted a direct link to a Gfycat .gif image such as http://giant.gfycat.com/SereneIllfatedCapybara.gif
, the dimensions would not be available. In the following example, we configure scrape
with a custom URL that is known to include the original image's dimensions.
In addition, we specify that both the width
and height
attributes should be filtered as unsigned integer.
$configurator = new s9e\TextFormatter\Configurator;
$configurator->MediaEmbed->add(
'gfycat',
[
'host' => 'gfycat.com',
'attributes' => [
'height' => ['filterChain' => '#uint'],
'width' => ['filterChain' => '#uint']
],
'extract' => "!gfycat\\.com/(?'id'\\w+)!",
'scrape' => [
'url' => 'http://gfycat.com/{@id}',
'extract' => [
'!property="og:image:height"\s*content="(?<height>\d+)!',
'!property="og:image:width"\s*content="(?<width>\d+)!'
]
],
'iframe' => [
'width' => '{@width}',
'height' => '{@height}',
'src' => '//gfycat.com/iframe/{@id}'
]
]
);
// Get an instance of the parser and the renderer
extract($configurator->finalize());
$text = 'http://giant.gfycat.com/SereneIllfatedCapybara.gif';
$xml = $parser->parse($text);
$html = $renderer->render($xml);
echo $html;
<span data-s9e-mediaembed="gfycat" style="display:inline-block;width:100%;max-width:600px"><span style="display:block;overflow:hidden;position:relative;padding-bottom:56.3333333333333%"><iframe allowfullscreen="" loading="lazy" scrolling="no" src="//gfycat.com/iframe/SereneIllfatedCapybara" style="border:0;height:100%;left:0;position:absolute;width:100%"></iframe></span></span>
Add custom HTTP headers when scraping
Custom HTTP headers can be specified in the scrape
configuration.
$configurator->MediaEmbed->add(
'sitename',
[
'host' => 'example.org',
'scrape' => [
'extract' => '#example\\.org/video/(?<id>\\d+)#',
'header' => [
'Cookie: mycookie=1',
'User-agent: foo-agent'
]
],
'iframe' => ['src' => '//example.org/embed/{@id}']
]
);